re2c: Disable UTF-8 #540

Merged
merged 1 commit into from
Apr 2, 2024

nwellnhof (Contributor)

The regexes don't require UTF-8 features and work in ASCII mode as well. Disabling UTF-8 reduces the size of the code generated by re2c by a couple of KBs.

I regenerated the regex code with re2c 3.0, since that's the version I have on Ubuntu 22.04, and I had to add a (void) marker line to suppress an unused-variable warning. Feel free to regenerate with your version of re2c.

@jgm jgm merged commit d8de8c7 into commonmark:master Apr 2, 2024
15 checks passed
nwellnhof (Contributor, Author)

There's still quite a bit of bloat in the re2c-generated code, but that's hard to fix. The main issue is that re2c seems to handle {m}-style quantifiers by creating m copies of the subregex. Regex engines like RE2 (unrelated to re2c) take this approach as well, but it isn't well suited to ahead-of-time compilation.
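
To illustrate the size trade-off described above, here is a minimal hand-written sketch in C. It is not re2c output and not code from this PR: the pattern `[a-z]{3}`, the helper `is_lower`, and the functions `match_unrolled` and `match_counted` are all hypothetical names chosen for the example. The first matcher repeats the subregex test once per repetition, which is roughly the shape that copying the subregex m times produces; the second keeps a single copy and tracks the count in a variable.

```c
#include <stddef.h>

/* Hypothetical stand-in for a subregex: the character class [a-z]. */
static int is_lower(unsigned char c) { return c >= 'a' && c <= 'z'; }

/* Unrolled form of [a-z]{3}: one copy of the class test per repetition.
   Code size grows linearly with the repetition count. */
int match_unrolled(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    if (!is_lower(*p++)) return 0;  /* copy 1 */
    if (!is_lower(*p++)) return 0;  /* copy 2 */
    if (!is_lower(*p++)) return 0;  /* copy 3 */
    return 1;
}

/* Counter form of [a-z]{m}: a single copy of the class test,
   with the repetition count held in a loop variable. */
int match_counted(const char *s, size_t m) {
    const unsigned char *p = (const unsigned char *)s;
    for (size_t i = 0; i < m; i++) {
        /* A NUL terminator fails the class test, so we never read past it. */
        if (!is_lower(p[i])) return 0;
    }
    return 1;
}
```

Both functions accept any string that starts with three lowercase ASCII letters. With a trivial subregex the difference is small, but when the repeated subregex is itself a large automaton, the unrolled form multiplies its generated code by m, while the counter form needs runtime state that a plain DFA does not have.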
