Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SLRE is broken (incorrect regex matching) #23

Closed
gemarcano opened this issue Dec 17, 2014 · 7 comments
Closed

SLRE is broken (incorrect regex matching) #23

gemarcano opened this issue Dec 17, 2014 · 7 comments
Labels
Milestone

Comments

@gemarcano
Copy link
Contributor

See cesanta/slre#19 for details. Should we look for a replacement, or should we try to debug this? Their code is not straight forward, and I have little experience with regex engines. Thoughts?

@ajl2612
Copy link
Contributor

ajl2612 commented Dec 17, 2014

Can we do the regex parsing in the ruby parser since ruby has much better regex handling? If the user decided to add new pages or alter the path name of something on the server it will require a restart anyway.

@gemarcano
Copy link
Contributor Author

The problem is that regex is being used to parse out the URL at runtime, and the ruby code is only run at "set-up" or compile time.

For what we're using regexes, we could simply do a dumb string parse of the URI instead. A bit annoying, but it could work...

A third alternative is to use the library in C++, but that requires a fully C++11 compatible compiler (even GCC 4.9.0 has bugs in its regex library; newest gcc is gcc 4.9.2). We are using some parts of C++11 in our code, so maybe this worry is moot and we should just use what should be a part of the standard library?

@gemarcano
Copy link
Contributor Author

I'm beginning to be irked by SLRE. I found a possible bug where the final match it finds on a regex, if it is followed by the terminating null character, it includes it in the final length of the string (that's... not right according to C string conventions). This is messing up our current code even more. I'll work around it, but at this rate, we need a replacement or alternative. They haven't even commented on the original issue I placed in their Issues page.

@jaredsmithse jaredsmithse added this to the Winter Break milestone Dec 17, 2014
@jaredsmithse
Copy link
Contributor

@ajl2612 - As per @gemarcano's comment, we can't do this regex matching in ruby because it handles URLs. If not I would love to use ruby since it is oh-so-simple (=~ /regex/ is awesome and useable in switch statements!).

@gemarcano - lets look at alternatives for now and discuss after break, right now it works for our basic needs and we can leave it be until we are all together again and have had time to look at other options. If you find any specific bugs, be sure to open them as issues on their repo so that they can be resolved. Link us to the discussions so we can listen in and see how they unfold so you don't need to have constant updates for us.

@gemarcano
Copy link
Contributor Author

Funny thing, the current setup does not do all of our basic needs. Currently, if we do a GET request on the server on a group, the regex matches it as though it were a full group/name combo (incorrectly), and then promptly segfaults because my code tries to access a name that is not there. That having been said, GET and PUT requests do work on the full group/name combos (I still need to implement PUT properly, but GET is doing something with regards to REST).

If we're OK submitting R1 with this deficiency, I have no problem putting this off until when we come back.

@jkugs
Copy link
Contributor

jkugs commented Jan 12, 2015

I managed to work around this problem by changing our regex strings slightly to:

static const char * full_regex = "^/group/([a-z0-9-_]+)/name/([a-z0-9-_]+)$";
static const char * group_regex = "^/group/([a-z0-9-_]+)$";

This will of course force urls to only contains lowercase letters, numbers, dashes, and underscores but I believe these would be sufficient. We can add to it what we wanna include. I think the problem stems from:

[^...] Match any character but ones from set

Not sure why that screws it up but matching characters to a set normally seems to work correctly.

@gemarcano
Copy link
Contributor Author

We'll use Jim's suggestion for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants