Implement Strict parser #184

antstorm · 2024-08-16T14:48:57Z

⚠️ NOT READY TO BE MERGED YET ⚠️

This PR is meant to be merged on top of #183, hence a few overlapping changes. Once #183 is merged, I'll rebase this one.

Based on the previous discussion (#164), this PR implements a StrictParser class that only recognises well-formatted inputs matching one of a set of allowed formats. The idea is simple: when handling money it's better to be conservative with inputs and discard anything that looks off. The parser is still fairly flexible, and with future improvements (arguments passed via options) it could be configured to allow only a couple of formats.

The implementation is based on tokenization: we first detect any known parts (currency, sign, amount, etc.), then check that nothing extra is left and that the order matches one of the expected formats. If anything, the implementation should be more straightforward than the OptimisticParser.
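To make the tokenize-then-match idea concrete, here is a minimal sketch in plain Ruby. It is not the code from this PR: the module name, token names, symbol table, and list of allowed formats are all assumptions chosen for illustration, and the amount is returned as a Rational only to keep the example dependency-free.

```ruby
require "strscan"

# Illustrative sketch only. StrictSketch, its token names, and its tables are
# hypothetical and do not mirror the StrictParser internals in this PR.
module StrictSketch
  SYMBOLS   = { "$" => "USD", "£" => "GBP", "€" => "EUR" }.freeze
  ISO_CODES = %w[USD GBP EUR].freeze

  # Tried in order at the current scan position.
  TOKEN_PATTERNS = {
    sign:   /[-+]/,
    symbol: Regexp.union(SYMBOLS.keys),
    code:   /(?:#{ISO_CODES.join("|")})(?![A-Za-z])/,
    amount: /\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?/,
    space:  /\s+/
  }.freeze

  # Token orders that are accepted (whitespace tokens are dropped first).
  ALLOWED_FORMATS = [
    %i[amount], %i[sign amount],
    %i[symbol amount], %i[sign symbol amount], %i[symbol sign amount],
    %i[amount code], %i[sign amount code], %i[code amount]
  ].freeze

  # Returns [[type, text], ...] or nil if any part of the input is unknown.
  def self.tokenize(input)
    scanner = StringScanner.new(input)
    tokens = []
    until scanner.eos?
      match = TOKEN_PATTERNS.find { |_type, pattern| scanner.scan(pattern) }
      return nil unless match # leftover text we cannot classify: reject
      tokens << [match.first, scanner.matched] unless match.first == :space
    end
    tokens
  end

  # Returns [amount, currency_code_or_nil], or nil when the input is rejected.
  def self.parse(input)
    return nil if input.nil? || input.strip.empty?

    tokens = tokenize(input.strip)
    return nil if tokens.nil?
    return nil unless ALLOWED_FORMATS.include?(tokens.map(&:first))

    parts = tokens.to_h
    amount = Rational(parts[:amount].delete(","))
    amount = -amount if parts[:sign] == "-"
    [amount, parts[:code] || SYMBOLS[parts[:symbol]]]
  end
end
```

In this sketch, `StrictSketch.parse("$5.95")` yields an amount and `"USD"`, while `"$5.95 ea."`, `"20.00 OMG"` and `"19.12.89"` all return nil because some text is left over after the known tokens are consumed.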

This PR also runs all the existing specs (unmodified) twice, once with each parser, to show where they disagree. This is done on purpose to highlight the differences and open them up for discussion.

There are currently 5 different failure cases (about 39 failed specs, but most test the exact same thing):

Non-exhaustive matches ($5.95 ea., L9.99, kr.123,45, kr9.99, 20.00 OMG, hello 2000 world): all of these contain text that we couldn't figure out (kr is not listed as a symbol, OMG is not a currency we know), so I think we shouldn't parse them.
nil input: I think this should fail, similar to an empty string, because empty input is not the same as a zero value (which is what OptimisticParser returns).
Thousands separator vs decimal mark (4.635, 6,534, 1.550 USD, 1.009, 1.001): the current approach is based on the two configuration parameters expect_whole_subunits and enforce_currency_delimiters. The current logic in the StrictParser treats these as thousands unless the currency has a subunit_to_unit ratio > 100. The rationale is to avoid implicit rounding, which comes with its own set of surprises. This is where I'd love to get your input: what's the most intuitive way of treating these?
Weird format (19.12.89): this doesn't look like a decimal input and shouldn't be parsed as a monetary value.
Leading dot floats (£.45B): I'm somewhat conflicted about this. It is not ambiguous and probably should be allowed. However, when you consider ,12 it becomes less obvious: is it 0.12 or an incorrect input? It might be safer to exclude this and require the unit part to always be present. WDYT?
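For the thousands separator vs decimal mark case, the rule described above can be sketched roughly as follows. This is an illustration, not the PR's code: the module, method name, and signature are hypothetical, and the sketch only covers the ambiguous shape of one to three digits, a single separator, and exactly three digits.

```ruby
# Illustrative sketch only. SeparatorSketch and interpret_ambiguous are
# hypothetical names, not part of Monetize or of this PR.
module SeparatorSketch
  # One to three digits, a single "." or ",", then exactly three digits.
  AMBIGUOUS = /\A(\d{1,3})[.,](\d{3})\z/

  # subunit_to_unit is the number of subunits per unit (100 for cents).
  # Returns a Rational, or nil when the input is not of the ambiguous shape.
  def self.interpret_ambiguous(amount, subunit_to_unit)
    match = AMBIGUOUS.match(amount)
    return nil unless match

    whole, rest = match.captures
    if subunit_to_unit > 100
      # Three decimal places fit the currency, so read it as a decimal mark.
      Rational("#{whole}.#{rest}")
    else
      # Three decimals would need rounding, so read it as a thousands separator.
      Rational("#{whole}#{rest}")
    end
  end
end
```

Under these assumptions, `interpret_ambiguous("1.009", 100)` reads the input as one thousand and nine, while the same string with a ratio of 1000 is read as one and nine thousandths. Whether that is the most intuitive behaviour is exactly the open question.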

antstorm added 9 commits August 14, 2024 12:51

Rename current default parser to OptimisticParser

8ec7c51

Extract abstract Monetize::Parser class

20c1ce0

Specify Monetize::Parser interface

9d9bff6

Support registering multiple parsers

1b9e607

Add StrictParser

fb6ed96

Temporary ENV variable to run tests with StrictParser

d3cf5ff

Optimize performance of the StrictParser

8e46dc0

Tidy up StrictParser

aaaa0b0

Add class comment to StrictParser

dc619e1
