Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Strict parser #184

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Conversation

antstorm
Copy link
Contributor

@antstorm antstorm commented Aug 16, 2024

⚠️ NOT READY TO BE MERGED YET ⚠️

This PR is aimed to be merged on top of #183, hence a few overlapping changes. Once the #183 is merged, I'll rebase this one.

Based on the previous discussion (#164) this PR implements a StrictParser class that aims to only recognise well formatted inputs based on a number of allowed formats that it expects. The idea here is rather simple — when handling money it's better to be conservative with the inputs and discard anything that looks off. This is still rather flexible and with the future improvements (arguments passed via options) it can be configured to only allow a couple of formats.

The implementation is based on the tokenization — we first detect any known parts (currency, sign, amount, etc) and then check there's nothing extra left and the order matches one of the expected formats. If anything the implementation should be more straight forward than the OptimisticParser.

This PR also runs the all the existing specs (unmodified) twice — once with each parser to should where they disagree. This is done on purpose to highlight the differences and open them up for discussion.

There are currently 5 different fail cases (about 39 failed specs, but most test the same exact thing):

  1. Non-exhaustive matches ($5.95 ea., L9.99, kr.123,45, kr9.99, 20.00 OMG, hello 2000 world) — all these contain text that we couldn't figure out (kr is not a listed as a symbol, OMG is not a currency we know), therefore I think we shouldn't parse these
  2. nil input — I think similar to an empty string this should fail, because empty input is not the same as a zero value (which is what OptimisticParser returns)
  3. Thousands separator vs decimal mark (4.635, 6,534, 1.550 USD, 1.009, 1.001) — the current approach is based on the two configuration parameters expect_whole_subunits and enforce_currency_delimiters. The current logic in the StrictParser treats these as thousands unless the currency has a subunit_to_unit ratio > 100. The rationale is to avoid implicit rounding, which comes with it's own set of surprises. This is where I'd love get your input — what's the most intuitive way of treating these?
  4. Weird format (19.12.89) — this doesn't look like a decimal point input and shouldn't be parsed as monetary value
  5. Trailing dot floats (£.45B) — I'm somehow conflicted about this, however this is not ambiguous and probably should be allowed. However when you consider ,12 it becomes less obvious — is it 0.12 or an incorrect input? It might be safer to exclude this and force the unit part to always be present. WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant