Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add continuous language option #10

Merged
merged 6 commits into from
Mar 26, 2024

Conversation

sota-horiuchi
Copy link

@sota-horiuchi sota-horiuchi commented Mar 26, 2024

What

Add continuous language option

  • Added a module to determine whether a language can ignore whitespace.
  • If it is a language that can be ignored (i.e. a continuation language), whitespace is made maybe

Why

In particular, languages like Japanese, where words are not separated by spaces, cannot be parsed, so this has been fixed.

  • For instance 砂糖300g <- 砂糖=sugger

How

Add ContinuousLanguageLocale module to judge that current locale use continuous language.

- avoid parsing sugger 100g -> sugge, 100, g when continuous language
@@ -87,7 +92,7 @@ class RootParser < Parslet::Parser

rule(:reverse_format) do
# e.g. flour 200g
((whitespace >> quantity).absent? >> any).repeat.as(:ingredient) >>
((whitespace >> quantity >> any.absent?).absent? >> any).repeat.as(:ingredient) >>
Copy link
Author

@sota-horiuchi sota-horiuchi Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take for example sugger 300g: since sugger contains g in the quantity, it transitions to the next rule after reading to sug in continuous language.
Basically, nothing is considered to come after quantity, so add any.absent?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow this change, but i'm happy that if the tests are passing then it's probably fine 👍

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In continuation languages, where whitespace is treated as optional (denoted as "maybe"), parsing issues arise when encountering words that include units as substrings (for example, the letter "g" in "sugger"). This complication becomes apparent when tested locally.

The rule was designed to interpret strings structured as name+quantity. By ensuring no additional characters follow the quantity, we can eliminate the misinterpretation of substrings such as "g" in "suger" being wrongly recognized as a unit of quantity.

I hope I've explained it well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh great, I see! Thanks for explaining

Copy link
Member

@liamnichols liamnichols left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a nice improvement! I left a few suggestions 🙇

@@ -0,0 +1,9 @@
module Ingreedy
module ContinuousLanguageLocale
CONTINUOUS_LANGUAGES_LOCALES = [:ja].freeze
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can apply this improvement to Chinese and Thai languages as well? I'm not sure if there are others or not, but these both stand out to me as countries that we support at Cookpad today with a similar rule.

Copy link
Author

@sota-horiuchi sota-horiuchi Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can.
For Japan, the JP development team had already reached an agreement, but if it seems appropriate, we can also add China and Thailand.

lib/ingreedy/continuous_language_locale.rb Outdated Show resolved Hide resolved
lib/ingreedy/amount_parser.rb Outdated Show resolved Hide resolved
lib/ingreedy/dictionary_collection.rb Outdated Show resolved Hide resolved
@sota-horiuchi
Copy link
Author

sota-horiuchi commented Mar 26, 2024

hmm 🤔

While loading ./spec/ingreedy_spec.rb a `raise SyntaxError` occurred, RSpec will now quit.
Failure/Error: require File.join(path, "dictionary_collection")

SyntaxError:
  /home/runner/work/ingreedy/ingreedy/lib/ingreedy/dictionary_collection.rb:11: syntax error, unexpected ','
  ...cale] = Dictionary.new(locale:, **attributes)
  ...                              ^
  /home/runner/work/ingreedy/ingreedy/lib/ingreedy/dictionary_collection.rb:41: syntax error, unexpected ','
  ...le] ||= Dictionary.new(locale:, **load_yaml(locale))
  ...                              ^
  /home/runner/work/ingreedy/ingreedy/lib/ingreedy/dictionary_collection.rb:41: syntax error, unexpected ')', expecting `end'
  ...w(locale:, **load_yaml(locale))
  ...                              ^
  /home/runner/work/ingreedy/ingreedy/lib/ingreedy/dictionary_collection.rb:52: syntax error, unexpected end-of-input, expecting `end'

@liamnichols
Copy link
Member

liamnichols commented Mar 26, 2024

Omitting the hash value seems to be a Ruby 3.1 feature, sorry

https://blog.saeloun.com/2021/09/28/ruby-allow-value-omission-in-hash-literals/

It should be Dictionary.new(locale: locale, **load_yaml(locale))

I think that we also need to consider the order though, because locale should take priority over any contents of the dictionary:

Dictionary.new(**load_yaml(locale), locale: locale)

Copy link
Member

@liamnichols liamnichols left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Please also consider bumping version.rb to 0.1.1 🙇 Actually maybe there is no need to do this

@@ -87,7 +92,7 @@ class RootParser < Parslet::Parser

rule(:reverse_format) do
# e.g. flour 200g
((whitespace >> quantity).absent? >> any).repeat.as(:ingredient) >>
((whitespace >> quantity >> any.absent?).absent? >> any).repeat.as(:ingredient) >>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow this change, but i'm happy that if the tests are passing then it's probably fine 👍

@sota-horiuchi
Copy link
Author

Please also consider bumping version.rb to 0.1.1 🙇 Actually maybe there is no need to do this

hmm. I think so too.

@sota-horiuchi sota-horiuchi merged commit e7326c9 into master Mar 26, 2024
6 checks passed
@sota-horiuchi sota-horiuchi deleted the horiso/add-continuous-language-option branch March 26, 2024 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants