location name romanization #9

Open
russbiggs opened this issue May 4, 2023 · 0 comments

It would be nice to add a romanized version of location names for locations whose names contain non-Roman characters (CJK, Arabic, Hebrew, Cyrillic, etc.). The goal is not to translate names, since many of them are proper nouns and are not suited to translation.

e.g. 아산시청 -> asan-si cheong

The functionality would be additive only and would not replace the original name, so some consideration of how the romanized name is stored in the DB is also needed.

In the ingestion process, I propose a two-step process:

  1. Identify whether a name has non-Roman characters
  2. Romanize the characters

For the first step, as long as the names come in as Unicode, it seems we can scan for matches against the character ranges of the different scripts (e.g. https://stackoverflow.com/a/50434862) and then identify the general character set.
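A minimal sketch of that first step, assuming Python and using only the standard library. Instead of hand-written character ranges, it reads the first word of each character's Unicode name (`HANGUL`, `CYRILLIC`, `ARABIC`, ...) as a rough script label. The function names `detect_scripts` and `needs_romanization` are hypothetical, and this is a heuristic rather than true Unicode script detection (for example, fullwidth Latin letters are labelled `FULLWIDTH`).

```python
import unicodedata
from collections import Counter


def detect_scripts(name: str) -> Counter:
    """Count alphabetic characters per rough script label.

    The label is the first word of the character's Unicode name,
    e.g. "HANGUL SYLLABLE A" -> "HANGUL". Digits, spaces and
    punctuation are skipped.
    """
    counts = Counter()
    for ch in name:
        if not ch.isalpha():
            continue
        char_name = unicodedata.name(ch, "")
        if char_name:
            counts[char_name.split()[0]] += 1
    return counts


def needs_romanization(name: str) -> bool:
    """True if any alphabetic character is not labelled LATIN."""
    return any(script != "LATIN" for script in detect_scripts(name))
```

The per-script counts also give the "general character set" for names that mix scripts, so the dominant one can be picked with `detect_scripts(name).most_common(1)`.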

For the romanization, using an individual library for each language or character group may be the best approach; I can't find a one-size-fits-all library. One issue to consider: for character sets whose characters have different sounds in different languages, we may need to identify the language, not just the character set used. For example, Persian uses Arabic characters but may have different romanized outputs, and the same goes for any of the many languages that use Cyrillic (Russian, Ukrainian, Bulgarian, Mongolian).
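One way the per-language dispatch could look, as a sketch only: a registry keyed by `(script, language)` with a script-level fallback. Everything here is hypothetical (the `register` and `romanize_name` names, the `"ru"`/`"uk"` language codes, and the choice of Russian as the Cyrillic default). The two-letter tables are toy stand-ins for real per-language libraries; they only illustrate that the same Cyrillic letters can romanize differently.

```python
from typing import Callable, Dict, Optional, Tuple

Romanizer = Callable[[str], str]

# (script, language) -> romanizer; language None is the script-level default.
_REGISTRY: Dict[Tuple[str, Optional[str]], Romanizer] = {}


def register(script: str, language: Optional[str] = None):
    """Decorator that registers a romanizer for a script/language pair."""
    def decorator(func: Romanizer) -> Romanizer:
        _REGISTRY[(script, language)] = func
        return func
    return decorator


def _from_table(table: Dict[str, str]) -> Romanizer:
    """Build a character-by-character romanizer from a lookup table."""
    def romanize(text: str) -> str:
        return "".join(table.get(ch, ch) for ch in text)
    return romanize


# Toy, deliberately incomplete tables: placeholders for real libraries.
_RU = {"г": "g", "и": "i"}
_UK = {"г": "h", "и": "y"}

register("CYRILLIC", "ru")(_from_table(_RU))
register("CYRILLIC", "uk")(_from_table(_UK))
register("CYRILLIC")(_from_table(_RU))  # assumed default, an arbitrary choice


def romanize_name(name: str, script: str,
                  language: Optional[str] = None) -> Optional[str]:
    """Return the romanized name, or None if no romanizer is registered.

    The original name is never modified, so the result stays additive.
    """
    func = _REGISTRY.get((script, language)) or _REGISTRY.get((script, None))
    return func(name) if func else None
```

Returning `None` for unsupported scripts keeps the feature additive: the original name is stored either way, and the romanized field is simply left empty.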

Any thoughts? @caparker @majesticio
