Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split icu_locale into two crates #3931

Open
sffc opened this issue Aug 24, 2023 · 35 comments
Open

Split icu_locale into two crates #3931

sffc opened this issue Aug 24, 2023 · 35 comments
Labels
C-locale Component: Locale identifiers, BCP47

Comments

@sffc
Copy link
Member

sffc commented Aug 24, 2023

There is too much going on in icu_locid_transform for it to be a core crate. It should only contain the things essential for fallback.

@sffc sffc added the discuss Discuss at a future ICU4X-SC meeting label Aug 24, 2023
@sffc sffc modified the milestones: 1.3 Blocking ⟨P1⟩, ICU4X 2.0 Aug 24, 2023
@sffc sffc added the 2.0-breaking Changes that are breaking API changes label Aug 24, 2023
@sffc
Copy link
Member Author

sffc commented Aug 24, 2023

We need more discussion time for this issue. There's not time for 1.3 to have that discussion. It's not a bad enough problem to block 1.3. So this should block 2.0.

@sffc
Copy link
Member Author

sffc commented Aug 24, 2023

Discuss with:

@sffc sffc added the discuss-triaged The stakeholders for this issue have been identified and it can be discussed out-of-band label Aug 24, 2023
@sffc
Copy link
Member Author

sffc commented Nov 9, 2023

Decision from 2023-11-09:

  • icu_locid stays as is
  • icu_locid_transform becomes the lower level crate
  • Revive icu_canonicalizer to become the higher level crate

See full notes in the 2023-Q3 Summit notes doc.

@sffc sffc removed discuss Discuss at a future ICU4X-SC meeting discuss-triaged The stakeholders for this issue have been identified and it can be discussed out-of-band labels Nov 9, 2023
@sffc
Copy link
Member Author

sffc commented Nov 30, 2023

The consensus from 2023-11-09 was to use the name icu_canonicalizer, but the revivable crate is named icu_locale_canonicalizer: https://docs.rs/icu_locale_canonicalizer/latest/icu_locale_canonicalizer/

I remember that I did not really like this name from the start because it uses "locale" instead of "locid" as the rest of the crates use. I also don't really like making people write icu::locale_canonicalizer::LocaleDirectionality.

So I would like to make a new proposal for the names of these crates:

  1. Low-level crate: icu_fallback
  2. Higher-level crate: icu_locid_adapters

Note that we already have icu_provider_adapters so this builds on the same type of naming scheme.

The contents of the crates will be as discussed on 2023-11-09. I am not proposing any changes except for the names of the crates. The consensus there was:

  • locid_transform crate split: make it icu_locid_transform (for LocaleFallbacker, and future data-driven functionality relating to locales that all components need as a low-level dependency) and icu_canonicalizer (for LocaleCanonicalizer, LocaleExpander, and LocaleDirectionality, and future functionality that is not a universal component dependency)

OK?

Wants approval from:

@sffc sffc added the needs-approval One or more stakeholders need to approve proposal label Nov 30, 2023
@robertbastian
Copy link
Member

icu_provider_adapters contains adapters that are providers and wrap providers. This is absolutely not the case in the high-level locale crate, it contains logic that works with locales, so I think icu_locid_adapters is a very bad name. We might actually need it in the future to provide adapters between icu_locid::Locale and other locale types in the Rust ecosystem.

We explicitly picked one of the names to be icu_locid_transform in order to not have another abandoned crate name. I still stand behind the original decision, even if icu_canonicalizer won't be a revive of the abandoned icu_locale_canonicalizer crate but a new name.

@sffc
Copy link
Member Author

sffc commented Dec 1, 2023

My agreement to the previous consensus was based on the understanding that icu_canonicalizer was the name of the previous crate. Given that it is actually icu_locale_canonicalizer, the consensus is not well-defined and therefore I withdraw my consensus vote for that naming scheme.

New proposal:

  • icu_fallback for the low-level crate
  • icu_locid_transform for the higher-level crate

Wants approval from:

@robertbastian
Copy link
Member

I'd rather use a shiny new name for the higher-level crate, but won't die on this hill.

@sffc
Copy link
Member Author

sffc commented Dec 8, 2023

I would love to hear a suggestion for a better name for the higher-level crate. Here are the only ones I've seen:

  • icu_locid_utils
  • icu_canonicalizer

@robertbastian
Copy link
Member

  • icu_locid_info
  • icu_locid_meta[data]

A whole new can of worms would be suggesting to rename icu_locid to icu_locale in 2.0. The crate's main types are Locale and LanguageIdentifier, there's no LocaleIdentifier that would shorten to locid. Then icu_locale_canonicalizer would be a good fit.

@sffc
Copy link
Member Author

sffc commented Dec 28, 2023

It's named icu_locid because in ICU4C there is locid.h, and because the name icu-locale was claimed in kebab case and we can't get it back in snake case.

I could live with icu_locid_info although I don't know if that's better than icu_locid_transform

@robertbastian
Copy link
Member

I don't think ICU4C file names should have any bearing on our naming.

Note that the library inside the icu-locale crate could still be called icu_locale. At some point crates.io might also figure out that it can resolve icu_locale to icu-locale.

@robertbastian
Copy link
Member

robertbastian commented Feb 8, 2024

and because the name icu-locale was claimed in kebab case and we can't get it back in snake case.

I don't think this is a big problem. Most users will use the module through the meta crate. For the ones that don't, cargo add icu_locale will work. The library inside the icu-locale crate can will be called icu_locale anyway.

@robertbastian
Copy link
Member

Another suggestion: Most users of a locale type crate will expect there to be canonicalization, minimization, and other locale information. There's a very small use case (basically icu_provider) that only needs a locale type, with nothing else. So:

  • icu_locale_core: current icu_locid. icu_provider depends on this. Not a module in icu
  • icu_locale: reexports types from icu_locale_core, and adds everything that's currently in icu_locid_transform. Is a module in icu.

@sffc
Copy link
Member Author

sffc commented Feb 9, 2024

Hmm. What do you think about icu_core? Everything can depend on it and it can include Locale and anything else we need to truly share across all components such as logging and documentation macros (#4467).

@sffc
Copy link
Member Author

sffc commented Feb 9, 2024

So the crates could be:

  • icu_core = contains Locale, LanguageIdentifier, and internal macros for things such as logging and documentation. We will try to keep it as small as possible.
  • icu_locid OR icu_locale = re-exports Locale/LanguageIdentifier and includes LocaleExpander, LocaleCanonicalizer, LocaleDirectionality, etc.
  • icu_fallback OR icu_locid_transform = fallback machinery.

Dependency matrix:

Crate Dependencies
icu_core utils
icu_provider utils, icu_core
icu_fallback utils, icu_core, icu_provider
icu_locid, no default features utils, icu_core, icu_provider
icu_locid, compiled_data utils, icu_core, icu_provider, icu_fallback
icu_decimal, no default features utils, icu_core, icu_provider
icu_decimal, compiled_data utils, icu_core, icu_provider, icu_fallback

Does that look about right?

@robertbastian
Copy link
Member

I guess icu_core is fine, although there might be a non-ICU use case for icu_locale_core that doesn't require ICU's logging macros. All ICU4X crates (except for locid) already depend on icu_provider, so that's kind of our core crate.

@Manishearth Manishearth moved this to Tiny (can be done near the end) in icu4x 2.0 Feb 23, 2024
@Manishearth Manishearth moved this from Tiny (can be done near the end) to Needs discussion to unblock in icu4x 2.0 Feb 23, 2024
@robertbastian
Copy link
Member

  • @sffc - We have stuff that we might want to share across icu4x beyond just locales. So maybe we can have icu_core.
  • @robertbastian - Locale, LanguageIdentifier are well designed and shouldn't be lumped in with the rest of the utils.
  • @Manishearth - What are the other utilities?
  • @sffc:
    1. Logging macros, which currently are in icu_provider::_internal::logging
    2. Size test-and-doc macro, currently in icu_datetime
    3. Internal documentation macro
    4. Other macros to deduplicate docs like the experimental alert
  • @Manishearth - I'm warming up to icu_utils and having it not be on the 1.0 track
  • @robertbastian - 1, 3, and 4 seem like they are coupled to ICU4X
  • @sffc - The log dependency could be generalizable. It is mainly is about preserving messages in debug mode with eprintln!
  • @robertbastian - It seems the logging could be its own crate, useful outside of icu4x.
  • @robertbastian - I'd rather have multiple small crates than one grab bag, it simplifies version and makes vendoring/change management easier
  • @robertbastian - I think for this issue we should discuss whether icu_core should be a locid + utils grab bag, or have icu_locid_core and some other solution for utils
  • @echeran - I think we should avoid icu_core unless it is really core to internationalization (universally used across all crates). icu_locale_core sounds good
  • @sffc - All the things in my list are macros. So they don't even need to be in a crate. They can be included via the filesystem. include!?
  • @Manishearth - Maybe but include! is bad for publishing. We can copy the files around.

Proposed conclusion: icu_<something with locale>_core with Locale and friends, and icu_<something with locale> with data and funcionality. Do not necessarily need the same infix, see

LGTM: @robertbastian @Manishearth @sffc @echeran

Possible names:

  1. icu_locale_core + icu_locid + icu::locid
  2. icu_locale_core + icu_locid + icu::locale
  3. icu_locale_core + icu-locale + icu::locale
  4. icu_locale_core + icu_locale + icu::locale (if possible)
  5. icu_locid_core + icu_locid + icu::locid

Voting:

Note: 4 > 5 >> everything else

Decision: if possible we do icu_locale provided crates.io lets us rename. If not, we add icu icu_locid_core and repurpose icu_locid.

@Manishearth
Copy link
Member

Manishearth commented Mar 14, 2024

Discusion with Zibi present

3 crates:

  • icu_locale_core: structures (currently icu_locid)
  • icu_bikeshed: fallbacking and other locale-related machinery for compiled data
  • icu_locale: re-exports icu_locale_core, icu_bikeshed, and adds data-driven functionality (similar to icu_locid_transform)

Discussion:

  • @zbraniecki - I think the vast majority of users of Locale don't need the data. We shouldn't make them either carry all the data or hunt around for icu_locale_core. My slight preference would be to keep icu_locale with the structs only.
  • @robertbastian - The argument for this structure is that users of Locale probably want access to canonicalization, display names, etc.
  • @zbraniecki - Long term I think most users won't want or need canonicalization.
  • @Manishearth - Having a type and the things to use it with in the same crate seem good from a user point of view. I think we can add a feature flag for some of the more data-using things if people need.

Proposals for icu_bikeshed:

  1. icu_locale_mantle
  2. icu_fallback
  3. icu_locale_fallback
  4. icu_locale_ops
  5. icu_locale_data_utils
  6. icu_locale_support

Discussion:

  • @echeran - I think icu_locale_mantle is too "cute"
  • @zbraniecki - I agree that icu_locale_mantle hinders the ability of users to grok the crate. But it is nice once people figure out what it means.
  • @zbraniecki - icu_locale_fallback seems good because it won't expand beyond fallback. The contract is about retrieving a value that we need from the locale but the locale doesn't contain that value. It also covers my expectation that fallback itself may grow in ways that contain weights, etc.
  • @Manishearth - We don't want people going to this crate for fallback; they should be going to the re-export crate.
  • @robertbastian - If we make a new crate for every feature we split out of icu_locale, it gets very messy.

Open for voting

@sffc
Copy link
Member Author

sffc commented Apr 11, 2024

The first round of voting didn't reach a clear consensus. I'll send out another ballot with these options:

icu_locale_fallback
icu_locale_ops
icu_locale_support
icu_locale_util (added late)

@hsivonen
Copy link
Member

I think we should put display names in a crate separate from the others, because it's unlike the other data: It's for UI and large-ish.

@sffc
Copy link
Member Author

sffc commented Apr 23, 2024

ICU4X WG discussion from 2024-04-18:

  • @leftmostcat - What immediately strikes me is that you could have option 5 (four small crates) with feature-driven re-exports
  • @sffc - The only hard requirement is that there must be a crate with just the core structs because otherwise we have circular dependencies
  • @robertbastian - Features are messy because then you need to worry about forwarding features
  • @leftmostcat - _util doesn't describe what anything does; it means we don't know what it does and couldn't come up with a better name
  • @robertbastian - We talked the locale crate splitting briefly with Zibi but didn't have enough time to wrap it up. For the ill-formed strings, Henri seems to be the expert on those types.

Decision on how to move forward: convene a small group to finalize decision on the controversial topics and get alignment. This small group discussion will produce the final recommendation of the WG.

Locale crate splitting:

Note: The names of the crates resulting from the above discussion will be brought back to the main group.

@sffc
Copy link
Member Author

sffc commented Apr 23, 2024

2024-04-23 Special Meeting

#3931

Four categories of functionality:

  1. Core Structs, including comparison, normalization
  2. Parent Locales, Partial Likely Subtags
  3. Aliases, Full Likely Subtags, Locale Directionality
  4. Display Names

Options

Option 1

One-stop shop:

icu::locale::Locale
icu::locale::LocaleExpander
icu::locale::LocaleCanonicalizer
icu::locale::LocaleDirectionality
icu::locale::LocaleDisplayNames

Requires child crates:

  • icu_locale_core for core structs: required because of circular dependency with icu_provider
  • icu_locale_fallback used for compiled data
  • Optional: icu_locale_transform, icu_locale_names

Note: this creates multiple ways to include the same types:

icu_locale_fallback::LocaleFallbacker
icu_locale::LocaleFallbacker
icu::locale::LocaleFallbacker

Option 2+

Different crates for different functionality

  • icu_locale for core structs
  • icu_locale_fallback, icu_locale_transform, icu_locale_names, ...

Included in metacrate as:

icu::locale::Locale
icu::locale_fallback::LocaleFallbacker
icu::locale_transform::LocaleCanonicalizer
...

Notes

  • @zbraniecki - I'm shifting from thinking of icu_locale as lightweight/small, 90% of users want core structs, to this is a metacrate, if you don't know what you're doing, come here and you can tailor it.
  • @echeran - When I think about artifacts in the Java world... I think the metacrate makes sense here because, among other reasons, it is consistent with the ICU metacrate itself. It gives them the ability to summon any subcomponent of functionality. If we're already doing it at the top level, we should be consistent throughout.

Discussion on icu_locale_core naming:

  • @zbraniecki - Instead of icu_locale_core, should we consider icu_locid? I still think there's a disproportionally high value for a crate that just gives them structs. icu_locale_core sounds to me like an internal crate introduced predominently to resolve the circular dependency. But I think it is a standalone crate, popular relative to the whole of ICU. Every HTTP, locale mangement, language handling crate will want to use this. We give the nicer name to the metacrate, and icu_locid is for the structs.
  • @robertbastian - icu_locid is not a real name; there is nothing called "locid". Maybe icu_locale_types?
  • @zbraniecki - icu_locale_types seems not much better than icu_locale_core
  • @Manishearth - I think we can teach people about _core. It can mean other things, similar to _traits crates. I agree that locid is just confusing. It seems that if we expect this to be a big popular crate, we shouldn't restrict ourselves to the ICU4C name.
  • @sffc - I would be supportive of icu_locid. The transfer of knowledge from ICU4C and the 4 years of precedent in Rust should not be discounted.
  • @echeran - I prefer icu_locale_core because, (1), it's a common convention when having a multi-module artifact to have a "core", including in Java. (2) Visual consistency of icu_locale and icu_locale_something that reflects parent/meta crate to child crate relationship. (3) "locid" is short, and is the ICU4C identifier, but without background knowledge, I just want to see "locale".
  • @Manishearth - I disagree with the concept of knowledge transfer. I understand that from the perspective of a "bidi level", which is actual knowledge being transfered. But there's no reason we shouldn't be just using the word "locale". With locid, we are accessible to ICU4C developers, but not new developers. With locale, we're accessible to developers from any background.
  • @zbraniecki - Thanks for the discussion. I would still weakly prefer icu_locid. I see _core as being similar to _raw, an internal-facing thing. Should we consider icu_locale_id?
  • @echeran - It seems to me that _core implies what we're going for here. The signal for what it means is pretty strongly correlated to what we're trying to do here.
  • @Manishearth - I think _core gets used for many things in the Rust world.
  • @sffc - I remember seeing it in the context of rand in Rust.
  • @zbraniecki - I want to emphasize "light" or "minimal". When I see "core" I think of "raw".
  • @Manishearth - We can document that in the first line of the crate docs. It's good to have that answer up front.

Discussion on old crates:

  • @robertbastian - Should we keep the old crates with re-exports? Maybe deprecated?
  • @sffc - I prefer to have a clean break in 2.0
  • @Manishearth - We have a graveyard and a system in place, let's use it

Discussion on icu_locale_fallback crate name:

  • @echeran - If it's strictly about fallback, fine, but it seems like we do a lot of refactoring and we may want to include other functionality in this crate. If the crate contains more than fallback, we'd need to refactor again.
  • @sffc - "fallback" is the term that, I think, most accurately and precisely describes what's in this crate.
  • @echeran - This seems fine so long as we have good documentation that, e.g., likely subtags is part of fallback.

Proposal:

Introduce 3 crates:

  • icu_locale_core
    • Contains core structs: very similar to current icu_locid
    • Contains clear top-level documentation that this is a non-internal crate that is intended to be used when fewer dependencies are desired
  • icu_locale_fallback
    • Contains parent locales and basic likely subtags: subset of current icu_locid_transform
  • icu_locale
    • Contains re-exports icu_locale_core and icu_locale_fallback, and also includes other locale-related functionality, which could be split to additional child crates if the need arises
  • Names of crates are as stated here.
  • The two old crates will be retired as graveyard crates at their version 2.

LGTM: @echeran @robertbastian @sffc @zbraniecki @Manishearth

@sffc sffc removed the needs-approval One or more stakeholders need to approve proposal label May 9, 2024
@sffc sffc moved this from Needs discussion to unblock to Unclaimed for sprint in icu4x 2.0 May 9, 2024
@sffc
Copy link
Member Author

sffc commented May 9, 2024

This can be done ahead of time by creating crates and re-exports.

@robertbastian robertbastian self-assigned this May 14, 2024
@robertbastian
Copy link
Member

After starting this I don't think it's worth doing for 1.5, verifying that everything stays semver compatible and introducing extra crates is a lot more work than just the few renames and moves that this needs.

@robertbastian
Copy link
Member

Does the conclusion from #5120 apply to splitting out fallback as well?

@robertbastian robertbastian added the discuss-priority Discuss at the next ICU4X meeting label Sep 17, 2024
@sffc
Copy link
Member Author

sffc commented Sep 17, 2024

Does the conclusion from #5120 apply to splitting out fallback as well?

I don't see how the conclusion in #5120 applies to splitting out fallback except that we don't need to proactively add the Cargo feature for it.

@sffc
Copy link
Member Author

sffc commented Sep 17, 2024

icu_locale is still this weird combination of both a component crate and a metacrate. Given that we plan to keep adding things to it, I do still think that icu_locale is likely get to a point where we will want to split out fallbacking in order to avoid compiling bloat in every component.

@robertbastian
Copy link
Member

I do still think that icu_locale is likely get to a point where we will want to split out fallbacking in order to avoid compiling bloat in every component.

Yes, but it's currently not at this point, because nobody is asking for it. Hence this issue can be closed.

@robertbastian robertbastian removed their assignment Sep 19, 2024
@robertbastian robertbastian removed the 2.0-breaking Changes that are breaking API changes label Sep 19, 2024
@sffc
Copy link
Member Author

sffc commented Sep 19, 2024

  • @robertbastian I don't see it as a universal improvement. I don't have time for it. We don't have clients asking for it.
  • @sffc We previously achieved consensus on this. We know that compiling more crates and more compiled data leads to the integration issues such as the rust-url one. This is low-hanging fruit that just makes sense.
  • @robertbastian - Having more crates also has downsides, like more complex vendoring of icu4x. There's not currently that much data in icu_locale
  • @Manishearth It impacts people vendoring all of ICU4X, but maybe it would be easier for people vendoring only part of ICU4X.
  • @robertbastian With the current shape of icu_locale, I don't see this as an improvement. If we add displaynames, we can re-discuss this, at which time we should split out either displaynames or fallback.

Conclusion:

  • icu_locale_fallback split still on the table, but not high priority
  • According to last week's consensus, splitting it is not a breaking change
  • Revisit in the future, such as when displaynames are stabilized or when a client asks for it

LGTM: @sffc @robertbastian @Manishearth

@sffc sffc added C-locale Component: Locale identifiers, BCP47 and removed discuss-priority Discuss at the next ICU4X meeting labels Sep 19, 2024
@robertbastian robertbastian removed the C-locale Component: Locale identifiers, BCP47 label Sep 19, 2024
@sffc sffc removed this from icu4x 2.0 Sep 19, 2024
@sffc sffc added the C-locale Component: Locale identifiers, BCP47 label Sep 19, 2024
@robertbastian robertbastian changed the title Split icu_locid_transform into two crates Split icu_locale into two crates Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-locale Component: Locale identifiers, BCP47
Projects
None yet
Development

No branches or pull requests

4 participants