Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the zero index from VarZeroVec #5601

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

Manishearth
Copy link
Member

@Manishearth Manishearth commented Sep 25, 2024

Fixes #5516

Also removes METADATA_WIDTH, we've set it to 0 and aren't using it. It's an unnecessary untested complication on our unsafe code.

@Manishearth Manishearth force-pushed the remove-first-index-vzv branch 4 times, most recently from f06ff0e to 43aec09 Compare September 27, 2024 00:04
@Manishearth Manishearth marked this pull request as ready for review September 27, 2024 00:05
@Manishearth
Copy link
Member Author

I still need to run all the datagens and fix non-zerovec tests, but this works within zv.

@sffc
Copy link
Member

sffc commented Sep 27, 2024

Also removes METADATA_WIDTH, we've set it to 0 and aren't using it. It's an unnecessary untested complication on our unsafe code.

It was there to enable us to use FlexZeroVec as the index store. But, making VZV cheaper overall means that we can lean toward designs with more, shorter VZVs with Index8, instead of fewer, longer ones with Index16. We also now have ZeroTrie which covers some of these types of cases. I'm okay removing METADATA_WIDTH so long as we keep the door open to potentially re-adding a more compact index structure in the future.

@Manishearth
Copy link
Member Author

It was there to enable us to use FlexZeroVec as the index store.

Yeah, that would require far more changes to the unsafe code of VZV than just METADATA_WIDTH, METADATA_WIDTH didn't particularly set the stage for anythig there code-wise.

It would be interesting encapsulating more and more of the format into the trait to allow for fancier custom formats.

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To components.rs

utils/zerovec/src/varzerovec/components.rs Outdated Show resolved Hide resolved
// Safety: mid is < end <= len, so in-range
let cmp = predicate(self.get_unchecked(mid));

match cmp {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: why did you expand the binary_search impl instead of using the std one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hacky impl was written assuming a slice to search on. Now I could still use a byte slice that I have cut out appropriately, but the indices slice that we currently use is no longer long enough for this to work, and it was going to get more complicated, so I decided to dispense with the weird hack. I can bring it back and make it work with the weird hack if desired.

Manishearth added a commit that referenced this pull request Sep 30, 2024
Fixes #5603

v3 isn't yet necessary but will become necessary with
#5601. This can be landed
independently of #5601 but
these PRs will bitrot each other. Slight preference to landing this
first.

(At the moment, V2 == V3, but #5601 breaks the format)

This makes V1 and V2 data still *detectable* by ICU4X, but they will
throw a helpful error when attempting to load them.

From a user's POV, there is now only the "blob" format (which covers
both V003 and V003Bigger, picked at datagen time based on data size.


<!--
Thank you for your pull request to ICU4X!

Reminder: try to use [Conventional
Comments](https://conventionalcomments.org/) to make comments clearer.

Please see
https://github.com/unicode-org/icu4x/blob/main/CONTRIBUTING.md for
general
information on contributing to ICU4X.
-->
@Manishearth Manishearth force-pushed the remove-first-index-vzv branch 3 times, most recently from f00a1d8 to 74c0048 Compare September 30, 2024 18:46
@sffc sffc self-requested a review September 30, 2024 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove the initial 0 length from VarZeroVec's representation
2 participants