Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use approximate entropy for WriterCounter #3106

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

shssoichiro
Copy link
Collaborator

This improves performance at a small cost to efficiency

This improves performance at a small cost to efficiency
@shssoichiro
Copy link
Collaborator Author

@shssoichiro
Copy link
Collaborator Author

@maj160 what improvements did you see in your benchmarking? Mine is only showing 1-2% speedup for this change.

@codecov-commenter
Copy link

Codecov Report

Base: 86.80% // Head: 86.83% // Increases project coverage by +0.03% 🎉

Coverage data is based on head (ad293b1) compared to base (8978707).
Patch coverage: 66.66% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3106      +/-   ##
==========================================
+ Coverage   86.80%   86.83%   +0.03%     
==========================================
  Files          83       83              
  Lines       33330    33331       +1     
==========================================
+ Hits        28931    28943      +12     
+ Misses       4399     4388      -11     
Impacted Files Coverage Δ
src/ec.rs 83.12% <66.66%> (-0.13%) ⬇️
src/context/partition_unit.rs 90.27% <0.00%> (-0.28%) ⬇️
src/partition.rs 75.84% <0.00%> (-0.17%) ⬇️
src/encoder.rs 86.91% <0.00%> (-0.04%) ⬇️
src/asm/x86/predict.rs 43.29% <0.00%> (ø)
src/cpu_features/x86.rs 39.58% <0.00%> (ø)
src/context/block_unit.rs 93.60% <0.00%> (+0.06%) ⬆️
src/rdo.rs 85.62% <0.00%> (+0.46%) ⬆️
src/asm/x86/lrf.rs 92.22% <0.00%> (+0.63%) ⬆️
src/context/mod.rs 92.30% <0.00%> (+1.09%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

src/ec.rs Outdated
* print([x for x in (stats())])
*/
// Units of 1/128 of a bit
const ENTROPY_LOOKUP: [u16; 513] = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit like a subtle performance bug that has popped up before.

Suggested change
const ENTROPY_LOOKUP: [u16; 513] = [
const ENTROPY_LOOKUP: &[u16; 513] = &[

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow I had no idea that this could lead to performance issues. I'm really curious if there's any Github issue on Rustc talking about this.

Co-authored-by: David Michael Barr <[email protected]>
@maj160
Copy link
Contributor

maj160 commented Jan 19, 2023

Original commit author here:
I'd probably treat this as WIP or hide it behind a feature flag - It currently seems to increase filesize more than I'm comfortable with.

The lookup table values can probably be improved - I think the correct value is log2 of the probability at higher levels, but at lower levels the EC_MIN_PROB factor comes into play and complicates things. The values I've put in this change are the best compromise I came up with while experimenting, but I wouldn't say they're quite "ready" yet.

In the longer term it feels like this kind of optimisation is probably necessary to streamline RDO (e.g. using a fast approximate bit counter in some kind of pruning stage)

Thoughts?

Copy link
Collaborator

@vibhoothi vibhoothi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks,

I think we could move this under preset >8 or something like that over having this approximation for all the preset. I believe for Speed0 or placebo settings, we should avoid all possible approximations. But I am open to suggestions.

And the AWCY link shared above suggests there is a hit in performance across all the resolution in Obj1-Fast and there is no speedups, is that latest and updated one?

@shssoichiro
Copy link
Collaborator Author

And the AWCY link shared above suggests there is a hit in performance across all the resolution in Obj1-Fast and there is no speedups, is that latest and updated one?

AWCY is generally not trustworthy for its performance calculations, there's too much noise on the servers. My local benchmarks with hyperfine showed a 1-2% overall improvement at speed 2. I can do another run with it at a higher speed like speed 8, it's possible the improvement is larger there.

@vibhoothi
Copy link
Collaborator

vibhoothi commented Jan 22, 2023 via email

@shssoichiro
Copy link
Collaborator Author

shssoichiro commented Jan 22, 2023

Just to be clear, this will give us approx 0.5-1.8% bdrate loss with approx 1-2% speedup?

Yes. As you mentioned, because of that I'd be hesitant about blanket applying it to all speed levels.

@shssoichiro shssoichiro marked this pull request as draft January 22, 2023 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants