Use SmallVec to optimize for small integer sizes. #210
This PR is the same as the previous one with the exception of a few optimisations. The optimisations weren't necessary; they just proved that the
Oh, and I added back the 'shr' optimisation because the regression was particularly egregious.
@cuviper Thoughts?
How much of a difference does the union make? I prefer that conceptually because we don't increase the size that way, but in past experiments ISTR it had mixed impact on performance overall. If the union performance is good, then I think I'd prefer having a feature that toggles between
The performance difference between union and no-union is minor and is dwarfed by the performance boost from avoiding heap allocations. I'll update the PR to include a feature switch for
Should this PR be marked as "Closes #36"?
This is a proof of concept. I'd like to discuss possible designs and pathways to getting something like this merged.
Motivation
Using BigInt with mostly small (<= word-sized) integers is prohibitively expensive. One numerically intense algorithm from rgeometry is 1000 times slower when using BigRational instead of a fixed-precision number.
Performance
num-bigint performs quite well for large input and is competitive with GMP. Detailed benchmarks can be found here: https://github.com/Lemmih/num-criterion
So why do small integers perform so poorly? Mostly because allocating new memory is significantly more expensive than doing a single addition or multiplication. There are also a few cases where we can use specialized algorithms. For example, there's a particularly fast gcd algorithm that only works on word-sized integers.
SmallVec with two inlined BigDigits
BigUint is a Vec<BigDigit>. As such, it has a size of 3 words in addition to the actual digits. Switching to a SmallVec<[BigDigit; 2]> will not increase the size of a BigUint but allows for two BigDigits to be inlined and thus not require a heap allocation.
Summary of benchmarks:
I've run benchmarks on an M1 Apple MacBook Air and a 3950X AMD desktop. The results I get from these platforms are quite different and I'd love it if other people could help out by running the benchmark on their machines. For instructions, see: https://github.com/Lemmih/num-criterion
shr_assign. The previous version interacted poorly with SmallVec.
Other optimizations
Specialize for integers that fit in 128 bits.
Ideally, addition, multiplication, and division for integers that fit in 128 bits would be as fast as cloning. Copying the bytes is significantly slower than doing the arithmetic.
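A minimal sketch of what such a 128-bit fast path could look like, assuming digits are stored as little-endian 64-bit words. The helpers to_u128, add_fast and mul_fast are hypothetical names for illustration and are not part of num-bigint. A None result means the caller would fall back to the general multi-digit code.

```rust
/// Hypothetical helper: pack up to two little-endian 64-bit digits into a u128.
/// Returns None for longer slices, even if their high digits happen to be zero.
fn to_u128(digits: &[u64]) -> Option<u128> {
    match digits {
        [] => Some(0),
        [lo] => Some(*lo as u128),
        [lo, hi] => Some(((*hi as u128) << 64) | (*lo as u128)),
        _ => None,
    }
}

/// Hypothetical fast path: add two magnitudes without touching the heap.
/// Returns None if either input or the sum does not fit in 128 bits.
fn add_fast(a: &[u64], b: &[u64]) -> Option<u128> {
    to_u128(a)?.checked_add(to_u128(b)?)
}

/// Same idea for multiplication; overflow is reported as None.
fn mul_fast(a: &[u64], b: &[u64]) -> Option<u128> {
    to_u128(a)?.checked_mul(to_u128(b)?)
}
```

The point of the sketch is that the whole operation stays in registers: no allocation and no digit loop unless the fast path reports None.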
Use better GCD implementation.
The GCD implementation in num-integer is fairly slow and can be significantly improved by changing a line or two. On my system, this leads to a 10x speed improvement for BigInts when the integer fits in 64 bits.
Other oddities
num-bigint? Aren't we all just calling 'memcpy'?
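For reference, a word-sized gcd of the kind mentioned under "Use better GCD implementation" could be sketched as follows. This uses the binary (Stein's) GCD purely as an illustration; it is an assumption, and not necessarily the algorithm used in this PR or in num-integer. The name gcd_u64 is hypothetical.

```rust
/// Binary (Stein's) GCD on a single 64-bit word.
/// Uses only shifts, subtraction and comparisons; no division.
fn gcd_u64(mut a: u64, mut b: u64) -> u64 {
    if a == 0 {
        return b;
    }
    if b == 0 {
        return a;
    }
    // Power of two shared by both inputs; restored at the end.
    let shift = (a | b).trailing_zeros();
    // Make `a` odd. It stays odd for the rest of the loop.
    a >>= a.trailing_zeros();
    loop {
        // `b` is non-zero here, so stripping its factors of two terminates.
        b >>= b.trailing_zeros();
        if a > b {
            std::mem::swap(&mut a, &mut b);
        }
        // Both are odd and a <= b, so the difference is even (or zero).
        b -= a;
        if b == 0 {
            return a << shift;
        }
    }
}
```

A big-integer gcd could dispatch to a routine like this whenever both operands fit in one word, and use the general algorithm otherwise.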