
Bitcode rewrite #19

Merged: caibear merged 45 commits into main from bitcode_rewrite on Mar 16, 2024

Conversation

caibear (Member) commented Feb 22, 2024

I rewrote the entire library (docs). It's been tested and fuzzed, but still needs some work before release.

# If you want to try it early
bitcode = "=0.6.0-beta.1"

New features:

  • much faster
  • very compressible
  • doesn't require any hints (determines them at runtime)
  • deserialize &str (zero-copy; see the sketch below)
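
To make that last item concrete, here is a minimal sketch of borrowed decoding with the 0.6 derive API; the Message type is illustrative, not something from this PR:

```rust
use bitcode::{Decode, Encode};

// Illustrative type, not from the PR: one borrowed field, one owned.
#[derive(Encode, Decode, PartialEq, Debug)]
struct Message<'a> {
    id: u32,
    text: &'a str, // zero-copy: borrows directly from the encoded bytes
}

fn main() {
    let bytes = bitcode::encode(&Message { id: 7, text: "hello" });
    // The decoded Message borrows `text` from `bytes` instead of allocating.
    let decoded: Message = bitcode::decode(&bytes).unwrap();
    assert_eq!(decoded, Message { id: 7, text: "hello" });
}
```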

Alpha release:

  • big endian platforms (compile error right now)
  • signed integer size reduction (currently treated as unsigned)
  • usize
  • Result

Beta release:

  • handle derive macro errors instead of panicking
  • remove lifetime bound on DecodeBuffer
  • fix documented unsound code in serde impl

Full release:

  • make bitcode::Buffer Send + Sync
  • recursive types
  • #[bitcode(with_serde)] (can only do bitcode::serialize right now)
  • #![forbid(unsafe_code)] feature flag (serde only and slightly slower)
  • CString
  • IpAddr (see #30: Add support for std::net::{*Addr*})
  • deserialize &[u8]

caibear marked this pull request as draft on February 22, 2024
caibear (Member, Author) commented Feb 22, 2024

Benchmarks for those who are interested. The previous version of bitcode isn't shown here, but its speed is similar to bincode, its size is similar to the new bitcode, and its compressed size is 20% worse than bincode's.

| Format | Compression | Size (bytes) | Serialize (ns) | Deserialize (ns) |
|---|---|---|---|---|
| bincode | none | 49.1 | 35 | 115 |
| bincode | lz4 | 16.1 | 86 | 115 |
| bincode | deflate-fast | 13.1 | 166 | 176 |
| bincode | deflate-best | 8.9 | 3708 | 141 |
| bincode | zstd-0 | 12.4 | 172 | 146 |
| bincode | zstd-22 | 8.5 | 32312 | 133 |
| bincode-varint | none | 22.3 | 36 | 116 |
| bincode-varint | lz4 | 10.8 | 71 | 119 |
| bincode-varint | deflate-fast | 10.1 | 146 | 165 |
| bincode-varint | deflate-best | 8.0 | 2664 | 153 |
| bincode-varint | zstd-0 | 8.2 | 123 | 131 |
| bincode-varint | zstd-22 | 7.8 | 23939 | 136 |
| bitcode | none | 16.9 | 29 | 104 |
| bitcode | lz4 | 9.8 | 42 | 108 |
| bitcode | deflate-fast | 8.3 | 86 | 137 |
| bitcode | deflate-best | 6.8 | 1661 | 125 |
| bitcode | zstd-0 | 7.1 | 58 | 115 |
| bitcode | zstd-22 | 6.2 | 24713 | 115 |
| bitcode-derive | none | 16.9 | 10 | 12 |
| bitcode-derive | lz4 | 9.7 | 25 | 17 |
| bitcode-derive | deflate-fast | 8.3 | 68 | 45 |
| bitcode-derive | deflate-best | 6.8 | 1597 | 33 |
| bitcode-derive | zstd-0 | 7.1 | 40 | 24 |
| bitcode-derive | zstd-22 | 6.2 | 24905 | 25 |

caibear changed the title from "Draft: Bitcode rewrite" to "Bitcode rewrite" on Feb 23, 2024
tbillington commented

> doesn't require any hints (determines them at runtime)

This is remarkable. Is there any particular part of the branch you could point me to so I can see how it's done?

Side question: do you see any benefits to hinting on top of runtime determination?

Hope it's okay to ask these questions in your PR :) Thanks for making this library!

caibear (Member, Author) commented Feb 26, 2024

> > doesn't require any hints (determines them at runtime)

> This is remarkable. Is there any particular part of the branch you could point me to so I can see how it's done?

https://github.com/SoftbearStudios/bitcode/blob/5bdc22ba943d0ba8de092a763327b8167656611f/src/pack.rs
https://github.com/SoftbearStudios/bitcode/blob/5bdc22ba943d0ba8de092a763327b8167656611f/src/pack_ints.rs
https://github.com/SoftbearStudios/bitcode/blob/5bdc22ba943d0ba8de092a763327b8167656611f/src/f32.rs (albeit this one requires the aid of a compression algorithm like deflate/lz4/zstd to do anything)
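
To give a flavor of what pack_ints.rs does, here is a simplified sketch (illustrative only, not the actual implementation): scan the values once, choose the smallest byte width that fits all of them, and record that width in a one-byte header so the decoder can undo the packing.

```rust
// Illustrative sketch of runtime-determined integer packing,
// not bitcode's actual code.
fn pack_u32s(values: &[u32], out: &mut Vec<u8>) {
    // One pass to find the widest value.
    let max = values.iter().copied().max().unwrap_or(0);
    // Smallest byte width that fits every value in the slice.
    let width: usize = match max {
        0..=0xFF => 1,
        0x100..=0xFFFF => 2,
        _ => 4,
    };
    // Header byte tells the decoder which width was chosen.
    out.push(width as u8);
    for &v in values {
        out.extend_from_slice(&v.to_le_bytes()[..width]);
    }
}
```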

> Side question: do you see any benefits to hinting on top of runtime determination?

After adding hints to bitcode, I didn't use them as much as I thought I would because doing so was tedious. The types of "packing" I'm using in this new version are designed to quickly determine whether they're applicable and then pack the data. I'm probably not going to add manual hints back because most people (me included) don't benefit from them. Also, this new version prioritizes working with general-purpose compression, which some of the old hints got in the way of (e.g. #[bitcode_hint(ascii)] made characters 7 bits wide, which confused byte-wise compression algorithms).

> Hope it's okay to ask these questions in your PR :) Thanks for making this library!

Yeah! I made this PR public before it was finished to see what people think of it.

LevitatingBusinessMan commented

Very impressive!!!

caibear (Member, Author) commented Feb 28, 2024

I just released bitcode = "=0.6.0-alpha.1", which has the core features complete.

caibear (Member, Author) commented Mar 12, 2024

I'm interested in hearing opinions about making bitcode::encode/bitcode::decode use a thread-local bitcode::Buffer.
This is how we use bitcode internally, so I'm considering upstreaming it.

thread_local! {
    // One reusable Buffer per thread, so repeated calls amortize allocations.
    static BUFFER: std::cell::RefCell<Buffer> = Default::default();
}
pub fn encode<T: Encode + ?Sized>(t: &T) -> Vec<u8> {
    BUFFER.with(|b| b.borrow_mut().encode(t).to_vec())
}
// T must be Sized here because decode returns it by value.
pub fn decode<'a, T: Decode<'a>>(bytes: &'a [u8]) -> Result<T, Error> {
    BUFFER.with(|b| b.borrow_mut().decode(bytes))
}

Pros:

  • Small messages encode/decode 2x faster
  • Large messages encode 20% faster and decode 10% faster

Cons:

  • Retains a large allocation after encoding/decoding a single large message
  • Hidden per-thread state, which can be surprising

You could always opt out by creating a new buffer for each encode/decode call.
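
For example, a minimal sketch of that opt-out, reusing the Buffer and Encode items from the snippet above:

```rust
use bitcode::{Buffer, Encode};

// A fresh Buffer per call: nothing is retained once it's dropped.
fn encode_fresh<T: Encode + ?Sized>(t: &T) -> Vec<u8> {
    let mut buffer = Buffer::default();
    buffer.encode(t).to_vec()
}
```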

caibear (Member, Author) commented Mar 14, 2024

I just released bitcode = "=0.6.0-beta.1".

  • reimplement bitcode::Buffer
  • fix unsound code in serde impl
  • optimize bitcode::serialize and bitcode::deserialize by 30-40%

caibear marked this pull request as ready for review on March 16, 2024
caibear merged commit 431b88f into main on Mar 16, 2024 (1 check passed)
caibear deleted the bitcode_rewrite branch on March 16, 2024
jestarray commented

@caibear https://docs.rs/bitcode/0.5.0/bitcode/struct.Buffer.html#method.deserialize
So buffer.deserialize() is now removed? I'm guessing that if I want to deserialize with a reusable buffer, I have to use encode/decode?

caibear (Member, Author) commented Apr 26, 2024

> @caibear https://docs.rs/bitcode/0.5.0/bitcode/struct.Buffer.html#method.deserialize So buffer.deserialize() is now removed? I'm guessing that if I want to deserialize with a reusable buffer, I have to use encode/decode?

Yes, this was removed. Currently you have to use Encode/Decode if you want to reuse allocations.
Buffer::{serialize, deserialize} might be reimplemented in a future version, but doing so is non-trivial.

Note: saving allocations is an optimization that's usually 10% faster on large messages and 50% faster on small messages.
If you care about speed, you probably want to use Encode/Decode anyway because they're usually 2-5x faster.
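
For reference, allocation reuse on the Encode/Decode path looks roughly like the sketch below; the Update type and handle_packets function are hypothetical:

```rust
use bitcode::{Buffer, Decode, Encode};

// Hypothetical message type for illustration.
#[derive(Encode, Decode)]
struct Update {
    entity: u32,
    x: f32,
    y: f32,
}

// One long-lived Buffer; each decode reuses its internal allocations.
fn handle_packets(buffer: &mut Buffer, packets: &[Vec<u8>]) -> Result<(), bitcode::Error> {
    for bytes in packets {
        let update: Update = buffer.decode(bytes)?;
        // ... apply the update to game state ...
        let _ = (update.entity, update.x, update.y);
    }
    Ok(())
}
```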

jestarray commented

@caibear Ahh, I see. I'm using hecs (an ECS) and other libs and would need to derive Encode/Decode on them as well, so I'll stick with 0.5.0 in the meantime.

caibear (Member, Author) commented Apr 26, 2024

> @caibear Ahh, I see. I'm using hecs (an ECS) and other libs and would need to derive Encode/Decode on them as well, so I'll stick with 0.5.0 in the meantime.

Have you benchmarked 0.6 with allocations against 0.5 without allocations for your use case? 0.6 with allocations might be faster if your messages are large enough.

jestarray commented Apr 26, 2024

@caibear Wow, is 0.6 really that much better? I haven't benchmarked yet. What do you mean by "large enough" in terms of size? My game sends position updates 30 times a second, and the average size is ~3-5 kilobytes.

caibear (Member, Author) commented Apr 27, 2024

> @caibear Wow, is 0.6 really that much better? I haven't benchmarked yet. What do you mean by "large enough" in terms of size? My game sends position updates 30 times a second, and the average size is ~3-5 kilobytes.

0.6 is generally faster and smaller than 0.5 across all benchmarks. The question here is whether the gain in speed outweighs the additional allocations. I just benchmarked deserializing 5kb of messages, and 0.6 is 30% faster. I don't know your exact structs, but this should be a good baseline.

On a side note: I also benchmarked 0.6 derive and it's 7x faster than 0.6 serde.
