Use of IORING_REGISTER_PBUF_RING, introduced in kernel 5.19 #112

Open
FrankReh opened this issue Sep 14, 2022 · 7 comments

@FrankReh
Collaborator

FrankReh commented Sep 14, 2022

An issue to track this crate's use of the register op IORING_REGISTER_PBUF_RING.

First described in the kernel patch series "io_uring: add support for ring mapped supplied buffers".

@FrankReh
Collaborator Author

Only today did I start to appreciate the value of the buf_ring feature added in 5.19. It first appeared in the kernel mailing-list archives in April this year, and perhaps I had been reading older man pages until now. I had already been wondering how we would compact the information needed when replenishing fixed buffers. It turns out the io_uring author had already worked on this earlier this year.

I would propose that this crate prepare to jump straight to this use of fixed buffers, using the pbuf_ring mechanism (not the provide-buffers mechanism), even though it will only be available on kernels 5.19+. IMHO this crate doesn't need to support fixed buffers on older kernels: that would be much more work, and the overhead there is large enough that they aren't clearly a win for users.

@FrankReh
Collaborator Author

The pbuf_ring mechanism seems to be the fourth form of buffer management provided by the kernel io_uring driver (as of 5.19), not even counting the mechanism where the caller provides its own buffer for each operation, which could be thought of as mechanism 0.

1) IORING_REGISTER_BUFFERS, for use with only two operations: IORING_OP_READ_FIXED and IORING_OP_WRITE_FIXED.

2) IORING_OP_PROVIDE_BUFFERS, which came in kernel 5.7 and allowed the creation of buffer groups that could then be referenced by various read and recv operations. But by April 2022, replenishing the provided buffers was not considered easy or efficient.

3) IORING_REGISTER_BUFFERS2, introduced in kernel 5.13. It seems useful only for the same two operations as 1): IORING_OP_READ_FIXED and IORING_OP_WRITE_FIXED.

4) IORING_REGISTER_PBUF_RING, introduced in kernel 5.19. It provides buffers per group id, like 2), but with a much more efficient mechanism for replenishing them. The link at the top of this issue includes benchmarks showing how much more efficient this is than 2) for the system as a whole. It is also easier for the user-space caller: a ring of buffers to go alongside the submission and completion rings.
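To make the replenishing mechanism of 4) concrete, here is a minimal user-space model of a buffer ring, loosely following liburing's io_uring_buf_ring_add and io_uring_buf_ring_advance. IoUringBuf mirrors the field layout of the kernel's struct io_uring_buf; BufRingModel and kernel_pick are hypothetical names. As a simplification, no kernel is involved and the tail is a separate atomic, whereas in the real shared-memory layout it overlaps the resv field of entry 0. The model also does not guard against overfilling the ring.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

/// Same field layout as the kernel's `struct io_uring_buf` (16 bytes).
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct IoUringBuf {
    addr: u64,
    len: u32,
    bid: u16,
    resv: u16,
}

/// Hypothetical user-space model of a provided-buffer ring. Simplification:
/// the real ring lives in memory shared with the kernel and stores its tail
/// in the `resv` field of entry 0; here the tail is a separate atomic.
struct BufRingModel {
    bufs: Vec<IoUringBuf>,
    tail: AtomicU16,
    mask: u16,
}

impl BufRingModel {
    fn new(entries: u16) -> Self {
        // Ring sizes must be a power of two so `index & mask` wraps correctly.
        assert!(entries.is_power_of_two());
        Self {
            bufs: vec![IoUringBuf::default(); entries as usize],
            tail: AtomicU16::new(0),
            mask: entries - 1,
        }
    }

    /// Fill the slot `offset` entries past the current tail (cf. liburing's
    /// io_uring_buf_ring_add). Not yet visible to the consumer.
    fn add(&mut self, addr: u64, len: u32, bid: u16, offset: u16) {
        let tail = self.tail.load(Ordering::Relaxed);
        let idx = (tail.wrapping_add(offset) & self.mask) as usize;
        self.bufs[idx] = IoUringBuf { addr, len, bid, resv: 0 };
    }

    /// Publish `count` filled slots with one release store of the tail
    /// (cf. io_uring_buf_ring_advance).
    fn advance(&self, count: u16) {
        let new_tail = self.tail.load(Ordering::Relaxed).wrapping_add(count);
        self.tail.store(new_tail, Ordering::Release);
    }

    /// Stand-in for the kernel side: take the next published buffer, if any.
    fn kernel_pick(&self, head: &mut u16) -> Option<IoUringBuf> {
        if *head == self.tail.load(Ordering::Acquire) {
            return None; // nothing published past `head`
        }
        let buf = self.bufs[(*head & self.mask) as usize];
        *head = (*head).wrapping_add(1);
        Some(buf)
    }
}

fn main() {
    let mut ring = BufRingModel::new(8);
    // Replenish three 4 KiB buffers (fake addresses) as one batch.
    for i in 0..3u16 {
        ring.add(0x1000 * (i as u64 + 1), 4096, i, i);
    }
    let mut head = 0u16;
    assert!(ring.kernel_pick(&mut head).is_none()); // not published yet
    ring.advance(3);
    while let Some(buf) = ring.kernel_pick(&mut head) {
        println!("picked bid={} len={}", buf.bid, buf.len);
    }
}
```

The point of the design: replenishing N buffers costs N plain slot writes plus a single release store of the tail, instead of submitting an IORING_OP_PROVIDE_BUFFERS request as in 2).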

@FrankReh
Collaborator Author

As another reference, here is the liburing UDP example using the io_uring_buf_ring mechanism. https://github.com/axboe/liburing/blob/master/examples/io_uring-udp.c

It also happens to use the new io_uring_prep_recvmsg_multishot (from kernel 6.0).

It should make a reasonable target example for this crate's mechanisms to support, too. First we'll need the io-uring bindings to catch up to 5.19 so the buffering can be worked out. 6.0 should be out early next month.
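With a buffer ring plus a multishot recvmsg as in that example, each completion tells the application which buffer from the group the kernel picked and whether the request is still armed. A minimal sketch of decoding those CQE flags: the three constant values are taken from the kernel's io_uring UAPI header, while buffer_id and multishot_still_armed are hypothetical helper names, not this crate's API.

```rust
// Values from the kernel's include/uapi/linux/io_uring.h.
const IORING_CQE_F_BUFFER: u32 = 1 << 0;
const IORING_CQE_F_MORE: u32 = 1 << 1;
const IORING_CQE_BUFFER_SHIFT: u32 = 16;

/// If the kernel selected a buffer from the group, its id is in the upper 16 bits.
fn buffer_id(flags: u32) -> Option<u16> {
    if flags & IORING_CQE_F_BUFFER != 0 {
        Some((flags >> IORING_CQE_BUFFER_SHIFT) as u16)
    } else {
        None
    }
}

/// A multishot request keeps producing completions while IORING_CQE_F_MORE
/// is set; once it is clear, the request has terminated and must be resubmitted.
fn multishot_still_armed(flags: u32) -> bool {
    flags & IORING_CQE_F_MORE != 0
}

fn main() {
    // Hypothetical flags: buffer 5 was consumed and more completions will follow.
    let flags = (5u32 << IORING_CQE_BUFFER_SHIFT) | IORING_CQE_F_BUFFER | IORING_CQE_F_MORE;
    println!("{:?} {}", buffer_id(flags), multishot_still_armed(flags));
}
```

Once the application has finished with the data, the buffer with that id would be added back to the ring so the kernel can select it again.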

@FrankReh
Collaborator Author

And here is the lore.kernel.org search listing all archived io-uring emails about buf_ring.

https://lore.kernel.org/io-uring/?q=buf_ring

@SUPERCILEX
Contributor

cc @quininer @lucab

I'd like to propose the rough sketch of an implementation:

struct PreAllocatedRingBufs {
  num_bufs: usize,
  buf_size: usize,
}

fn register_buf_ring(ring_entries: u16, bgid: u16, pre_allocations: impl Into<Option<PreAllocatedRingBufs>>) -> BufRing

impl BufRing {
  fn submissions(uring: &mut IoUring) -> BufRingSubmissions
}

impl BufRingSubmissions {
  unsafe fn get(entry) -> Buf // Safety: the buffer group has to match
  fn add(buf: Buf)
  fn sync() // Drop will automatically sync
  fn completions() -> CompletionsQueue // Non-atomically update the tail
}

Ideas:

  • register_buf_ring is renamed to register_buf_ring_unchecked for people who want to do manual memory management (maybe they have their own allocations already, etc).
  • PreAllocatedRingBufs is used to decide how big the mmap should be (users can manually allocate their own bufs or pick some number of bufs to include in the mmap).
  • I haven't figured out what the Buf API should look like yet; unfortunately, it all seems very unpleasant. I think BufRing can include an API that returns a mutable slice of MaybeUninit<u8> for the extra space, and Buf can include an unsafe API that accepts a mut slice of uninits as its pointer. That way, if you use pre-allocation, you can just call chunks_mut to get a bunch of slices, and if you allocate manually, you can use Vec::spare_capacity_mut. Hmmm, actually it would be unsound to return a slice to the buffers, so I think the BufRing API will have to return a pointer + len, and then we'll tell users to turn it into a slice before telling io_uring about the buffers. Forcing a mut slice seems like a good way to enforce ownership.
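The chunks_mut / Vec::spare_capacity_mut idea from the last bullet could look roughly like this, assuming a single heap allocation carved into equal-sized buffers. BufDesc and carve are hypothetical names; each descriptor holds what a ring slot needs (address, length, buffer id). In line with the soundness concern above, the sketch hands out pointer + len rather than slices, so the caller must keep the Vec alive and must not grow it (which could reallocate) while the kernel may still write into those buffers.

```rust
use std::mem::MaybeUninit;

/// Hypothetical descriptor for one provided buffer: the values a ring slot needs.
#[derive(Debug)]
struct BufDesc {
    addr: *mut MaybeUninit<u8>,
    len: u32,
    bid: u16,
}

/// Carve `num_bufs` buffers of `buf_size` bytes out of `backing`'s spare capacity.
/// Returns raw pointers, so the caller must keep `backing` alive and must not
/// push to it or otherwise reallocate while the kernel may write to the buffers.
fn carve(backing: &mut Vec<u8>, num_bufs: usize, buf_size: usize) -> Vec<BufDesc> {
    assert!(buf_size > 0 && buf_size <= u32::MAX as usize);
    assert!(num_bufs <= u16::MAX as usize + 1); // buffer ids are u16
    backing.reserve_exact(num_bufs * buf_size);
    backing
        .spare_capacity_mut()
        .chunks_mut(buf_size)
        .take(num_bufs) // spare capacity may exceed what we asked for
        .enumerate()
        .map(|(i, chunk)| BufDesc {
            addr: chunk.as_mut_ptr(),
            len: chunk.len() as u32,
            bid: i as u16,
        })
        .collect()
}

fn main() {
    let mut backing: Vec<u8> = Vec::new();
    let descs = carve(&mut backing, 4, 2048);
    for d in &descs {
        println!("bid={} len={} addr={:p}", d.bid, d.len, d.addr);
    }
}
```

Keeping the memory as MaybeUninit until the kernel reports how many bytes it wrote avoids ever exposing uninitialized bytes as a &[u8].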

Anyway, this is all pretty rough. Unless you guys have suggestions, I'm going to start implementing this in tandem with my server over the next few weeks to see whether the idea is a good fit. Then I'll try to upstream it.

@SUPERCILEX
Contributor

Not doing the completions API as that's broken: axboe/liburing#1039

@SUPERCILEX
Contributor

Created tokio-rs/io-uring#256
