Design for an efficient mechanism for I/O reads with Polyphony #115

noteflakes · 2023-07-30T06:18:56Z

noteflakes
Jul 30, 2023
Maintainer

A problem with I/O read operations in Ruby is that the stock API uses Strings as buffers. Strings are not ideal: they need to be allocated, and reusing them requires some mechanism for managing their use. Another problem is that concatenating strings is, in many situations, a costly operation.

The problem is accentuated when one needs to read lines, for example in an HTTP/1 parser, especially when dealing with a slow HTTP client. Creating a string in order to read into it, then concatenating it to a previously read string, and later cutting the result into smaller strings duplicates effort in terms of both memory allocations and CPU usage.

This design document describes an alternative way of performing I/O read/recv operations that will be integrated into Polyphony and will achieve the following goals:

Minimize copying of strings.
Minimize memory allocations.
Optimize performance for read operations both in the backend and in high-level APIs.
Allow reuse of buffers across fibers and I/Os.
Allow eventual support for io_uring buffer rings and multishot recv.
Facilitate the development of protocol parsers.

Basic design

The present proposal aims to achieve the above goals by adding two mechanisms to Polyphony:

IOStream - an object that reads data on demand from its associated IO into a set of buffers, which can then be used to perform the different kinds of read operations requested by user code.
A global buffer manager that allocates reusable buffers on demand.

The buffers are turned into strings only at the IOStream user-code interface.

IOStream

The IOStream class encapsulates a stream that can be read from. It implements all the different ways an IO can be read from, i.e. getc, gets, read, readpartial etc. The IO instance will automatically instantiate an IOStream and delegate all of its read method calls to it.

An IOStream instance references zero or more buffer entries and maintains a cursor that marks the current reading position. Upon calling any of its read methods, if no data is available to be read, the IOStream issues a call to Backend#stream_read (or #stream_recv) and gets back a buffer entry (which can be a buffer or an EOF marker). Once the call returns, the IOStream can resume reading. For some types of reads, e.g. gets or read(len), if the available data does not allow the read to complete (for example, a newline character has not been found in the available buffer entries), the IOStream will issue further read operations to the underlying backend until the read request can be satisfied or an EOF is encountered.

When the cursor position moves past an entry, the underlying buffer is released back to the buffer manager.
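The read loop described above can be modeled in a few lines of plain Ruby. This is only a sketch under stated assumptions, not the proposed implementation: SketchIOStream is a hypothetical name, the block passed to the constructor stands in for Backend#stream_read (returning a String chunk, or nil as the EOF marker), and "releasing" a buffer is reduced to incrementing a counter.

```ruby
# Toy model of the IOStream read loop. All names here are hypothetical.
class SketchIOStream
  attr_reader :released

  # The block stands in for Backend#stream_read: it returns the next
  # chunk as a String, or nil to signal EOF.
  def initialize(&source)
    @source = source
    @entries = []    # buffer entries not yet fully consumed
    @pos = 0         # cursor offset inside @entries.first
    @eof = false
    @released = 0    # number of entries the cursor has moved past
  end

  # Reads one line including its trailing newline. Returns nil at EOF,
  # or the remaining partial line if EOF arrives before a newline.
  def gets
    line = +''
    loop do
      return (line.empty? ? nil : line) unless fill
      entry = @entries.first
      idx = entry.index("\n", @pos)
      stop = idx ? idx + 1 : entry.size
      line << entry[@pos, stop - @pos]
      advance(stop)
      return line if idx
    end
  end

  private

  # Makes sure there is an unread entry; returns false once EOF is hit.
  def fill
    return true unless @entries.empty?
    return false if @eof
    chunk = @source.call
    if chunk.nil?
      @eof = true
      return false
    end
    @entries << chunk
    true
  end

  # Moves the cursor; drops the entry once it has been fully consumed
  # (a real implementation would hand it back to the buffer manager).
  def advance(stop)
    @pos = stop
    return if @pos < @entries.first.size
    @entries.shift
    @pos = 0
    @released += 1
  end
end

chunks = ["GET / HT", "TP/1.1\r\nHost: ex", "ample.com\r\n"]
stream = SketchIOStream.new { chunks.shift }
first_line = stream.gets
second_line = stream.gets
third_line = stream.gets
```

Note that this toy version still concatenates strings to build each line. The point of the design is to keep data in buffers and only materialize a String at the user-code boundary, which the sketch does not attempt to reproduce.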

The IOStream class can also be used as a transport-agnostic buffer layer that allows reading from any source, as discussed here.

Buffer management

Buffers are allocated in pools according to size. Buffer sizes range from 4KB to 4GB, and requested buffer sizes are rounded up to the next power of two. A single buffer manager is tasked with managing buffers across all threads and backends. The buffer manager maintains a free list for each of the power-of-two sizes, each implemented as a doubly linked list. When a buffer is needed, the manager removes the head of the appropriate free list (according to the requested size) and returns it. If no buffer is available, the manager allocates a new one.
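As a rough illustration of the size-class scheme, here is a minimal Ruby sketch. It makes several assumptions that are not part of the proposal: SketchBufferManager and SketchBuffer are hypothetical names, plain Arrays stand in for the doubly linked free lists, a Ruby String stands in for raw buffer memory, and thread safety is ignored entirely.

```ruby
# Hypothetical buffer record: its size class plus the backing storage.
SketchBuffer = Struct.new(:size, :data)

class SketchBufferManager
  MIN_SIZE = 4 * 1024        # 4KB, smallest size class
  MAX_SIZE = 4 * 1024**3     # 4GB, largest size class

  attr_reader :allocations

  def initialize
    # One free list per power-of-two size (Arrays instead of linked lists).
    @free = Hash.new { |hash, size| hash[size] = [] }
    @allocations = 0
  end

  # Rounds the requested size up to the next power of two, with a 4KB floor.
  def size_class(requested)
    raise ArgumentError, "requested size too large" if requested > MAX_SIZE
    size = MIN_SIZE
    size <<= 1 while size < requested
    size
  end

  # Takes the head of the matching free list, or allocates a new buffer.
  def acquire(requested)
    size = size_class(requested)
    buffer = @free[size].shift
    return buffer if buffer
    @allocations += 1
    SketchBuffer.new(size, String.new(capacity: size))
  end

  # Puts the buffer back at the head of the free list for its size class.
  def release(buffer)
    @free[buffer.size].unshift(buffer)
  end
end

manager = SketchBufferManager.new
a = manager.acquire(5000)   # served from the 8KB size class
manager.release(a)
b = manager.acquire(6000)   # same size class, so the buffer is reused
```

Because both requests fall into the 8KB class, the second acquire reuses the released buffer instead of allocating, which is the behavior the free lists are meant to provide.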

io_uring provided buffers and buffer rings

This design is compatible with the way io_uring allows providing pre-allocated buffers and the newer buffer ring feature. We provide buffers to the io_uring interface, and when a CQE arrives, it will contain a reference to the buffer that was used. We will need, however, to add support for buffer group ids to the buffer manager, and we'll also need to add a buffer group id field to struct op_ctx.

API

In addition to supporting the usual read methods, we'll want to add some methods for parsers. Some examples:

def parse_request_header
  # A bare line break at the cursor means there are no more headers.
  return false if @stream.peek_compare(:crlf_lf)

  # Read the header key up to the colon and any whitespace that follows it.
  key = @stream.read_until(/\:\s*/, invalid: "\r\n", min_length: 1, max_length: MAX_HEADER_KEY_LENGTH, consume_delimiter: true)
  raise H1P::Error, "Invalid header key" if !key

  # Read the header value up to the end of the line.
  value = @stream.read_until(:crlf_lf, min_length: 1, max_length: MAX_HEADER_VALUE_LENGTH, consume_delimiter: true)
  raise H1P::Error, "Invalid header value" if !value

  @headers[key.downcase] = value
end
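To make the intended behavior of these parser methods easier to discuss, here is a small in-memory stand-in. Everything in it is an assumption made for illustration: SketchParseStream is a hypothetical class, :crlf_lf is taken to mean "CRLF or a bare LF", and read_until is assumed to return nil when no delimiter is found, when the text falls outside the length limits, or when the invalid sequence appears before the delimiter. The actual semantics are not specified in this proposal.

```ruby
# Hypothetical in-memory stream used to exercise the parser-oriented API.
class SketchParseStream
  # Assumed meaning of the :crlf_lf symbol: CRLF or a bare LF.
  PATTERNS = { crlf_lf: /\r?\n/ }.freeze

  def initialize(data)
    @data = data
    @pos = 0
  end

  # True if the pattern matches exactly at the cursor; consumes nothing.
  def peek_compare(pattern)
    match = to_regexp(pattern).match(@data, @pos)
    !match.nil? && match.begin(0) == @pos
  end

  # Returns the text between the cursor and the delimiter, or nil if the
  # (assumed) constraints are not met. The cursor moves only on success.
  def read_until(delimiter, invalid: nil, min_length: 0, max_length: nil, consume_delimiter: false)
    match = to_regexp(delimiter).match(@data, @pos)
    return nil unless match
    text = @data[@pos...match.begin(0)]
    return nil if text.length < min_length
    return nil if max_length && text.length > max_length
    return nil if invalid && text.include?(invalid)
    @pos = consume_delimiter ? match.end(0) : match.begin(0)
    text
  end

  private

  # Accepts a known Symbol, a Regexp, or a literal String delimiter.
  def to_regexp(pattern)
    pattern = PATTERNS.fetch(pattern) if pattern.is_a?(Symbol)
    pattern.is_a?(Regexp) ? pattern : Regexp.new(Regexp.escape(pattern))
  end
end

stream = SketchParseStream.new("Host: example.com\r\n\r\n")
at_end_before = stream.peek_compare(:crlf_lf)
key = stream.read_until(/\:\s*/, invalid: "\r\n", min_length: 1, max_length: 64, consume_delimiter: true)
value = stream.read_until(:crlf_lf, min_length: 1, max_length: 256, consume_delimiter: true)
at_end_after = stream.peek_compare(:crlf_lf)
```

A real IOStream would run the same checks against buffer entries as they arrive and would need to wait for more data when the delimiter has not yet been received, rather than returning nil immediately.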
