Don't explicitly pass a default (small) receive size; let trio choose. #185
Conversation
The default size of 4 KiB is very small, and it caused a lot of loops receiving buffers, which were clearly visible in benchmarks. Also, it's not needed; the trio docs say:

> Optional; if omitted, then the stream object is free to pick a reasonable default.

So maybe let's not invent a redundant default here? By passing None, we currently end up with 64 KiB instead, which sounds more appropriate: https://github.com/python-trio/trio/blob/master/src/trio/_highlevel_socket.py#L21-L25

And it can be hoped that this will automagically become more intelligent later if they implement the TODO.
```diff
@@ -38,7 +38,6 @@
 CONN_TIMEOUT = 60  # default connect & disconnect timeout, in seconds
 MESSAGE_QUEUE_SIZE = 1
 MAX_MESSAGE_SIZE = 2 ** 20  # 1 MiB
-RECEIVE_BYTES = 4 * 2 ** 10  # 4 KiB
```
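For context, a minimal sketch of the corresponding call-site change (the actual reader code in trio-websocket may differ in detail):

```python
# Before: every read was capped by the module-level constant.
data = await stream.receive_some(RECEIVE_BYTES)  # at most 4 KiB per call

# After: omit the argument and let trio pick its own default
# (currently 65536 bytes in trio._highlevel_socket, per the link above).
data = await stream.receive_some()
```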
Thank you for the attention here. I have several concerns:

1. 4k to 64k is a huge jump, and it could very well disrupt some existing uses of the library in terms of performance, memory use, backpressure behavior, etc. For example, we have a production deployment to thousands of resource-constrained devices in the field, and this is the kind of change that needs extensive testing (my hunch is there would be some problems).
2. If the new size doesn't work well, library users have no way to change it to something suitable.
3. The comment in trio clearly states that 64k was arbitrary; starting from a small value and seeing how high you can go is prudent.

I'd be interested in trying some modest increases (2x, 4x) on our specific deployment and measuring the effect, but I don't have the time available right now. At best, if that looked promising, I'd consider making the receive size settable in the API (including None), but still defaulting to the current value.
> At best, if that looked promising, I'd consider making the receive size settable in the API (including None), but still defaulting to the current value.

If you have a real app that is measurably benefiting from a larger buffer, that would be a useful data point.
Thanks for the fast answer.

1. To me, 64k still seems reasonably small. If you do care about RAM usage going up by 64k, I feel like you would probably hit RAM issues with Python & garbage collection anyway? I'll talk about performance below.
2. It wasn't configurable before, and this stays the same; it just delegates to trio and avoids duplicating this constant in both here & trio. If you prefer to make it configurable, I can try making a PR that makes it an argument and forwards it to trio?

Regarding point 3 and performance: I did notice this because, in a benchmark, I could see my code spending a lot of time in the wsproto routines that this code calls in a loop, once for each 4 KiB of data it receives.

So I made it bigger, and it significantly helped with performance. I ended up doing what this PR suggests in production code, and it's faster & consuming less CPU now.
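To illustrate the loop being described, here is a rough sketch of the shape of such a reader (hypothetical code, not trio-websocket's actual implementation). Every chunk returned by `receive_some` is fed through wsproto, so receiving a 1 MiB message 4 KiB at a time means roughly 256 iterations, versus about 16 with a 64 KiB buffer:

```python
# Hypothetical reader loop illustrating the per-chunk cost being discussed;
# trio-websocket's real reader differs in its details.
async def reader_loop(stream, ws_connection, handle_event):
    while True:
        data = await stream.receive_some(4 * 2 ** 10)  # capped at 4 KiB
        if not data:
            break
        # wsproto parses each chunk and yields protocol events.
        ws_connection.receive_data(data)
        for event in ws_connection.events():
            handle_event(event)
```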
I did some profiling; see this repo. It's a server that repeatedly sends packets.

You can run it this way:

```
python tests/benchmark_server.py 1000
```

And then you run a client:

```
/usr/bin/time python tests/benchmark_client.py
```
Results before the PR, for different packet sizes:

```
1000:   2.32user 0.05system 0:02.38elapsed 99%CPU (0avgtext+0avgdata 24356maxresident)k
10000:  5.35user 0.23system 0:05.60elapsed 99%CPU (0avgtext+0avgdata 24632maxresident)k
100000: 33.35user 1.53system 0:34.89elapsed 99%CPU (0avgtext+0avgdata 24632maxresident)k
```

Results after the PR, for the same packet sizes:

```
1000:   1.81user 0.05system 0:01.86elapsed 99%CPU (0avgtext+0avgdata 24320maxresident)k
10000:  2.21user 0.10system 0:02.31elapsed 99%CPU (0avgtext+0avgdata 24640maxresident)k
100000: 4.94user 0.56system 0:05.50elapsed 99%CPU (0avgtext+0avgdata 24640maxresident)k
```
So with packets of size 1 kB it's about 20% faster (2.38 s → 1.86 s elapsed), with 10 kB about 2.4x faster (5.60 s → 2.31 s), and with 100 kB over 6x faster (34.89 s → 5.50 s).

In terms of memory use, for this minimalistic example that loads nothing except trio + trio-websocket, the max memory consumption increases by about 0.03% (24632k → 24640k maxresident); I think this is not really noticeable in the real world.
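The benchmark repo itself isn't reproduced here, but a minimal server/client pair in the same spirit might look like the following sketch (file names, port, and the packet count are assumptions based on the commands above):

```python
# benchmark_server.py (sketch): sends COUNT packets of the size given
# on the command line to each client, then closes the connection.
import sys
import trio
from trio_websocket import serve_websocket

SIZE = int(sys.argv[1])  # packet size in bytes, e.g. 1000
COUNT = 10_000           # packets per connection (assumed)

async def handler(request):
    ws = await request.accept()
    payload = b'x' * SIZE
    for _ in range(COUNT):
        await ws.send_message(payload)

trio.run(serve_websocket, handler, 'localhost', 8000, None)
```

```python
# benchmark_client.py (sketch): drains messages until the server closes.
import trio
from trio_websocket import open_websocket_url, ConnectionClosed

async def main():
    async with open_websocket_url('ws://localhost:8000') as ws:
        try:
            while True:
                await ws.get_message()
        except ConnectionClosed:
            pass

trio.run(main)
```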
> it's faster & consuming less CPU now

How much less? Is your app mostly complete, or is there functionality still to add that will reduce the difference?

I don't think tiny benchmarks can guide selection of this value. And there is still the inertia of existing uses; we can only speculate about the consequences of a 16x increase.

Given the arbitrary values in both trio and trio-websocket, I'm open to plumbing through an option (but still keeping 4k as the default).
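For illustration, plumbing the option through might look something like this (the parameter name `receive_bytes` and the `_reader_loop` helper are assumptions, not the actual follow-up PR):

```python
# Hypothetical sketch: keep 4 KiB as the default, and let None
# delegate the choice to trio.
RECEIVE_BYTES = 4 * 2 ** 10  # 4 KiB, the current default

async def _reader_loop(stream, ws_connection, receive_bytes=RECEIVE_BYTES):
    # trio's receive_some() treats max_bytes=None as "pick a reasonable
    # default", so the value can be forwarded unchanged in both cases.
    while True:
        data = await stream.receive_some(receive_bytes)
        if not data:
            break
        ws_connection.receive_data(data)
```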
> How much less? Is your app mostly complete, or is there functionality still to add that will reduce the difference?

It's an application that does heavy websocket reading, and it's definitely bottlenecked by this. In my specific case it roughly doubled the performance of my app.

What problems do you foresee with a 16x increase in buffer size? To be honest, I don't really understand.

Anyway, I sent an alternative PR for plumbing it through.