Don't explicitly pass a default (small) receive size, let trio choose. #185

Open

palkeo wants to merge 1 commit into master
Conversation

@palkeo commented Nov 15, 2023:

The default size of 4 KiB is very small, and it caused a lot of receive-loop iterations that were clearly visible in benchmarks.

Also, it's not needed; the docs for the size argument say: "Optional; if omitted, then the stream object is free to pick a reasonable default." So maybe let's not invent a redundant default here?

By passing None, we currently end up with 64 KiB instead, which sounds more appropriate: https://github.com/python-trio/trio/blob/master/src/trio/_highlevel_socket.py#L21-L25
And it can be hoped that this will automagically become more intelligent later if they implement the TODO.
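For reference, the linked lines in trio's `_highlevel_socket.py` are essentially the following (the comment is paraphrased; see the link for the exact text):

```python
# XX TODO: this number was picked arbitrarily; upstream still plans to tune
# it experimentally or make it dynamic.
DEFAULT_RECEIVE_SIZE = 65536  # 64 KiB
```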

@palkeo requested a review from belm0 on November 15, 2023 14:32
```diff
@@ -38,7 +38,6 @@
 CONN_TIMEOUT = 60 # default connect & disconnect timeout, in seconds
 MESSAGE_QUEUE_SIZE = 1
 MAX_MESSAGE_SIZE = 2 ** 20 # 1 MiB
-RECEIVE_BYTES = 4 * 2 ** 10 # 4 KiB
```
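The call site isn't shown in this hunk, but the effect of the change is to stop passing the constant to `receive_some()`. A minimal sketch of the before/after (the surrounding reader code is assumed, not copied from the library):

```python
# Before: each read was capped at the library's 4 KiB constant.
data = await stream.receive_some(RECEIVE_BYTES)

# After: no explicit size, so trio chooses its own default (currently 64 KiB).
data = await stream.receive_some()
```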
belm0 (Member) commented:
Thank you for the attention here. I have several concerns:

  1. 4 KiB to 64 KiB is a huge jump, and it could very well disrupt some existing uses of the library in terms of performance, memory use, backpressure behavior, etc. For example, we have a production deployment to thousands of resource-constrained devices in the field, and this is the kind of change that needs extensive testing (my hunch is there would be some problems).
  2. If the new size doesn't work well, library users have no way to change it to something suitable.
  3. The comment in trio clearly states that 64 KiB was arbitrary, and starting from a small value and seeing how high you can go is prudent.

I'd be interested in trying some modest increases (2x, 4x) on our specific deployment and measuring the effect, but I don't have the time available right now. At best, if that looked promising, I'd consider making the receive size settable in the API (including None), but still defaulting to the current value.
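A minimal sketch of what that plumbing might look like, assuming a hypothetical `receive_bytes` keyword argument (the parameter name and the `_reader_loop` helper are illustrative, not the library's actual API):

```python
RECEIVE_BYTES = 4 * 2 ** 10  # keep 4 KiB as the default

async def _reader_loop(stream, receive_bytes=RECEIVE_BYTES):
    # Hypothetical: receive_bytes=None would let trio pick its own
    # default (currently 64 KiB) instead of this library's constant.
    while True:
        data = await stream.receive_some(receive_bytes)
        if not data:
            break
        # ...feed data to wsproto as usual...
```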

belm0 (Member) commented:

> At best, if that looked promising, I'd consider making the receive size settable in the API (including None), but still defaulting to the current value.

If you have a real app that is measurably benefiting from a larger buffer, that would be a useful data point.

@palkeo (Author) commented Nov 15, 2023:

Thanks for the fast answer.

  1. To me 64 KiB still seems reasonably small; if an extra 64 KiB of RAM matters, you would probably already be hitting RAM issues with Python and garbage collection anyway. I'll discuss performance below.
  2. It wasn't configurable before, and this stays the same; the change just delegates to trio and avoids duplicating the constant both here and in trio. If you'd prefer to make it configurable, I can try a PR that makes it an argument and forwards it to trio.

Regarding point 3 and performance: I noticed this because, in a benchmark, I could see my code spending a lot of time in the wsproto routines that this library calls in a loop after receiving each 4 KiB of data. So I made it bigger, and it significantly helped performance. I ended up applying what this PR suggests in production code, and it's faster and consumes less CPU now.

@palkeo (Author) commented Nov 16, 2023:

I did some profiling; see this repo. It's a server that repeatedly sends packets.

You can run it this way:

```
python tests/benchmark_server.py 1000
```

And then run a client:

```
/usr/bin/time python tests/benchmark_client.py
```
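A minimal sketch of what such a client might look like (the real scripts live in the linked repo; the URL and message count here are made up):

```python
import trio
from trio_websocket import open_websocket_url

async def main():
    # Hypothetical endpoint; the actual benchmark server listens locally.
    async with open_websocket_url('ws://localhost:8000/') as ws:
        for _ in range(10_000):
            await ws.get_message()  # receive one websocket message per iteration

trio.run(main)
```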

Results before the PR, for different packet sizes:

```
1000:   2.32user 0.05system 0:02.38elapsed 99%CPU (0avgtext+0avgdata 24356maxresident)k
10000:  5.35user 0.23system 0:05.60elapsed 99%CPU (0avgtext+0avgdata 24632maxresident)k
100000: 33.35user 1.53system 0:34.89elapsed 99%CPU (0avgtext+0avgdata 24632maxresident)k
```

Results after the PR, for different packet sizes:

```
1000:   1.81user 0.05system 0:01.86elapsed 99%CPU (0avgtext+0avgdata 24320maxresident)k
10000:  2.21user 0.10system 0:02.31elapsed 99%CPU (0avgtext+0avgdata 24640maxresident)k
100000: 4.94user 0.56system 0:05.50elapsed 99%CPU (0avgtext+0avgdata 24640maxresident)k
```

So with 1 kB packets it's about 1.3x faster, with 10 kB packets about 2.4x faster, and with 100 kB packets about 6x faster.
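(For reference, these ratios follow directly from the elapsed times above:)

```python
# speedup = elapsed_before / elapsed_after, taken from the /usr/bin/time output
for size, before, after in [(1000, 2.38, 1.86), (10000, 5.60, 2.31), (100000, 34.89, 5.50)]:
    print(f"{size}: {before / after:.1f}x faster")  # 1.3x, 2.4x, 6.3x
```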

In terms of memory use, for this minimal example that loads nothing except trio and trio-websocket, max memory consumption increases by about 0.03%; I don't think that's noticeable in the real world.

belm0 (Member) commented:

> it's faster and consumes less CPU now

How much less? Is your app mostly complete, or is there functionality still to add that will reduce the difference?

I don't think tiny benchmarks can guide selection of this value. And there is still the inertia of existing uses: we can only speculate about the consequences of a 16x increase.

Given the arbitrary values in both trio and trio-websocket, I'm open to plumbing through an option (but still keeping 4 KiB as the default).

@palkeo (Author) commented Nov 16, 2023:

> How much less? Is your app mostly complete, or is there functionality still to add that will reduce the difference?

It's an application that does heavy websocket reading, and it's definitely bottlenecked by this. In my specific case it roughly doubled the performance of my app.

What problems do you foresee with a 16x increase in buffer size? To be honest, I don't really understand. Anyway, I've sent an alternative PR that plumbs it through.
