Don't explicitly pass a default (small) receive size; let trio choose. #185
Conversation
The default size of 4 KiB is very small, and it caused a lot of loops receiving buffers, which were clearly visible in benchmarks. Also, it's not needed; the trio docs say:

> Optional; if omitted, then the stream object is free to pick a reasonable default.

So maybe let's not invent a redundant default here? By passing None, we currently end up with 64 KiB instead, which sounds more appropriate: https://github.com/python-trio/trio/blob/master/src/trio/_highlevel_socket.py#L21-L25

And it can be hoped that this will automagically become more intelligent later if they implement the TODO.
```diff
@@ -38,7 +38,6 @@
 CONN_TIMEOUT = 60  # default connect & disconnect timeout, in seconds
 MESSAGE_QUEUE_SIZE = 1
 MAX_MESSAGE_SIZE = 2 ** 20  # 1 MiB
-RECEIVE_BYTES = 4 * 2 ** 10  # 4 KiB
```
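For context, a minimal sketch of the corresponding call-site change (the actual reader code in trio-websocket may differ in detail):

```python
# Before: every read was capped by the module-level constant.
data = await stream.receive_some(RECEIVE_BYTES)  # at most 4 KiB per call

# After: omit the argument and let trio pick its own default
# (currently 65536 bytes in trio._highlevel_socket, per the link above).
data = await stream.receive_some()
```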
Thank you for the attention here. I have several concerns:

1. 4k to 64k is a huge jump, and it could very well disrupt some existing uses of the library in terms of performance, memory use, backpressure behavior, etc. For example, we have a production deployment to thousands of resource-constrained devices in the field, and this is the kind of change that needs extensive testing (my hunch is there would be some problems).
2. If the new size doesn't work well, library users have no way to change it to something suitable.
3. The comment in trio clearly states that 64k was arbitrary; starting from a small value and seeing how high you can go is prudent.

I'd be interested in trying some modest increases (2x, 4x) on our specific deployment and measuring the effect, but I don't have the time available right now. At best, if that looked promising, I'd consider making the receive size settable in the API (including None), but still defaulting to the current value.
> At best, if that looked promising, I'd consider making the receive size settable in the API (including None), but still defaulting to the current value.

If you have a real app that is measurably benefiting from a larger buffer, that would be a useful data point.
Thanks for the fast answer.

1. To me, 64k still seems reasonably small. If you do care about RAM usage going up by 64k, I feel like you would probably hit RAM issues with Python & garbage collection anyway? I'll talk about performance below.
2. It wasn't configurable before, and this stays the same; it just delegates to trio and avoids duplicating this constant in both here & trio. If you prefer to make it configurable, I can try making a PR that makes it an argument and forwards it to trio?

Regarding point 3 and performance: I did notice this because, in a benchmark, I could see my code spending a lot of time in the wsproto routines that this code calls in a loop, once for each 4 KiB of data it receives.

So I made it bigger, and it significantly helped with performance. I ended up doing what this PR suggests in production code, and it's faster & consuming less CPU now.
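To illustrate the loop being described, here is a rough sketch of the shape of such a reader (hypothetical code, not trio-websocket's actual implementation). Every chunk returned by `receive_some` is fed through wsproto, so receiving a 1 MiB message 4 KiB at a time means roughly 256 iterations, versus about 16 with a 64 KiB buffer:

```python
# Hypothetical reader loop illustrating the per-chunk cost being discussed;
# trio-websocket's real reader differs in its details.
async def reader_loop(stream, ws_connection, handle_event):
    while True:
        data = await stream.receive_some(4 * 2 ** 10)  # capped at 4 KiB
        if not data:
            break
        # wsproto parses each chunk and yields protocol events.
        ws_connection.receive_data(data)
        for event in ws_connection.events():
            handle_event(event)
```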
I did some profiling; see this repo. It's a server that repeatedly sends packets.

You can run it this way:

```
python tests/benchmark_server.py 1000
```

And then you run a client:

```
/usr/bin/time python tests/benchmark_client.py
```
Results before the PR, for different packet sizes:

```
1000:   2.32user 0.05system 0:02.38elapsed 99%CPU (0avgtext+0avgdata 24356maxresident)k
10000:  5.35user 0.23system 0:05.60elapsed 99%CPU (0avgtext+0avgdata 24632maxresident)k
100000: 33.35user 1.53system 0:34.89elapsed 99%CPU (0avgtext+0avgdata 24632maxresident)k
```

Results after the PR, for the same packet sizes:

```
1000:   1.81user 0.05system 0:01.86elapsed 99%CPU (0avgtext+0avgdata 24320maxresident)k
10000:  2.21user 0.10system 0:02.31elapsed 99%CPU (0avgtext+0avgdata 24640maxresident)k
100000: 4.94user 0.56system 0:05.50elapsed 99%CPU (0avgtext+0avgdata 24640maxresident)k
```
So with packets of size 1 kB it's about 20% faster (2.38 s → 1.86 s elapsed), with 10 kB about 2.4x faster (5.60 s → 2.31 s), and with 100 kB over 6x faster (34.89 s → 5.50 s).

In terms of memory use, for this minimalistic example that loads nothing except trio + trio-websocket, the max memory consumption increases by about 0.03% (24632k → 24640k maxresident); I think this is not really noticeable in the real world.
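The benchmark repo itself isn't reproduced here, but a minimal server/client pair in the same spirit might look like the following sketch (file names, port, and the packet count are assumptions based on the commands above):

```python
# benchmark_server.py (sketch): sends COUNT packets of the size given
# on the command line to each client, then closes the connection.
import sys
import trio
from trio_websocket import serve_websocket

SIZE = int(sys.argv[1])  # packet size in bytes, e.g. 1000
COUNT = 10_000           # packets per connection (assumed)

async def handler(request):
    ws = await request.accept()
    payload = b'x' * SIZE
    for _ in range(COUNT):
        await ws.send_message(payload)

trio.run(serve_websocket, handler, 'localhost', 8000, None)
```

```python
# benchmark_client.py (sketch): drains messages until the server closes.
import trio
from trio_websocket import open_websocket_url, ConnectionClosed

async def main():
    async with open_websocket_url('ws://localhost:8000') as ws:
        try:
            while True:
                await ws.get_message()
        except ConnectionClosed:
            pass

trio.run(main)
```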
> it's faster & consuming less CPU now

How much less? Is your app mostly complete, or is there functionality still to add that will reduce the difference?

I don't think tiny benchmarks can guide selection of this value. And there is still the inertia of existing uses; we can only speculate about the consequences of a 16x increase.

Given the arbitrary values in both trio and trio-websocket, I'm open to plumbing through an option (but still keeping 4k as the default).
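For illustration, plumbing the option through might look something like this (the parameter name `receive_bytes` and the `_reader_loop` helper are assumptions, not the actual follow-up PR):

```python
# Hypothetical sketch: keep 4 KiB as the default, and let None
# delegate the choice to trio.
RECEIVE_BYTES = 4 * 2 ** 10  # 4 KiB, the current default

async def _reader_loop(stream, ws_connection, receive_bytes=RECEIVE_BYTES):
    # trio's receive_some() treats max_bytes=None as "pick a reasonable
    # default", so the value can be forwarded unchanged in both cases.
    while True:
        data = await stream.receive_some(receive_bytes)
        if not data:
            break
        ws_connection.receive_data(data)
```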
> How much less? Is your app mostly complete, or is there functionality still to add that will reduce the difference?

It's an application that does heavy websocket reading, and it's definitely bottlenecked by this. In my specific case it roughly doubled the performance of my app.

What problems do you foresee with a 16x increase in buffer size? To be honest, I don't really understand.

Anyway, I sent an alternative PR for plumbing it through.