Rate limiting middleware for AspNet/Kestrel #57152

Open
1 task done
samsp-msft opened this issue Aug 2, 2024 · 0 comments
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions

Comments

@samsp-msft
Member

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe the problem.

In experimenting with the performance characteristics of YARP, it became clear that there is a balancing act between how many requests are handled at once and the resulting RPS, latency, CPU usage, and working set. By default, Kestrel + ASP.NET Core does not try to throttle the number of requests handled concurrently: if the server is hit with a high load, it keeps allocating tasks to the thread pool for incoming requests. That may produce more RPS on average, but at the cost of higher latency for each request, higher CPU usage, and potentially exponential growth of the working set.

Describe the solution you'd like

We should have a rate limiting component that limits the number of active requests processed by ASP.NET. Requests beyond that cap should be queued and then handled in order, unless the request/connection times out. Ideally this throttling happens somewhere between Kestrel and the early stages of ASP.NET so that minimal work is done for each new request, i.e. before allocating an HttpContext or doing any processing of the stream for the request line, headers, etc.

There should be a way to specify a fixed value for this cap on simultaneous requests, a max queue size after which new requests are rejected, and a max queue duration for each waiting request.
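To make the three knobs concrete, here is a minimal sketch of the intended admission semantics, written in Python/asyncio purely as an illustration (the `ConcurrencyLimiter` class, its parameter names, and the string "responses" are all invented for this sketch; this is not an ASP.NET API):

```python
import asyncio

class ConcurrencyLimiter:
    """Caps in-flight requests; overflow is queued up to a bound and a max wait.

    Hypothetical illustration of the proposed semantics, not an ASP.NET API.
    """

    def __init__(self, max_concurrent, max_queue, max_wait):
        self._sem = asyncio.Semaphore(max_concurrent)  # the cap on active requests
        self._max_queue = max_queue                    # reject beyond this many waiters
        self._max_wait = max_wait                      # seconds a request may sit queued
        self._waiting = 0

    async def run(self, handler):
        # Reject immediately if all slots are busy and the queue is full.
        if self._sem.locked() and self._waiting >= self._max_queue:
            return "503 queue full"
        self._waiting += 1
        try:
            # Wait for a free slot, but only up to the max queue duration.
            await asyncio.wait_for(self._sem.acquire(), self._max_wait)
        except asyncio.TimeoutError:
            return "503 queue timeout"
        finally:
            self._waiting -= 1
        try:
            return await handler()  # admitted: run the actual request work
        finally:
            self._sem.release()

async def demo():
    lim = ConcurrencyLimiter(max_concurrent=1, max_queue=1, max_wait=0.05)

    async def slow():
        await asyncio.sleep(0.2)
        return "200 ok"

    t1 = asyncio.create_task(lim.run(slow))
    await asyncio.sleep(0.01)                # t1 now holds the only slot
    t2 = asyncio.create_task(lim.run(slow))  # t2 goes into the queue
    await asyncio.sleep(0.01)
    r3 = await lim.run(slow)                 # queue already full: rejected
    return await t1, await t2, r3
```

The point of the sketch is the ordering of the checks: a full queue rejects before any queuing work happens, and a queued request carries its own deadline independent of the handler's runtime.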

In addition, we should have an algorithmic solution that dynamically adjusts max_simultaneous_requests to balance throughput against the CPU usage and working set of the application. I suspect the latter will be most useful in practice in containerized applications: the memory target can then be set below the OOM-kill threshold so that the process can manage its own resources.

I am not a mathematician, but I suspect some form of PID algorithm could control the value of max_simultaneous_requests to keep CPU usage and the working set within bounds.
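As a rough illustration of that idea, a textbook PID loop could periodically nudge the cap toward a working-set target. The class below is a Python sketch under invented names and untuned gains; a real implementation would need careful gain selection and anti-windup handling:

```python
class PidCapController:
    """PID loop nudging max_simultaneous_requests toward a memory target.

    All names, gains, and bounds here are illustrative guesses, not a
    tuned or production-ready controller.
    """

    def __init__(self, target_bytes, kp=1e-7, ki=1e-8, kd=0.0,
                 min_cap=1, max_cap=10_000):
        self.target = target_bytes
        self.kp, self.ki, self.kd = kp, ki, kd
        self.min_cap, self.max_cap = min_cap, max_cap
        self._integral = 0.0
        self._prev_error = 0.0

    def update(self, cap, working_set_bytes, dt=1.0):
        # Positive error: under the memory budget, so the cap may grow.
        error = self.target - working_set_bytes
        self._integral += error * dt
        derivative = (error - self._prev_error) / dt
        self._prev_error = error
        delta = (self.kp * error
                 + self.ki * self._integral
                 + self.kd * derivative)
        # Clamp so the cap never collapses to zero or grows without bound.
        return int(min(self.max_cap, max(self.min_cap, cap + delta)))
```

Each control tick would read the current working set (and/or CPU usage), call `update`, and apply the returned cap to the limiter; the clamp keeps a misbehaving controller from starving or flooding the server.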

Additional context

Using request latency as the control variable is probably not practical: while you could measure the time ASP.NET Core takes to handle a request, that latency would itself be inflated by the queue duration, so a latency-based limit would be of limited use.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions label Aug 2, 2024