feat(http-ratelimiting): add a bucket for global rate limits #2159
base: main
Conversation
Let's ignore the failing http tests for now.
Looks great overall, thank you! I've got concerns about the current implementation of the global bucket loop being a bottleneck, and I think fixing that concern can still lead to global ratelimits being hit. I've added some scenarios where I think this implementation can cause global ratelimits to be encountered, and I've proposed a solution that prevents that possibility.
```rust
use self::bucket::{Bucket, BucketQueueTask};
pub use self::global_bucket::GlobalBucket;
```
Should this be public? I don't see it being able to be returned or used by this ratelimiter implementation. If this is something we expect people to use for building other ratelimiters, then we should move this outside of the `in_memory` module to the root.
```rust
let global_ticket_tx = if let Ok(ticket_tx) = self.wait_for_global().await {
    ticket_tx
} else {
    continue;
```
It seems like this shouldn't happen. The global bucket lives at least as long as this task since each queue has a strong reference to it.
But let's say a panic happens in the task driving the global bucket, such as an arithmetic panic (e.g. divide by zero) or a panic in some underlying dependency, such as an undocumented panic when calling `tokio::time::sleep`. When that happens, `wait_for_bucket` will only ever return an error since there's no receiving end to send a `TicketSender` back. When this happens, we `continue` the loop, and the cycle starts again. All requests to the ratelimiter from outside parties will start to receive a mysterious error saying the ratelimiter dropped the request, since they never get the sending channel back (by calling `queue_tx.available()` here) and the sending half of the channel gets dropped.
This isn't great, so we have two options: explicitly drop the queue, "stop" the ratelimiter, and return an appropriate error to fail all requests; or allow all requests to pass through without a global ratelimiter. The second is less shocking, so we should probably default to that, but make the behavior customizable as a method on the ratelimiter.
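A rough sketch of what that knob could look like, assuming we expose the choice as an enum on the ratelimiter; the names, variants, and default here are illustrative, not part of this PR:

```rust
/// Hypothetical: what to do when the task driving the global bucket has died.
#[derive(Clone, Copy, Debug)]
pub enum OnGlobalBucketFailure {
    /// Fail every request with an explicit error instead of silently dropping it.
    FailRequests,
    /// Skip global ratelimiting and let requests through; route buckets still apply.
    PassThrough,
}

/// Decide what to do with a request once `wait_for_global` has returned an error.
fn handle_global_failure(mode: OnGlobalBucketFailure) -> Result<(), &'static str> {
    match mode {
        // Surface a clear error so callers aren't left with a mysteriously dropped request.
        OnGlobalBucketFailure::FailRequests => Err("global ratelimiter stopped"),
        // Proceed without global limiting.
        OnGlobalBucketFailure::PassThrough => Ok(()),
    }
}
```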
```rust
async fn run_global_queue_task(bucket: Arc<InnerGlobalBucket>, period: u64, requests: u32) {
    let mut time = Instant::now();

    while let Some(queue_tx) = bucket.queue.pop().await {
        wait_if_needed(bucket.as_ref(), &mut time, period, requests).await;

        let ticket_headers = if let Some(ticket_headers) = queue_tx.available() {
            ticket_headers
        } else {
            continue;
        };

        if let Ok(Some(RatelimitHeaders::Global(headers))) = ticket_headers.await {
            tracing::debug!(seconds = headers.retry_after(), "globally ratelimited");

            bucket.is_locked.store(true, Ordering::Release);
            tokio::time::sleep(Duration::from_secs(headers.retry_after())).await;
            bucket.is_locked.store(false, Ordering::Release);
        }
    }
}
```
Let's take a look at the flow of the logic here:
- the global queue task waits for an item in the queue, which happens on every ratelimiter request
- the global ratelimiter waits if needed
- the global queue notifies the other half that it can send the request
  a. if the other half is still around, the global queue receives a receiving half to receive headers from
  b. otherwise, the global queue knows the other half doesn't exist and loops over to the next request
- the global queue waits for the headers from the other half; if the headers are of a global ratelimit then it sleeps, causing the entire system to stall, as intended
- it loops over
In terms of preventing more than a configured number of requests in a given period, this system of preemptive sleeping works great. The problem here is that this is synchronous: only one request will be processed at a time. Practically speaking, this means the global bucket won't limit to, say, 50 requests/1 second, but rather to a much smaller number of requests because the global bucket is a bottleneck between each request. Requests to ping Discord take an average latency of 102ms right now according to the status page, so functionally the global bucket will bottleneck users to < 10 requests/second, if not much fewer depending on the routes in use.
We need to change the implementation here to solve this. My first instinct is that hitting a global ratelimit header with this system is unlikely with the preemptive ratelimiter in place, so what we could do instead is spawn a task for each set of headers while waiting. With a mutex in place instead of the `is_locked` boolean, we could lock up the ratelimiter for the duration. That works.
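A minimal sketch of that idea, using a `tokio::sync::Mutex` and a oneshot channel carrying an optional retry-after duration as stand-ins for the real ticket and header types:

```rust
use std::{sync::Arc, time::Duration};
use tokio::sync::{oneshot, Mutex};

/// Sketch: spawn one of these per granted ticket instead of awaiting the
/// headers inline in the global queue loop.
async fn process_headers(lock: Arc<Mutex<()>>, headers_rx: oneshot::Receiver<Option<Duration>>) {
    if let Ok(Some(retry_after)) = headers_rx.await {
        // Hold the lock for the retry-after duration; if ticket granting also
        // takes this lock first, every other request stalls until it's released.
        let _guard = lock.lock().await;
        tokio::time::sleep(retry_after).await;
    }
}

// In the queue loop: tokio::spawn(process_headers(Arc::clone(&lock), headers_rx));
```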
But this has its own flaw: if a user makes, say, 100 requests to the ratelimiter with a global bucket of 50 requests/1 second, then across just over 1 second they'll receive all 100 request allotments. If requests don't all take the same time (very likely), or they get stalled, or something else happens, then users could fire off more than 50 requests in a given second and still hit the global ratelimit. It's also impossible to determine when Discord will count a request against a given second. We can see how this could fail under this circumstance:
- User requests 100 ratelimiter tickets for requests, current configuration is 50 requests/second
- 50 tickets are granted in the first second
- 40 are processed by Discord in that second
- 50 tickets are granted in the second second
- those 50, plus the 10 from the previous second, are processed by Discord
- that was 60 in a second -- oops, global ratelimit hit
The solution I see is to simultaneously limit the number of requests allotted in a given second and limit the number of in-flight requests to the configured number of requests. We can see why this prevents the previous scenario:
- User requests 100 ratelimiter tickets for requests, current configuration is 50 requests/second
- First second
  - 50 tickets are granted in the first second and their requests are sent to Discord
  - 40 are processed by Discord in that second
  - 30 requests get responses and are processed by the global queue
- Second second
  - because 20 requests are still in flight and their global ratelimit status is indeterminate, 30 tickets are granted
  - the responses of the 10 requests that were processed by Discord but didn't receive responses are received
  - the 10 requests that weren't processed by Discord last second are processed and return responses
  - 20 of the requests from this second are received
  - 10 are not, so there are 10 requests still in flight
- Third second
  - because 10 requests are in flight, there are 40 tickets allotted for the second
  - the cycle continues
  - it's mathematically and logically impossible to hit a global ratelimit
To accomplish this, my suggestion is we use a combination of tracking the time (like now), spawning tasks to process responses, and a Tokio semaphore with the configured number of requests as allotments. When one second passes over into another, the first processed request of that second can add allotments for the number of responses that were received last second. If you want tips on how to implement this, let me know and I can provide them; it's a bit complicated.
Let me know if I'm thinking about this wrong, but this implementation using a dual timer and semaphore seems like the right way of doing this.
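A minimal sketch of the semaphore half of that suggestion, with a semaphore sized to the per-period allotment; the function names and the surrounding wiring are illustrative, not the PR's code:

```rust
use std::sync::Arc;
use tokio::sync::{OwnedSemaphorePermit, Semaphore};

/// Sketch: the semaphore is created with the configured allotment,
/// e.g. `Arc::new(Semaphore::new(50))` for 50 requests/second.
async fn grant_ticket(permits: Arc<Semaphore>) -> OwnedSemaphorePermit {
    // A ticket is only handed out while fewer than the configured number of
    // requests are in flight; otherwise this waits for a permit to free up.
    permits
        .acquire_owned()
        .await
        .expect("the semaphore is never closed")
}

fn release_ticket(permit: OwnedSemaphorePermit) {
    // Dropping the permit once the response's headers have been processed
    // frees one allotment; combined with the per-period timer this caps both
    // the tickets granted per period and the requests in flight.
    drop(permit);
}
```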
The current time tracker is inefficient: `wait_if_needed` will ~always sleep and yield one request per (period / requests), therefore evenly spacing requests out. We could either use a leaky bucket algorithm (we can take a dependency upon a crate implementing it) or mirror the implementation of `twilight_gateway::CommandRatelimiter`. The ratelimiter in the gateway was quite tricky, so I'd recommend adding lots of tests for this module.
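For reference, a minimal token-bucket sketch (the complementary view of a leaky bucket) that allows bursts up to the full allotment and replenishes evenly over the period; all names here are illustrative and not taken from this PR or the gateway:

```rust
use std::time::{Duration, Instant};

struct TokenBucket {
    /// Remaining allotment; replenishes continuously rather than per-request.
    tokens: f64,
    last: Instant,
    requests: u32,
    period: Duration,
}

impl TokenBucket {
    async fn acquire(&mut self) {
        // Drip tokens back in proportionally to the time elapsed since the
        // last acquisition, capped at the full allotment.
        let refill = self.last.elapsed().as_secs_f64() / self.period.as_secs_f64()
            * f64::from(self.requests);
        self.tokens = (self.tokens + refill).min(f64::from(self.requests));
        self.last = Instant::now();

        if self.tokens < 1.0 {
            // Sleep only as long as it takes for a single token to return,
            // instead of evenly spacing every request.
            let wait = self
                .period
                .mul_f64((1.0 - self.tokens) / f64::from(self.requests));
            tokio::time::sleep(wait).await;
            self.tokens = 1.0;
        }

        self.tokens -= 1.0;
    }
}
```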
Are you sure that Discord resets every second, as opposed to evenly allowing more? In that case this way would allow requests to complete earlier.
> Are you sure that Discord resets every second, as opposed to evenly allowing more? In that case this way would allow requests to complete earlier.
I misremembered, thanks for the catch. I don't believe the docs explicitly state this, but it is a GCRA implementation, so it's actually a leaky bucket that will replenish over time.
```rust
    ticket_tx
} else {
    continue;
};
```
let-else once we upgrade to rust 1.65 🙏
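For reference, the guard above rewritten with let-else once on Rust 1.65:

```rust
let Ok(global_ticket_tx) = self.wait_for_global().await else {
    continue;
};
```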
```diff
  let mut rx = self.rx.lock().await;

- timeout(timeout_duration, rx.recv()).await.ok().flatten()
+ rx.recv().await
```
I am somewhat certain that this will never return `None` (as at least one sender is tied to each `BucketQueue`), which is otherwise necessary for the bucket to be finished and freed (see `BucketQueueTask::run`).
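A sketch of keeping the timeout so an idle queue still resolves to `None` and the bucket can be cleaned up, with an unbounded mpsc receiver standing in for the bucket's channel:

```rust
use std::time::Duration;
use tokio::{sync::mpsc::UnboundedReceiver, time::timeout};

/// Resolves to `None` when no ticket request arrives within `timeout_duration`,
/// letting the caller (e.g. `BucketQueueTask::run`) finish and free the bucket.
async fn pop<T>(rx: &mut UnboundedReceiver<T>, timeout_duration: Duration) -> Option<T> {
    timeout(timeout_duration, rx.recv()).await.ok().flatten()
}
```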
This closes #650.
I opted to keep the rate limit trait as is for now, even though `is_globally_locked` is unused.
Additionally, the global rate limiter waits for each request to return before accepting the next request. Since 50/s isn't that many, I wanted to keep it simple. If you disagree, then I would change that.
Lastly, Discord only documents the hardcoded 50 requests per second. Nirn-proxy, however, seems to have figured out that big bots have different global rate limits.
We discussed this on Discord and decided not to support this due to it not being officially documented.
EDIT: It was requested to add the option to instead specify custom ratelimit values manually.
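A hypothetical shape for that option; the struct name, fields, and defaults are illustrative only and not this PR's API:

```rust
use std::time::Duration;

/// Hypothetical: lets users override the documented global limit when Discord
/// has granted their bot a higher one.
pub struct GlobalRatelimitConfig {
    pub requests: u32,
    pub period: Duration,
}

impl Default for GlobalRatelimitConfig {
    fn default() -> Self {
        Self {
            // Discord documents 50 requests per second for most bots.
            requests: 50,
            period: Duration::from_secs(1),
        }
    }
}
```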