Rate-limit right before allocating
At work last week, I built an API endpoint that allowed logged-out visitors to our website to create rows in a database table. I wanted to prevent bad actors from filling the table with nonsense data, so I added rate-limiting based on the user’s IP address. The endpoint worked as follows: First, it checked the rate limit and responded with a 429 if the user had exceeded it. If not, it validated the user-provided request data and responded with a 400 if the data was invalid. If both checks passed, it saved the valid data to the database.
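Concretely, that ordering looked something like this minimal Flask sketch (the endpoint path, the in-memory limiter, and the `validate` helper are all made up for illustration; the real code was more involved):

```python
import time
from collections import defaultdict

from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical in-memory limiter: at most LIMIT requests per IP per WINDOW seconds.
WINDOW, LIMIT = 10.0, 3
_hits = defaultdict(list)

def over_rate_limit(ip):
    now = time.monotonic()
    _hits[ip] = [t for t in _hits[ip] if now - t < WINDOW]
    if len(_hits[ip]) >= LIMIT:
        return True
    _hits[ip].append(now)
    return False

def validate(data):
    # Stand-in validation: require a non-empty "name" field.
    if not data or not data.get("name"):
        return ["name is required"]
    return []

@app.route("/rows", methods=["POST"])
def create_row():
    # 1. Check the rate limit first: this counts every request, valid or not.
    if over_rate_limit(request.remote_addr):
        return jsonify(error="rate limit exceeded"), 429
    # 2. Then validate the user-provided data.
    errors = validate(request.get_json(silent=True))
    if errors:
        return jsonify(errors=errors), 400
    # 3. Both checks passed: save to the database (stubbed out here).
    return jsonify(status="created"), 201
```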
It was only by filling out the HTML form connected to this endpoint that I discovered a design flaw. When I submitted invalid data for the first time, I’d receive a 400. However, after quickly fixing the invalid data and resubmitting it, I’d receive a 429 and wouldn’t be able to submit the now-valid data until the rate limit expired. From a user’s perspective, this was a terrible experience: If they made an error while filling out the form, they’d have to wait several seconds each time they tried to correct it.
I realized that I wanted to limit the number of requests per user that actually created rows in the database, not the total number of requests. So I changed the endpoint to validate the request data first, then check the rate limit. This way, the endpoint only rate-limits valid requests, and users can submit as many invalid requests in a short period as it takes them to figure out how to fix their data.
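With the same hypothetical helpers as in the first sketch, the fix was just to swap the first two steps:

```python
@app.route("/rows", methods=["POST"])
def create_row():
    # 1. Validate first: invalid requests never count against the limit.
    errors = validate(request.get_json(silent=True))
    if errors:
        return jsonify(errors=errors), 400
    # 2. Rate-limit only requests that would actually create a row.
    if over_rate_limit(request.remote_addr):
        return jsonify(error="rate limit exceeded"), 429
    # 3. Save to the database (stubbed out here).
    return jsonify(status="created"), 201
```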
This problem has a mirror image. Suppose you were implementing an endpoint with a high CPU cost and wanted to prevent each user from making too many requests in a short period of time. Further, suppose you implemented the API endpoint to first construct a response to the user’s request, then check the rate limit. In this case, the rate limit has no effect: A user can still make a large number of requests to the endpoint and consume a lot of CPU, even though they receive 429s for most of the requests. The solution is to rate-limit before performing the CPU-intensive operation.
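In the same sketched style, with `build_expensive_report` as a made-up stand-in for the costly work, the correct ordering looks like this; the only thing that matters is that the rate-limit check comes before the expensive call:

```python
def build_expensive_report():
    # Stand-in for a CPU-heavy computation.
    return sum(i * i for i in range(10_000_000))

@app.route("/report", methods=["GET"])
def report():
    # Rate-limit *before* the expensive work, so a flood of requests is
    # rejected without consuming any CPU. Checking the limit only after
    # calling build_expensive_report() would burn the CPU either way.
    if over_rate_limit(request.remote_addr):
        return jsonify(error="rate limit exceeded"), 429
    return jsonify(result=build_expensive_report())
```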
These problems point to a general principle. Rate-limiting prevents one user from using too much of a shared resource, e.g. database space or CPU. The principle is this: Rate-limit right before allocating part of that shared resource to a request. If you check the rate limit too late, the rate-limiting won’t effectively protect the resource. If you check it too early, you’ll end up rate-limiting requests that wouldn’t have consumed the resource anyway, which might be a poor experience for users.
Edit (2020-05-30): A couple of my coworkers pointed out that this principle has tradeoffs. In the first example in this article, I added a direct call to a rate-limiter method to the endpoint. We wouldn't want to do that for every endpoint that adds rows to the database. Instead, we could implement a configurable database rate-limiting system that allows you to specify different rate limits for insertions to different tables. We might want to implement a similar system for each shared resource. Contrast this with a single system that only allows you to rate-limit the number of requests per endpoint. The former allows you to limit bad behaviour more aggressively without compromising user experience, but it might not be worth the extra complexity.
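As a rough illustration, such a system might look something like the sketch below. The table names, limits, and `db.insert` call are all hypothetical, and a real version would need shared storage (e.g. Redis) rather than an in-process dict:

```python
import time
from collections import defaultdict

class RateLimitExceeded(Exception):
    pass

# Hypothetical per-table insertion limits: (max insertions, window in seconds).
TABLE_LIMITS = {
    "signups":  (3, 10.0),
    "comments": (10, 60.0),
}
_inserts = defaultdict(list)

def over_insert_limit(table, ip):
    max_hits, window = TABLE_LIMITS[table]
    now = time.monotonic()
    key = (table, ip)
    _inserts[key] = [t for t in _inserts[key] if now - t < window]
    if len(_inserts[key]) >= max_hits:
        return True
    _inserts[key].append(now)
    return False

def rate_limited_insert(db, table, row, ip):
    # Every code path that inserts rows goes through this one check,
    # so individual endpoints never call the limiter directly.
    if over_insert_limit(table, ip):
        raise RateLimitExceeded(table)
    db.insert(table, row)  # stand-in for the real database call
```

An endpoint would then call `rate_limited_insert` instead of inserting directly, and a generic error handler could translate `RateLimitExceeded` into a 429.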
A coworker also mentioned a second use for rate-limiting: preventing dictionary attacks, both on sensitive information, like passwords, and less sensitive information, like tokens that refer to database objects. I think it's possible to treat this as a special case of protecting shared resources by considering the search space as a shared resource, but I haven't fully worked out this metaphor yet.