Rate Limiting based on HTTP headers with HAProxy

Recently we had a problem with a buggy update to a piece of 3rd party client software. It produced lots and lots of valid but nonsensical requests targeting our system.

This post details how we added dynamic rate limiting to our HAProxy load balancers, heavily throttling only the very specific set of HTTP requests caused by the client bug, while maintaining regular operations for other requests, even on the same URLs.


The files described in this article are available in a GitHub repository for easy access.

Stampede

What made things interesting was that the client software was mostly fine, but a single background sync feature repeatedly (and quite relentlessly) uploaded a tremendous number of small objects, even though they had already been sent, creating lots of duplicate copies on the backend. At the same time, the interactive portion of the application was working nicely. Moreover, even though the problematic update was distributed to a wide audience, only a certain usage pattern would trigger the bug, in a comparatively small portion of installs.

Due to the high frequency of requests coming in with almost no effort client-side, the more heavyweight asynchronous server-side processing was not able to keep up, leading to a slowly but continuously growing queue of outstanding requests.

While fair queueing made sure that most users did not notice much of a slowdown in their regular work with the system at first, it was clear that we needed a way to resolve this situation on our side until a fixed client update could be developed and rolled out.

Options

The most obvious solution would have been to revoke access for the affected OAuth Client ID, but it would also have been the one with the most drastic side-effects. Effectively, the application would have stopped working for all customers, including those who either did not yet have the broken update installed or whose behavior had not triggered the bug. Clearly not a good option.

Another course of action we considered for a short moment was to introduce a rate limit using the Client ID as a discriminator. It would have had the same broad side-effects as locking them out completely, affecting lots of innocent users. Basically, anything taking only the Client ID into account would hit more users than necessary.

Implemented Fix

What we came up with is a rate limiting configuration based on the user’s access token instead of the client software, combined with the specific API call the broken client flooded. While the approach itself is not particularly ingenious, the implementation of the corresponding HAProxy configuration turned out to be a little trickier than anticipated. Most examples are based on the sender’s IP address; however, we did not want to punish all users behind the same NATing company firewall as one single offender.

So without further ado here is the relevant snippet from haproxy.cfg:

frontend fe_api_ssl
  bind 192.168.0.1:443 ssl crt /etc/haproxy/ssl/api.pem no-sslv3 ciphers ...
  default_backend be_api
 
  tcp-request inspect-delay 5s
 
  acl document_request path_beg -i /v2/documents
  acl is_upload hdr_beg(Content-Type) -i multipart/form-data
  acl too_many_uploads_by_user sc0_gpc0_rate() gt 100
  acl mark_seen sc0_inc_gpc0 gt 0
 
  stick-table type string size 100k store gpc0_rate(60s)
 
  tcp-request content track-sc0 hdr(Authorization) if METH_POST document_request is_upload
 
  use_backend be_429_slow_down if mark_seen too_many_uploads_by_user
 
backend be_429_slow_down
  timeout tarpit 2s
  errorfile 500 /etc/haproxy/errorfiles/429.http
  http-request tarpit
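
The backend references an errorfile at /etc/haproxy/errorfiles/429.http, whose contents are not shown above. HAProxy errorfiles are raw HTTP responses; since the tarpit directive produces a 500 by default, the errorfile 500 line swaps in a 429 response instead. A minimal version of that file (a sketch, not the exact one we used) could look like this:

```http
HTTP/1.0 429 Too Many Requests
Cache-Control: no-cache
Connection: close
Content-Type: text/plain

Too many requests. Please slow down and retry later.
```

Note that the status line and headers must be written out literally, as HAProxy sends this file verbatim to the client after the tarpit delay expires.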

Let’s go through these in some more detail.

First of all, right after declaring the frontend’s name to be fe_api_ssl, we bind the appropriate IP address and port, and set up the TLS settings with the certificate/private key and a set of ciphers (left out for brevity).
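
The heart of the configuration is the stick table tracking the Authorization header and the gpc0_rate(60s) counter. Conceptually, each tracked token gets a general-purpose counter whose increment rate is measured over a 60-second window; once that rate exceeds 100, the too_many_uploads_by_user ACL matches. As a rough illustration of that semantics (a Python model for explanation only, not part of the HAProxy setup; all names here are invented), the throttling decision looks like this:

```python
from collections import defaultdict, deque

class RateTracker:
    """Rough model of a per-entry gpc0_rate(60s)-style counter:
    how often a key was incremented within the trailing window."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.events = defaultdict(deque)  # key -> timestamps of increments

    def increment(self, key, now):
        """Analogous to sc0_inc_gpc0: record one event for this key."""
        self.events[key].append(now)

    def rate(self, key, now):
        """Analogous to sc0_gpc0_rate(): events within the window."""
        q = self.events[key]
        while q and now - q[0] > self.window:
            q.popleft()  # drop events that have aged out of the window
        return len(q)

# Hypothetical usage mirroring the ACLs above: a token is throttled
# once it exceeds 100 tracked uploads per minute.
tracker = RateTracker(window_seconds=60)
for i in range(150):
    # 150 uploads within 15 seconds for the same (made-up) token
    tracker.increment("Bearer abc123", now=float(i) * 0.1)
too_many = tracker.rate("Bearer abc123", now=15.0) > 100
print(too_many)  # True
```

The real HAProxy implementation keeps this state in the stick table per tracked header value, so the limit applies per access token rather than per IP address.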

Daniel Schneller has been designing and implementing complex software and database systems for more than 15 years and is the author of the MySQL Admin Cookbook. His current job title is Principal Cloud Engineer at CenterDevice GmbH, where he focuses on OpenStack and Ceph based cloud technologies. He has given talks at FroSCon, Data2Day and DWX Developer Week among others.
