Receiving BADREQ 400 errors on HaProxy?

4

I have been getting a sizeable number of BADREQ entries in my HAProxy log (10+ per minute).

Example:

Oct 1 19:46:00 LB haproxy[19022]: 69.171.251.8:57356 [01/Oct/2018:19:46:00.903] sitename sitename/ -1/-1/-1/-1/5 400 187 - - PRNN 19/19/0/0/5 0/0 ""

Almost all of them seem to be coming from Facebook's crawler.

The crawler seems to be scraping content just fine for the most part. However, a small number of these requests are resulting in BADREQ errors.

As suggested elsewhere, I used socat to see the last error by running the following command:

sudo echo "show errors" | sudo socat unix-connect:/var/run/haproxy.stat stdio

This gave me the following output:

  invalid request
  backend mysite (#2), server <NONE> (#-1), event #127
  src 69.171.251.1:61042, session #9717, session flags 0x00000080
  HTTP msg state 26, msg flags 0x00000000, tx flags 0x00000000
  HTTP chunk len 0 bytes, HTTP body len 0 bytes
  buffer flags 0x00808002, out 0 bytes, total 517 bytes
  pending 517 bytes, wrapping at 32776, error at position 0:

  00000  \x16\x03\x01\x02\x00\x01\x00\x01\xFC\x03\x03 B\x9B\xF8\xAE\xFB=\xD7dN
  00021+ \x8D\xAD\xCCP\x99\x9C\xEEow#w\n
  00033  \xB5\x99\x16g@\x1F{\x9A5H\x00\x00\xAA\xC00\xC0,\xC0(\xC0$\xC0\x14\xC0
  00057+ \n
  00058  \x00\xA5\x00\xA3\x00\xA1\x00\x9F\x00k\x00j\x00i\x00h\x009\x008\x007
  00080+ \x006\xCC\xA9\xCC\xA8\xCC\x14\xCC\x13\xCC\xAA\xCC\x15\x00\x88\x00\x87
  00098+ \x00\x86\x00\x85\xC02\xC0.\xC0*\xC0&\xC0\x0F\xC0\x05\x00\x9D\x00=\x005
  00120+ \x00\x84\xC0/\xC0+\xC0'\xC0#\xC0\x13\xC0\t\x00\xA4\x00\xA2\x00\xA0\x00
  00141+ \x9E\x00g\x00@\x00?\x00>\x003\x002\x001\x000\x00\x9A\x00\x99\x00\x98
  00164+ \x00\x97\x00E\x00D\x00C\x00B\xC01\xC0-\xC0)\xC0%\xC0\x0E\xC0\x04\x00
  00187+ \x9C\x00<\x00/\x00\x96\x00A\xC0\x12\xC0\x08\x00\x16\x00\x13\x00\x10
  00206+ \x00\r\xC0\r\xC0\x03\x00\n
  00214  \x00\xFF\x01\x00\x01)\x00\x00\x00\x14\x00\x12\x00\x00\x0Ffb.mysite.c
  00242+ om\x00\x0B\x00\x04\x03\x00\x01\x02\x00\n
  00254  \x00\x1C\x00\x1A\x00\x17\x00\x19\x00\x1C\x00\e\x00\x18\x00\x1A\x00\x16
  00272+ \x00\x0E\x00\r\x00\x0B\x00\x0C\x00\t\x00\n
  00284  \x00\r\x00 \x00\x1E\x06\x01\x06\x02\x06\x03\x05\x01\x05\x02\x05\x03
  00302+ \x04\x01\x04\x02\x04\x03\x03\x01\x03\x02\x03\x03\x02\x01\x02\x02\x02
  00319+ \x033t\x00\x00\x00\x10\x00\x0B\x00\t\x08http/1.1\x00\x15\x00\xAE\x00
  00344+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00361+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00378+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00395+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00412+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00429+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00446+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00463+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00480+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00497+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00514+ \x00\x00\x00

I'll be honest, I have no idea how to make sense of the above!

Update: I managed to capture traffic on port 80 using tcpdump. I downloaded the capture file and opened it with WinShark.

I caught a few requests coming from Facebook IP 31.13.127.5:

10622   15.837038   31.13.127.5 MYSERVERIP TCP  66  47658 → 80 [ACK] Seq=1 Ack=1 Win=61440 Len=0 TSval=1921577847 TSecr=59275252

10701   15.848790   31.13.127.5 MYSERVERIP TCP  583 47658 → 80 [PSH, ACK] Seq=1 Ack=1 Win=61440 Len=517 TSval=1921577859 TSecr=59275252

10702   15.848846   MYSERVERIP  31.13.127.5 HTTP    253 HTTP/1.0 400 Bad request  (text/html)

10914   15.927603   31.13.127.5 MYSERVERIP  TCP 66  47658 → 80 [FIN, ACK] Seq=518 Ack=189 Win=63488 Len=0 TSval=1921577937 TSecr=59275274

10915   15.927611   MYSERVERIP  31.13.127.5 TCP 66  80 → 47658 [ACK] Seq=189 Ack=519 Win=30080 Len=0 TSval=59275294 TSecr=1921577937

12044   17.419319   31.13.127.5 MYSERVERIP  TCP 74  53712 → 80 [SYN] Seq=0 Win=61320 Len=0 MSS=1460 SACK_PERM=1 TSval=1921579431 TSecr=0 WS=2048

12045   17.419337   MYSERVERIP  31.13.127.5 TCP 74  80 → 53712 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=59275667 TSecr=1921579431 WS=128

12125   17.493182   31.13.127.5 MYSERVERIP  TCP 66  53712 → 80 [ACK] Seq=1 Ack=1 Win=61440 Len=0 TSval=1921579505 TSecr=59275667

12126   17.501269   31.13.127.5 MYSERVERIP  TCP 583 53712 → 80 [PSH, ACK] Seq=1 Ack=1 Win=61440 Len=517 TSval=1921579513 TSecr=59275667

12127   17.501387   MYSERVERIP  31.13.127.5 HTTP    253 HTTP/1.0 400 Bad request  (text/html)

12179   17.576974   31.13.127.5 MYSERVERIP  TCP 66  53712 → 80 [FIN, ACK] Seq=518 Ack=189 Win=63488 Len=0 TSval=1921579589 TSecr=59275687
haproxy
asked on Server Fault Oct 1, 2018 by Wayne • edited Oct 29, 2018 by Wayne

1 Answer

3

What you see in that log is a TLS Client Hello message.

That is the initial message sent by the client to initiate the negotiation, consequently there is nothing encrypted in that message. You'll notice there are two fields in that message which contains text. Those are the SNI (server name indication) and the ALPN (application layer protocol negotiation) next protocol fields. The rest of the message is binary data and thus not as easy to read.

At that early stage of TLS the application layer protocol is not yet negotiated and session keys have not been established. The client hasn't even received a certificate that it could validate. That means there is no way the client can have sent any HTTP request yet, and with no HTTP request there isn't anything to send a status code in response to.

The log entry surely sounds like HAProxy thinks it is responding to an HTTP request, even though none has been sent.

From that it sounds like what is happening here is that the server is speaking HTTP and the client is speaking HTTPS. The TLS Client Hello message is thus being misinterpreted as an HTTP request and rejected as invalid.

It would be useful for you to capture the traffic such that it can be inspected to find out what is actually being sent on the wire. If I am right about the above you should see a TLS Client Hello that can be decoded using Wireshark (or similar) and an unencrypted HTTP response with the 400 error code.

What's also going to be interesting about that packet capture will be the port number.

One way what you see could have happened is if a user has entered a URL with protocol and port number such as https://example.com:80/ and facebook is trying to retrieve that URL repeatedly (as it keeps failing).

I tried to put such a malformed URL into facebook myself, and sure enough facebook sent a Client Hello message to port 80. My (Apache) web server responded with a 400 status code as expected.

Your packet capture confirms that the traffic in your case is also happening on port 80. So presumably somebody have given facebook a https URL pointing to your site and overriding the correct port number with 80 instead.

Your server is correctly responding with a 400 code and facebook realize that fetching the URL has failed. There isn't anything to fix. Your server is working as intended, and the only problem was a user error when entering the URL into facebook.

It is possible to find out a bit more about the faulty URL by having your server auto detect the protocol in order to support both HTTP and HTTPS on the same port. However I wouldn't recommend such hacks, and I don't know if any web server software supports it.

answered on Server Fault Oct 28, 2018 by kasperd • edited Oct 29, 2018 by kasperd

User contributions licensed under CC BY-SA 3.0