I have been getting a sizeable number of BADREQ entries in my HAProxy log (10+ per minute).
Example:
Oct 1 19:46:00 LB haproxy[19022]: 69.171.251.8:57356 [01/Oct/2018:19:46:00.903] sitename sitename/ -1/-1/-1/-1/5 400 187 - - PRNN 19/19/0/0/5 0/0 ""
Almost all of them seem to be coming from Facebook's crawler.
The crawler seems to be scraping content just fine for the most part. However, a small number of these requests are resulting in BADREQ errors.
As suggested elsewhere, I used socat to see the last error by running the following command:
sudo echo "show errors" | sudo socat unix-connect:/var/run/haproxy.stat stdio
This gave me the following output:
invalid request
backend mysite (#2), server <NONE> (#-1), event #127
src 69.171.251.1:61042, session #9717, session flags 0x00000080
HTTP msg state 26, msg flags 0x00000000, tx flags 0x00000000
HTTP chunk len 0 bytes, HTTP body len 0 bytes
buffer flags 0x00808002, out 0 bytes, total 517 bytes
pending 517 bytes, wrapping at 32776, error at position 0:
00000 \x16\x03\x01\x02\x00\x01\x00\x01\xFC\x03\x03 B\x9B\xF8\xAE\xFB=\xD7dN
00021+ \x8D\xAD\xCCP\x99\x9C\xEEow#w\n
00033 \xB5\x99\x16g@\x1F{\x9A5H\x00\x00\xAA\xC00\xC0,\xC0(\xC0$\xC0\x14\xC0
00057+ \n
00058 \x00\xA5\x00\xA3\x00\xA1\x00\x9F\x00k\x00j\x00i\x00h\x009\x008\x007
00080+ \x006\xCC\xA9\xCC\xA8\xCC\x14\xCC\x13\xCC\xAA\xCC\x15\x00\x88\x00\x87
00098+ \x00\x86\x00\x85\xC02\xC0.\xC0*\xC0&\xC0\x0F\xC0\x05\x00\x9D\x00=\x005
00120+ \x00\x84\xC0/\xC0+\xC0'\xC0#\xC0\x13\xC0\t\x00\xA4\x00\xA2\x00\xA0\x00
00141+ \x9E\x00g\x00@\x00?\x00>\x003\x002\x001\x000\x00\x9A\x00\x99\x00\x98
00164+ \x00\x97\x00E\x00D\x00C\x00B\xC01\xC0-\xC0)\xC0%\xC0\x0E\xC0\x04\x00
00187+ \x9C\x00<\x00/\x00\x96\x00A\xC0\x12\xC0\x08\x00\x16\x00\x13\x00\x10
00206+ \x00\r\xC0\r\xC0\x03\x00\n
00214 \x00\xFF\x01\x00\x01)\x00\x00\x00\x14\x00\x12\x00\x00\x0Ffb.mysite.c
00242+ om\x00\x0B\x00\x04\x03\x00\x01\x02\x00\n
00254 \x00\x1C\x00\x1A\x00\x17\x00\x19\x00\x1C\x00\e\x00\x18\x00\x1A\x00\x16
00272+ \x00\x0E\x00\r\x00\x0B\x00\x0C\x00\t\x00\n
00284 \x00\r\x00 \x00\x1E\x06\x01\x06\x02\x06\x03\x05\x01\x05\x02\x05\x03
00302+ \x04\x01\x04\x02\x04\x03\x03\x01\x03\x02\x03\x03\x02\x01\x02\x02\x02
00319+ \x033t\x00\x00\x00\x10\x00\x0B\x00\t\x08http/1.1\x00\x15\x00\xAE\x00
00344+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
00361+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
00378+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
00395+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
00412+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
00429+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
00446+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
00463+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
00480+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
00497+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
00514+ \x00\x00\x00
I'll be honest, I have no idea how to make sense of the above!
Update: I managed to capture traffic on port 80 using tcpdump. I downloaded the capture file and opened it with WinShark.
I caught a few requests coming from Facebook IP 31.13.127.5:
10622 15.837038 31.13.127.5 MYSERVERIP TCP 66 47658 → 80 [ACK] Seq=1 Ack=1 Win=61440 Len=0 TSval=1921577847 TSecr=59275252
10701 15.848790 31.13.127.5 MYSERVERIP TCP 583 47658 → 80 [PSH, ACK] Seq=1 Ack=1 Win=61440 Len=517 TSval=1921577859 TSecr=59275252
10702 15.848846 MYSERVERIP 31.13.127.5 HTTP 253 HTTP/1.0 400 Bad request (text/html)
10914 15.927603 31.13.127.5 MYSERVERIP TCP 66 47658 → 80 [FIN, ACK] Seq=518 Ack=189 Win=63488 Len=0 TSval=1921577937 TSecr=59275274
10915 15.927611 MYSERVERIP 31.13.127.5 TCP 66 80 → 47658 [ACK] Seq=189 Ack=519 Win=30080 Len=0 TSval=59275294 TSecr=1921577937
12044 17.419319 31.13.127.5 MYSERVERIP TCP 74 53712 → 80 [SYN] Seq=0 Win=61320 Len=0 MSS=1460 SACK_PERM=1 TSval=1921579431 TSecr=0 WS=2048
12045 17.419337 MYSERVERIP 31.13.127.5 TCP 74 80 → 53712 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=59275667 TSecr=1921579431 WS=128
12125 17.493182 31.13.127.5 MYSERVERIP TCP 66 53712 → 80 [ACK] Seq=1 Ack=1 Win=61440 Len=0 TSval=1921579505 TSecr=59275667
12126 17.501269 31.13.127.5 MYSERVERIP TCP 583 53712 → 80 [PSH, ACK] Seq=1 Ack=1 Win=61440 Len=517 TSval=1921579513 TSecr=59275667
12127 17.501387 MYSERVERIP 31.13.127.5 HTTP 253 HTTP/1.0 400 Bad request (text/html)
12179 17.576974 31.13.127.5 MYSERVERIP TCP 66 53712 → 80 [FIN, ACK] Seq=518 Ack=189 Win=63488 Len=0 TSval=1921579589 TSecr=59275687
What you see in that log is a TLS Client Hello message.
That is the initial message sent by the client to initiate the negotiation, consequently there is nothing encrypted in that message. You'll notice there are two fields in that message which contains text. Those are the SNI (server name indication) and the ALPN (application layer protocol negotiation) next protocol fields. The rest of the message is binary data and thus not as easy to read.
At that early stage of TLS the application layer protocol is not yet negotiated and session keys have not been established. The client hasn't even received a certificate that it could validate. That means there is no way the client can have sent any HTTP request yet, and with no HTTP request there isn't anything to send a status code in response to.
The log entry surely sounds like HAProxy thinks it is responding to an HTTP request, even though none has been sent.
From that it sounds like what is happening here is that the server is speaking HTTP and the client is speaking HTTPS. The TLS Client Hello message is thus being misinterpreted as an HTTP request and rejected as invalid.
It would be useful for you to capture the traffic such that it can be inspected to find out what is actually being sent on the wire. If I am right about the above you should see a TLS Client Hello that can be decoded using Wireshark (or similar) and an unencrypted HTTP response with the 400 error code.
What's also going to be interesting about that packet capture will be the port number.
One way what you see could have happened is if a user has entered a URL with protocol and port number such as https://example.com:80/
and facebook is trying to retrieve that URL repeatedly (as it keeps failing).
I tried to put such a malformed URL into facebook myself, and sure enough facebook sent a Client Hello message to port 80. My (Apache) web server responded with a 400 status code as expected.
Your packet capture confirms that the traffic in your case is also happening on port 80. So presumably somebody have given facebook a https URL pointing to your site and overriding the correct port number with 80 instead.
Your server is correctly responding with a 400 code and facebook realize that fetching the URL has failed. There isn't anything to fix. Your server is working as intended, and the only problem was a user error when entering the URL into facebook.
It is possible to find out a bit more about the faulty URL by having your server auto detect the protocol in order to support both HTTP and HTTPS on the same port. However I wouldn't recommend such hacks, and I don't know if any web server software supports it.
User contributions licensed under CC BY-SA 3.0