You’ve raised your file descriptor limits, updated security limits, tweaked your network settings and done everything else in preparation to launch your shiny new dockerized application.
Then you have performance issues and you can’t understand why. It looks to be network related. Alright! Let’s see what’s going on:
Maybe it’s DNS related… Let’s try again:
That’s odd, maybe it’s a networking issue outside of our servers. Let’s try pinging another host on the subnet:
That’s even more odd, our other host isn’t having network issues at all. Let’s try going the other way:
We’re getting a lot of packet loss going from Host B to Host A (the problem machine). Maybe it’s a bad NIC?
Just for fun I decided to try and ping localhost/127.0.0.1:
That’s a new one. What the heck is going on? Now at this point I derped out and didn’t think to check dmesg. Let’s assume you went down the same road I did and derped too.
What’s the difference between host A and B? Well, host B doesn’t have Docker installed!
Okay, so it happens when Docker is installed. We’ve isolated it. Kernel bug maybe? Cue swapping around kernels, and the same issue still happens.
Fun side note: Ubuntu 14.04 has a kernel bug that prevents booting when GRUB sits on LVM or software RAID. Launchpad Bug
Switching back to the normal kernel (3.13) that comes with 14.04, we proceed. Docker bug? I hit up #docker on Freenode, where someone mentioned checking dmesg and the conntrack information.
dmesg has tons of these:
How does Docker networking work? NAT! That means iptables needs to keep track of all your connections, hence the “full” message.
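To see how close a host is to that limit, you can compare the number of connections currently being tracked against the configured maximum. Here is a rough sketch, assuming the nf_conntrack kernel module is loaded and exposes its counters under /proc/sys/net/netfilter/. The function names are illustrative and not part of any library.

```python
from pathlib import Path

# Assumed location of the conntrack counters; these files only exist
# while the nf_conntrack kernel module is loaded.
CONNTRACK_DIR = Path("/proc/sys/net/netfilter")


def read_counter(name, base=CONNTRACK_DIR):
    """Return the integer stored in one conntrack proc file."""
    return int((Path(base) / name).read_text().strip())


def conntrack_usage(base=CONNTRACK_DIR):
    """Return (count, maximum, fraction_used) for the conntrack table."""
    count = read_counter("nf_conntrack_count", base)
    maximum = read_counter("nf_conntrack_max", base)
    return count, maximum, count / maximum


if __name__ == "__main__":
    try:
        count, maximum, used = conntrack_usage()
    except FileNotFoundError:
        print("conntrack counters not found; is nf_conntrack loaded?")
    else:
        print(f"{count}/{maximum} tracked connections ({used:.1%} full)")
```

If the fraction sits near 100% while the application is under load, the table is the bottleneck and new connections are likely being dropped.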
If you google the original message you’ll see a lot of people telling you to check your iptables rules and ACCEPT/INPUT chains to make sure there isn’t anything funky in there. If we combine this knowledge + the dmesg errors, we now know what to fix.
Raise the conntrack limits in sysctl.conf and reboot for good measure (you could also apply them with sysctl -p, but I wanted to make sure everything was fresh).
Adjust the conntrack max until you hit a stable count (556k worked well for me) and don’t get any more connection errors. Start your shiny new Docker application that makes tons of network connections and everything should be good now.
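As a starting point for that trial and error, here is a rough sketch that turns an observed peak connection count into a candidate limit and the matching sysctl.conf line. The 2x headroom factor, the example peak of 278,000 and the function names are all illustrative assumptions, and net.netfilter.nf_conntrack_max is assumed to be the right key for your kernel.

```python
import math


def suggest_conntrack_max(peak_count, headroom=2.0):
    """Suggest a conntrack limit with headroom above the observed peak."""
    if peak_count < 0 or headroom < 1:
        raise ValueError("peak_count must be >= 0 and headroom >= 1")
    return math.ceil(peak_count * headroom)


def sysctl_line(limit):
    """Format a sysctl.conf entry for the given conntrack limit."""
    # Assumed key name; check what your kernel actually exposes.
    return f"net.netfilter.nf_conntrack_max = {limit}"


# Hypothetical peak of 278,000 tracked connections seen under load.
print(sysctl_line(suggest_conntrack_max(278_000)))
# prints: net.netfilter.nf_conntrack_max = 556000
```

Treat the result as a first guess only: keep watching the tracked connection count under real load and raise the limit again if the table still fills up.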
Hope this helps someone in the future, as Google really didn’t have a lot of useful information on this message + Docker.