How to troubleshoot docker swarm networking issues

swarm
By Vikrant
December 25, 2017

Continuing the docker swarm series: in previous articles I discussed docker swarm cluster creation and some basic docker swarm networking. In this article, we will dig deeper into docker swarm network troubleshooting. This article is based on the awesome work done by Sreenivas (https://sreeninet.wordpress.com/2017/11/02/docker-networking-tip-troubleshooting/).

To demonstrate the troubleshooting, I am going to start a vote service with a replica count of 2, along with a client container.

Step 1 : Create the overlay network which will be used by the vote and client services.

docker@manager1:~$ docker network create -d overlay overlay1
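
If you want to double-check that the network came up with the expected driver and scope before starting the services, a quick look along these lines should be enough (standard docker CLI, nothing specific to this setup):

docker@manager1:~$ docker network ls --filter driver=overlay
docker@manager1:~$ docker network inspect overlay1 --format 'driver={{.Driver}} scope={{.Scope}}'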

Step 2 : The following commands start the vote and client services. Both services are attached to the overlay1 network created in the previous step.

docker@manager1:~$ docker service create --replicas 1 --name client --network overlay1 smakam/myubuntu:v4 sleep infinity
mnaihkfak6kqpdvwmy4j95esg

docker@manager1:~$ docker service create --mode replicated --replicas 2 --name vote --network overlay1 --publish mode=ingress,target=80,published=8080 instavote/vote
2z5g8tn1ohnpnd82n41koqtvh

Step 3 : Verify that the services started successfully. The following commands also show on which node each task is running.

docker@manager1:~$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                   PORTS
mnaihkfak6kq        client              replicated          1/1                 smakam/myubuntu:v4
2z5g8tn1ohnp        vote                replicated          0/2                 instavote/vote:latest   *:8080->80/tcp

docker@manager1:~$ docker service ps client
ID                  NAME                IMAGE                NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
nnieiubwgj30        client.1            smakam/myubuntu:v4   manager1            Running             Running 14 minutes ago

docker@manager1:~$ docker service ps vote
ID                  NAME                IMAGE                   NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
njrh2ofmap06        vote.1              instavote/vote:latest   worker1             Running             Running 13 minutes ago
ij6fmmcx71po        vote.2              instavote/vote:latest   manager1            Running             Running 13 minutes ago

Step 4 : Let’s check the containers which are running on the manager1 node.

docker@manager1:~$ docker ps
CONTAINER ID        IMAGE                   COMMAND                  CREATED             STATUS              PORTS               NAMES
1c06334e3589        instavote/vote:latest   "gunicorn app:app ..."   5 hours ago         Up 5 hours          80/tcp              vote.2.ij6fmmcx71po5sjeessn9wl0j
d38975856cfd        smakam/myubuntu:v4      "sleep infinity"         5 hours ago         Up 5 hours                              client.1.nnieiubwgj30js73iwzxfleoe

Step 5 : Log in to each container to see the network interfaces present inside it. Two containers (one client and one vote) are running on the manager1 node and one container (vote) is running on the worker1 node.

The client container has two interfaces: eth0 from the overlay1 network and eth1 from the docker_gwbridge network. Keep a note of the IP address (10.0.0.2/32) assigned to the loopback interface of the client container; this is a VIP.

docker@manager1:~$ docker exec -it d38975856cfd sh
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 10.0.0.2/32 scope global lo
       valid_lft forever preferred_lft forever
23: eth0@if24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:00:00:03 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.3/24 scope global eth0
       valid_lft forever preferred_lft forever
25: eth1@if26: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.3/16 scope global eth1
       valid_lft forever preferred_lft forever
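
If you would rather map interfaces to networks without shelling into the container, an inspect with a Go template along these lines should print the per-network IP addresses (the container ID is from this setup; replace it with yours):

docker@manager1:~$ docker container inspect d38975856cfd --format '{{range $net, $conf := .NetworkSettings.Networks}}{{$net}} -> {{$conf.IPAddress}}{{println}}{{end}}'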

The vote application container has three interfaces: eth0 from the ingress network, eth1 from docker_gwbridge and eth2 from the overlay1 network. Again, make a note of the virtual IPs present on the loopback interface. Two VIPs are present on lo, 10.255.0.4/32 and 10.0.0.4/32; one is from the ingress network and the other from the overlay1 network.

docker@manager1:~$ docker exec -it 1c06334e3589 sh
/app # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 10.255.0.4/32 scope global lo
       valid_lft forever preferred_lft forever
    inet 10.0.0.4/32 scope global lo
       valid_lft forever preferred_lft forever
27: eth0@if28: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
    link/ether 02:42:0a:ff:00:06 brd ff:ff:ff:ff:ff:ff
    inet 10.255.0.6/16 scope global eth0
       valid_lft forever preferred_lft forever
29: eth1@if30: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:12:00:04 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.4/16 scope global eth1
       valid_lft forever preferred_lft forever
31: eth2@if32: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
    link/ether 02:42:0a:00:00:06 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.6/24 scope global eth2
       valid_lft forever preferred_lft forever
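
The 10.255.0.0/16 addresses come from the ingress network and the 10.0.0.0/24 addresses from overlay1. If you want to confirm the subnets on the manager node, something like this should do it:

docker@manager1:~$ docker network inspect ingress --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'
docker@manager1:~$ docker network inspect overlay1 --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'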
	   
Similarly, check the vote container running on the worker1 node:
	   
docker@worker1:~$ docker exec -it 3ab467754bf5 sh
/app # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 10.255.0.4/32 scope global lo
       valid_lft forever preferred_lft forever
    inet 10.0.0.4/32 scope global lo
       valid_lft forever preferred_lft forever
18: eth0@if19: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
    link/ether 02:42:0a:ff:00:05 brd ff:ff:ff:ff:ff:ff
    inet 10.255.0.5/16 scope global eth0
       valid_lft forever preferred_lft forever
20: eth1@if21: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.3/16 scope global eth1
       valid_lft forever preferred_lft forever
23: eth2@if24: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
    link/ether 02:42:0a:00:00:05 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.5/24 scope global eth2
       valid_lft forever preferred_lft forever

A question may arise: why are two VIPs assigned to the vote service? The vote service can be accessed in two ways, either from the client container or from the host machine. When it is accessed from the client, the overlay1 network is used to reach the service. When it is accessed from the host machine, the ingress routing mesh network provides the access.
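
As a quick illustration of the second path (the next article covers it in detail), the published port can be hit on any swarm node's IP through the routing mesh; <node-ip> below is just a placeholder for one of your node addresses:

docker@manager1:~$ curl http://<node-ip>:8080 | grep "container ID"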

Step 6 : Inspecting the vote service shows the VIPs which are used to access it. These are the same IPs that are assigned to the lo interface of the vote application containers (the output below is truncated to the VirtualIPs section).

docker@manager1:~$ docker service inspect vote
	   
            "VirtualIPs": [
                {
                    "NetworkID": "sh0h7as3prit3pd8nhqdbv6x3",
                    "Addr": "10.255.0.4/16"
                },
                {
                    "NetworkID": "uhtur2cqnefffagaih0q5hpbp",
                    "Addr": "10.0.0.4/24"
                }
            ]
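
The NetworkID values above map back to the ingress and overlay1 networks; listing the overlay networks with untruncated IDs lets you cross-check them:

docker@manager1:~$ docker network ls --no-trunc --filter driver=overlay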

Step 7 : Log into the client container and access the vote application. As the curl output shows, traffic is load balanced across the two containers. Note: we haven't used an IP address to reach the containers, as swarm automatically provides DNS-based service discovery.

docker@manager1:~$ docker exec -it d38975856cfd sh	   

# curl vote | grep "container ID"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3162  100  3162    0     0   151k      0 --:--:-- --:--:-- --:--:--  154k
          Processed by container ID 1c06334e3589
#  curl vote | grep "container ID"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3162  100  3162    0     0   266k      0 --:--:-- --:--:-- --:--:--  280k
          Processed by container ID 3ab467754bf5
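
The service name vote resolves to the VIP 10.0.0.4 through swarm's embedded DNS server. If DNS tools such as nslookup are available in your client image, you can confirm this from inside the client container; tasks.vote is a built-in name that resolves to the individual task IPs instead of the VIP:

# nslookup vote
# nslookup tasks.vote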

Step 8 : While running the curl command, you can run tcpdump on the overlay interface to see the back and forth traffic. As network troubleshooting tools are not present inside the default vote application image, I have attached a netshoot container to the vote application container running on the manager1 node. We can then watch the tcpdump traffic while performing curl against the vote application.

docker@manager1:~$ docker run -it --net container:1c06334e3589 nicolaka/netshoot
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 10.255.0.4/32 scope global lo
       valid_lft forever preferred_lft forever
    inet 10.0.0.4/32 scope global lo
       valid_lft forever preferred_lft forever
27: eth0@if28: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:ff:00:06 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.255.0.6/16 scope global eth0
       valid_lft forever preferred_lft forever
29: eth1@if30: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:04 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 172.18.0.4/16 scope global eth1
       valid_lft forever preferred_lft forever
31: eth2@if32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:00:00:06 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet 10.0.0.6/24 scope global eth2
       valid_lft forever preferred_lft forever


/ # tcpdump -s0 -i eth2 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 262144 bytes
14:52:47.525076 IP 10.0.0.3.48366 > 10.0.0.5.80: Flags [S], seq 3360028736, win 28200, options [mss 1410,sackOK,TS val 2853665 ecr 0,nop,wscale 7], length 0
14:52:52.112751 IP 10.0.0.3.48368 > 10.0.0.6.80: Flags [S], seq 1598409449, win 28200, options [mss 1410,sackOK,TS val 2854124 ecr 0,nop,wscale 7], length 0
14:52:52.112780 IP 10.0.0.6.80 > 10.0.0.3.48368: Flags [S.], seq 3328134166, ack 1598409450, win 27960, options [mss 1410,sackOK,TS val 2854124 ecr 2854124,nop,wscale 7], length 0
14:52:52.112806 IP 10.0.0.3.48368 > 10.0.0.6.80: Flags [.], ack 1, win 221, options [nop,nop,TS val 2854124 ecr 2854124], length 0
14:52:52.113945 IP 10.0.0.3.48368 > 10.0.0.6.80: Flags [P.], seq 1:69, ack 1, win 221, options [nop,nop,TS val 2854124 ecr 2854124], length 68: HTTP: GET / HTTP/1.1
14:52:52.114077 IP 10.0.0.6.80 > 10.0.0.3.48368: Flags [.], ack 69, win 219, options [nop,nop,TS val 2854124 ecr 2854124], length 0
14:52:52.126706 IP 10.0.0.6.80 > 10.0.0.3.48368: Flags [P.], seq 1:210, ack 69, win 219, options [nop,nop,TS val 2854125 ecr 2854124], length 209: HTTP: HTTP/1.1 200 OK
14:52:52.126766 IP 10.0.0.3.48368 > 10.0.0.6.80: Flags [.], ack 210, win 229, options [nop,nop,TS val 2854125 ecr 2854125], length 0
14:52:52.127107 IP 10.0.0.6.80 > 10.0.0.3.48368: Flags [P.], seq 210:3372, ack 69, win 219, options [nop,nop,TS val 2854125 ecr 2854125], length 3162: HTTP
14:52:52.127133 IP 10.0.0.3.48368 > 10.0.0.6.80: Flags [.], ack 3372, win 279, options [nop,nop,TS val 2854125 ecr 2854125], length 0
14:52:52.130372 IP 10.0.0.3.48368 > 10.0.0.6.80: Flags [F.], seq 69, ack 3372, win 279, options [nop,nop,TS val 2854126 ecr 2854125], length 0
14:52:52.134746 IP 10.0.0.6.80 > 10.0.0.3.48368: Flags [F.], seq 3372, ack 70, win 219, options [nop,nop,TS val 2854126 ecr 2854126], length 0
14:52:52.134799 IP 10.0.0.3.48368 > 10.0.0.6.80: Flags [.], ack 3373, win 279, options [nop,nop,TS val 2854126 ecr 2854126], length 0
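
If a request to a task on another node never shows up inside the container, it is worth checking whether the VXLAN-encapsulated overlay traffic (UDP port 4789) is leaving the host at all. A capture on the host's inter-node interface, run here from netshoot attached to the host network, is one way to do that; eth1 is an assumption based on a typical docker-machine setup, so adjust the interface name for your hosts:

docker@manager1:~$ docker run -it --rm --net host nicolaka/netshoot tcpdump -n -i eth1 udp port 4789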

Step 9 : Let’s check how the traffic is redirected from the client to the vote application containers. First, get the sandbox ID associated with the client container.

docker@manager1:~$ docker container inspect d38975856cfd | grep -i sandbox
            "SandboxID": "727bc5c9f5d0099c1c119e8faead35c9eee37159eeb8017bebd31c5e0f815944",
            "SandboxKey": "/var/run/docker/netns/727bc5c9f5d0",

Use this sandbox key to locate the corresponding network namespace on the host and dig into the routing internals of docker swarm. The /var/run/docker/netns directory contains one namespace per container sandbox, plus one namespace per overlay network (named 1-<network ID prefix>) and the special ingress_sbox namespace.

root@manager1:/var/run/docker/netns# ls
1-sh0h7as3pr  1-uhtur2cqne  727bc5c9f5d0  a557f907c3f2  default       ingress_sbox

Start a netshoot container in privileged mode with the docker netns directory mounted, then use nsenter to enter the client container's network namespace.

docker@manager1:~$ docker run -it --rm -v /var/run/docker/netns:/var/run/docker/netns --privileged=true nicolaka/netshoot
/ # nsenter --net=/var/run/docker/netns/727bc5c9f5d0 sh

The iptables mangle table shows the MARK that is set on any traffic hitting IP address 10.0.0.4 (remember, this is the VIP assigned to the vote service from the overlay1 network); in this case it sets the hex value 0x103.

/ # iptables -t mangle -nvL
Chain PREROUTING (policy ACCEPT 60 packets, 23741 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain INPUT (policy ACCEPT 60 packets, 23741 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 105 packets, 6489 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MARK       all  --  *      *       0.0.0.0/0            10.0.0.2             MARK set 0x101
   39  2484 MARK       all  --  *      *       0.0.0.0/0            10.0.0.4             MARK set 0x103

Chain POSTROUTING (policy ACCEPT 66 packets, 4005 bytes)
 pkts bytes target     prot opt in     out     source               destination

Converting the hex value 0x103 to decimal gives 259. In the ipvsadm output below, firewall mark (FWM) 259 has two backend IP addresses, one for each vote container.
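
A quick way to do the hex-to-decimal conversion in the shell:

/ # echo $((0x103))
259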

/ # ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  257 rr
  -> 10.0.0.3:0                   Masq    1      0          0
FWM  259 rr
  -> 10.0.0.5:0                   Masq    1      0          0
  -> 10.0.0.6:0                   Masq    1      0          0

In this article, we have covered how the vote application is accessed from the client container; in the next article, we will cover how the same application is accessed from the host machine.