Daniel Watrous on Software Engineering

A Collection of Software Problems and Solutions


Using and troubleshooting etcd in Kubernetes

etcd (https://coreos.com/etcd/) is a distributed key/value store. Kubernetes keeps all of its cluster data in etcd, including resources and their states.

How etcd is installed

I install Kubernetes, and etcd along with it, using kubespray (https://github.com/kubernetes-incubator/kubespray).

Interacting with etcd

etcd runs as a Docker container. The startup script that the systemd service runs is /usr/local/bin/etcd, with the following contents:

#!/bin/bash
/usr/bin/docker run \
  --restart=on-failure:5 \
  --env-file=/etc/etcd.env \
  --net=host \
  -v /etc/ssl/certs:/etc/ssl/certs:ro \
  -v /etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro \
  -v /var/lib/etcd:/var/lib/etcd:rw \
  --memory=512M \
  --blkio-weight=1000 \
  --name=etcd1 \
  quay.io/coreos/etcd:v3.2.18 \
  /usr/local/bin/etcd \
  "$@"

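Because the --memory value in this script decides when etcd gets killed, it can be handy to check which value a given host's script actually passes. This is a minimal bash sketch; the function name etcd_memory_flag is hypothetical, and it assumes the flag is written as --memory=VALUE, as in the script above.

```shell
# Print the value passed to Docker's --memory flag in an etcd startup script.
# Assumes the flag is written as --memory=VALUE (hypothetical helper).
# Prints nothing if the flag is absent.
etcd_memory_flag() {
  local script="${1:-/usr/local/bin/etcd}"
  # Match "--memory=" followed by everything up to a space or line-continuation backslash.
  grep -o -- '--memory=[^ \\]*' "$script" | head -n 1 | cut -d= -f2
}
```

For the script shown above, etcd_memory_flag /usr/local/bin/etcd would print 512M. No output means the flag is absent, in which case Docker applies no memory limit.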
Because etcd runs in a container, you interact with the running process through the docker CLI. On a healthy node, docker ps shows the etcd container running:

[centos@k8s-node-0 ~]$ sudo docker ps|grep etcd
CONTAINER ID        IMAGE                                 COMMAND                  CREATED             STATUS              PORTS               NAMES
936ad3f72b68        quay.io/coreos/etcd:v3.2.18           "/usr/local/bin/etcd"    6 days ago          Up 6 days                               etcd3

When etcd is unhealthy, its container may already have exited, so add the -a option to docker ps to include stopped containers. Here the etcd container exited with status 137:

[centos@k8s-master-0 ~]$ sudo docker ps -a|grep etcd
CONTAINER ID        IMAGE                                      COMMAND                  CREATED             STATUS                        PORTS               NAMES
0589d69d5892        quay.io/coreos/etcd:v3.2.18                "/usr/local/bin/etcd"    16 seconds ago      Exited (137) 10 seconds ago                       etcd1
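The Exited (137) status is worth decoding. By shell convention, an exit status above 128 means the process was terminated by signal number (status minus 128), so 137 is signal 9, SIGKILL, which is the signal the kernel's OOM killer sends. A small sketch of that arithmetic, using a hypothetical describe_exit_code helper and bash's kill -l to name the signal:

```shell
# Explain a container exit status (hypothetical helper, bash).
# Statuses above 128 mean the process was terminated by signal (status - 128).
describe_exit_code() {
  local code="$1"
  if [ "$code" -gt 128 ]; then
    local sig=$((code - 128))
    # kill -l NUM prints the signal name without the SIG prefix, e.g. KILL.
    echo "signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exit status $code"
  fi
}
```

describe_exit_code 137 prints "signal 9 (SIGKILL)".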

To see the container logs, you have to be quick and run docker logs on the exited container:

[centos@k8s-master-0 ~]$ sudo docker logs 0589d69d5892
2018-11-01 12:09:54.717392 I | pkg/flags: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=https://10.0.40.11:2379
...
2018-11-01 12:09:54.718334 I | pkg/flags: recognized and used environment variable ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
2018-11-01 12:09:54.718470 I | etcdmain: etcd Version: 3.2.18
2018-11-01 12:09:54.718500 I | etcdmain: Git SHA: eddf599c6
2018-11-01 12:09:54.718516 I | etcdmain: Go Version: go1.8.7
...
2018-11-01 12:09:59.131586 I | rafthttp: established a TCP streaming connection with peer d30849eeaaa1ae4e (stream Message reader)
2018-11-01 12:09:59.133832 I | rafthttp: established a TCP streaming connection with peer d30849eeaaa1ae4e (stream MsgApp v2 reader)
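Because each restart can produce a new container with a new ID, a helper that finds the latest exited etcd container saves copying IDs by hand. This is a sketch under stated assumptions: the function name etcd_exited_logs is hypothetical, and it relies on docker ps listing the newest containers first and on its name and status filters.

```shell
# Show the last N log lines of the most recently created etcd container
# that has exited (hypothetical helper, bash).
etcd_exited_logs() {
  local lines="${1:-50}" id
  # docker ps lists newest containers first; -q prints only IDs.
  id=$(docker ps -a -q --filter name=etcd --filter status=exited | head -n 1)
  if [ -z "$id" ]; then
    echo "no exited etcd container found" >&2
    return 1
  fi
  docker logs --tail "$lines" "$id"
}
```

Run it from a root shell, since the docker commands on this page need sudo. etcd logs to stderr, and docker logs preserves that stream, so use etcd_exited_logs 100 2>&1 | less to page through the output.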

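docker inspect gives more detail about a container. In the output below, the State section shows "OOMKilled": true and "ExitCode": 137, and HostConfig shows a memory limit ("Memory") of 1073741824 bytes, or 1 GiB: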

[centos@k8s-master-0 ~]$ sudo docker inspect ee9c45650aa4
[
    {
        "Id": "ee9c45650aa43b1223bae726adb0752a6df2a9f19e78837f2e8bcb787eef6a73",
        "Created": "2018-11-01T12:13:57.285721355Z",
        "Path": "/usr/local/bin/etcd",
        "Args": [],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": true,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 137,
            "Error": "",
            "StartedAt": "2018-11-01T12:14:00.279517793Z",
            "FinishedAt": "2018-11-01T12:14:02.154188978Z"
        },
        "Image": "sha256:e21fb69683f3f754d40128ba1981244c3679fe92e5f692ee67a2b94f65564fb0",
        ...
        "HostConfig": {
            "Binds": [
                "/var/lib/etcd:/var/lib/etcd:rw",
                "/etc/ssl/certs:/etc/ssl/certs:ro",
                "/etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro"
            ],
            "Memory": 1073741824,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 2147483648,
            "MemorySwappiness": -1,
            ...
        },
        "GraphDriver": {
            "Name": "overlay",
            "Data": {
                "LowerDir": "/var/lib/docker/overlay/9cb650dec14206e33078dda3e4ada08f311809d21ecee6d4cfb53b65dbbd2297/root",
                "MergedDir": "/var/lib/docker/overlay/3a003be9e7fe8567644a084f1eb3306f05128ac63c544bf3dda5dc9f2a94613f/merged",
                "UpperDir": "/var/lib/docker/overlay/3a003be9e7fe8567644a084f1eb3306f05128ac63c544bf3dda5dc9f2a94613f/upper",
                "WorkDir": "/var/lib/docker/overlay/3a003be9e7fe8567644a084f1eb3306f05128ac63c544bf3dda5dc9f2a94613f/work"
            }
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/var/lib/etcd",
                "Destination": "/var/lib/etcd",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            },
            ...
        ],
        "Config": {
            "Hostname": "k8s-master-0",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": true,
            "AttachStderr": true,
            "ExposedPorts": {
                "2379/tcp": {},
                "2380/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "ETCD_DATA_DIR=/var/lib/etcd",
                ...
            ],
            "Cmd": [
                "/usr/local/bin/etcd"
            ],
            "Image": "quay.io/coreos/etcd:v3.2.18",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {}
        },
        "NetworkSettings": {
            ...
        }
    }
]
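Rather than reading the full JSON, docker inspect's --format option accepts a Go template, so the fields that matter for an OOM kill can be pulled out directly. A minimal sketch with a hypothetical container_oom_report helper; the field paths match the State and HostConfig sections shown above, and a Memory value of 0 means no limit.

```shell
# Summarize why a container stopped: OOM flag, exit code and memory limit
# (hypothetical helper, bash).
container_oom_report() {
  local out oom code mem limit="unlimited"
  out=$(docker inspect --format \
    '{{.State.OOMKilled}} {{.State.ExitCode}} {{.HostConfig.Memory}}' "$1") || return 1
  read -r oom code mem <<< "$out"
  # HostConfig.Memory is in bytes; 0 means Docker applies no limit.
  if [ "${mem:-0}" -gt 0 ]; then
    limit="$((mem / 1024 / 1024))MiB"
  fi
  echo "OOMKilled=$oom ExitCode=$code MemoryLimit=$limit"
}
```

For the container inspected above, container_oom_report ee9c45650aa4 would print OOMKilled=true ExitCode=137 MemoryLimit=1024MiB.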

Using etcdctl

etcdctl is the command-line client for etcd. Because this cluster uses TLS client certificates, etcdctl needs a few environment variables that point to the certificates and the endpoint. For the v2 API, they are:

export ETCDCTL_CERT_FILE=/etc/ssl/etcd/ssl/member-k8s-master-0.pem
export ETCDCTL_KEY_FILE=/etc/ssl/etcd/ssl/member-k8s-master-0-key.pem
export ETCDCTL_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
export ETCDCTL_ENDPOINT=https://10.0.40.11:2379

For the v3 API, the variable names differ, and ETCDCTL_API=3 selects the v3 API:

export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
export ETCDCTL_CERT=/etc/ssl/etcd/ssl/member-k8s-master-1.pem
export ETCDCTL_KEY=/etc/ssl/etcd/ssl/member-k8s-master-1-key.pem
export ETCDCTL_API=3
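Since the certificate file names follow the pattern member-<node>.pem and member-<node>-key.pem, switching between hosts and API versions can be scripted. This is a sketch with hypothetical helper names that assumes the /etc/ssl/etcd/ssl layout shown above. Each helper unsets the other API's variables so the two sets do not get mixed, and etcd_env_v2 leaves ETCDCTL_API unset, assuming an etcdctl older than 3.4 (such as one matching the 3.2.18 server here), since those releases default to the v2 API.

```shell
# Hypothetical helpers that export the etcdctl TLS variables for one member.
# Assumes certificates named member-<node>.pem / member-<node>-key.pem.
ETCD_SSL_DIR=/etc/ssl/etcd/ssl

etcd_env_v2() {  # usage: etcd_env_v2 <node-name> <endpoint-url>
  unset ETCDCTL_API ETCDCTL_CACERT ETCDCTL_CERT ETCDCTL_KEY
  export ETCDCTL_CA_FILE="$ETCD_SSL_DIR/ca.pem"
  export ETCDCTL_CERT_FILE="$ETCD_SSL_DIR/member-$1.pem"
  export ETCDCTL_KEY_FILE="$ETCD_SSL_DIR/member-$1-key.pem"
  export ETCDCTL_ENDPOINT="$2"
}

etcd_env_v3() {  # usage: etcd_env_v3 <node-name>
  unset ETCDCTL_CA_FILE ETCDCTL_CERT_FILE ETCDCTL_KEY_FILE ETCDCTL_ENDPOINT
  export ETCDCTL_API=3
  export ETCDCTL_CACERT="$ETCD_SSL_DIR/ca.pem"
  export ETCDCTL_CERT="$ETCD_SSL_DIR/member-$1.pem"
  export ETCDCTL_KEY="$ETCD_SSL_DIR/member-$1-key.pem"
}
```

For example, run etcd_env_v2 k8s-master-0 https://10.0.40.11:2379 and then etcdctl cluster-health.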

The ETCDCTL_ENDPOINT value does not have to point at the host where you run etcdctl; it can be any reachable member. With the v2 variables set, the client can be used as follows. For the v3 API, pass the endpoints as a command-line argument.

etcdctl cluster-health
member 7185bcfb262c9d54 is healthy: got healthy result from https://10.0.40.11:2379
failed to check the health of member c3764310de29ca27 on https://10.0.40.10:2379: Get https://10.0.40.10:2379/health: read tcp 10.0.40.11:60848->10.0.40.10:2379: read: connection reset by peer
member c3764310de29ca27 is unreachable: [https://10.0.40.10:2379] are all unreachable
member d30849eeaaa1ae4e is healthy: got healthy result from https://10.0.40.8:2379
cluster is healthy

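Notice that the cluster as a whole reports healthy even though member c3764310de29ca27 is unreachable. etcd stays available as long as a majority (quorum) of its members is up: with three members, the two healthy ones still form a quorum, so the cluster tolerates the loss of one.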
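The arithmetic behind this is the Raft majority rule: a cluster of n members needs floor(n/2) + 1 of them, so it tolerates n minus that many failures. For three members, quorum is 2 and one member can fail. The sketch below, a hypothetical quorum_check helper, applies that rule to cluster-health output read on stdin; it assumes the v2 output format shown above, where each member line starts with "member".

```shell
# Read v2 `etcdctl cluster-health` output on stdin and compare the number of
# healthy members against the Raft quorum floor(n/2) + 1 (hypothetical helper).
quorum_check() {
  awk '
    /^member / { total++ }               # one line per member
    /^member .* is healthy/ { healthy++ }
    END {
      quorum = int(total / 2) + 1
      status = (healthy >= quorum) ? "OK" : "LOST"
      printf "%d/%d members healthy, quorum %d, tolerates %d failure(s): %s\n",
             healthy, total, quorum, total - quorum, status
    }'
}
```

Piping the output above through it (etcdctl cluster-health | quorum_check) prints "2/3 members healthy, quorum 2, tolerates 1 failure(s): OK".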

For the v3 API, use endpoint health and pass the endpoints explicitly:

[root@k8s-master-1 ~]# /usr/local/bin/etcdctl endpoint health --endpoints=[https://10.0.40.10:2379,https://10.0.40.11:2379]
https://10.0.40.11:2379 is healthy: successfully committed proposal: took = 2.897768ms
https://10.0.40.10:2379 is healthy: successfully committed proposal: took = 2.308368ms
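The command above wraps the endpoint list in brackets. etcdctl's help displays the flag's default that way, but the etcd documentation writes the value as a plain comma-separated list. A small sketch, with a hypothetical join_endpoints helper, that builds such a list from member IPs on client port 2379:

```shell
# Build the comma-separated value for etcdctl's --endpoints flag from member
# IPs, using client port 2379 as above (hypothetical helper).
join_endpoints() {
  local ip out=""
  for ip in "$@"; do
    # ${out:+$out,} adds a separator only once the list is non-empty.
    out="${out:+$out,}https://$ip:2379"
  done
  printf '%s\n' "$out"
}
```

For example: etcdctl endpoint health --endpoints="$(join_endpoints 10.0.40.8 10.0.40.10 10.0.40.11)". The same list can be passed to etcdctl endpoint status --write-out=table.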

Memory issues

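The startup script above passes --memory=512M to Docker, which caps the container's memory at 512 MiB. If etcd grows past the limit, the kernel's OOM killer kills it, which is what "OOMKilled": true and exit code 137 in the docker inspect output above indicate. (On the host in the transcript below, the limit had already been raised to --memory=1024M.) To see how much memory etcd is actually consuming, first find the host PID of the etcd process, then run pmap -x on that PID; the RSS column on the total line is the resident memory in KB.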

[centos@k8s-master-0 ~]$ ps -aux|grep etcd
root     29955  0.0  0.0 113128  1188 ?        Ss   12:18   0:00 /bin/bash /usr/local/bin/etcd
root     29957  0.0  0.2 144704 10360 ?        Sl   12:18   0:00 /usr/bin/docker run --restart=on-failure:5 --env-file=/etc/etcd.env --net=host -v /etc/ssl/certs:/etc/ssl/certs:ro -v /etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro -v /var/lib/etcd:/var/lib/etcd:rw --memory=1024M --blkio-weight=1000 --name=etcd1 quay.io/coreos/etcd:v3.2.18 /usr/local/bin/etcd
root     29981  5.2  4.1 11214816 162500 ?     Ssl  12:18   0:59 /usr/local/bin/etcd
centos   32690  0.0  0.0 112664   972 pts/0    R+   12:37   0:00 grep --color=auto etcd
[centos@k8s-master-0 ~]$ sudo pmap -x 29957
29957:   /usr/bin/docker run --restart=on-failure:5 --env-file=/etc/etcd.env --net=host -v /etc/ssl/certs:/etc/ssl/certs:ro -v /etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro -v /var/lib/etcd:/var/lib/etcd:rw --memory=1024M --blkio-weight=1000 --name=etcd1 quay.io/coreos/etcd:v3.2.18 /usr/local/bin/etcd
Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000400000   11372    5280       0 r-x-- docker
000000000111b000       4       4       4 r---- docker
000000000111c000     272     168     136 rw--- docker
0000000001160000     152      68      68 rw---   [ anon ]
0000000003048000     132       4       4 rw---   [ anon ]
000000c000000000       8       8       8 rw---   [ anon ]
000000c41ffd8000    4256    4240    4240 rw---   [ anon ]
000000c420400000    1024     260     260 rw---   [ anon ]
00007f88b8000000     132       4       4 rw---   [ anon ]
00007f88b8021000   65404       0       0 -----   [ anon ]
00007f88bca65000       4       0       0 -----   [ anon ]
00007f88bca66000    8192       8       8 rw---   [ anon ]
00007f88bd266000       4       0       0 -----   [ anon ]
00007f88bd267000    9600      12      12 rw---   [ anon ]
00007f88bdbc7000       4       0       0 -----   [ anon ]
00007f88bdbc8000    8192       8       8 rw---   [ anon ]
00007f88be3c8000       4       0       0 -----   [ anon ]
00007f88be3c9000    8192       8       8 rw---   [ anon ]
00007f88bebc9000       4       0       0 -----   [ anon ]
00007f88bebca000    8192       8       8 rw---   [ anon ]
00007f88bf3ca000       4       0       0 -----   [ anon ]
00007f88bf3cb000    8192       8       8 rw---   [ anon ]
00007f88bfbcb000       8       8       0 r-x-- libdl-2.17.so
00007f88bfbcd000    2048       0       0 ----- libdl-2.17.so
00007f88bfdcd000       4       4       4 r---- libdl-2.17.so
00007f88bfdce000       4       4       4 rw--- libdl-2.17.so
00007f88bfdcf000    1760     244       0 r-x-- libc-2.17.so
00007f88bff87000    2048       0       0 ----- libc-2.17.so
00007f88c0187000      16      16      16 r---- libc-2.17.so
00007f88c018b000       8       8       8 rw--- libc-2.17.so
00007f88c018d000      20      12      12 rw---   [ anon ]
00007f88c0192000      36      12       0 r-x-- libltdl.so.7.3.0
00007f88c019b000    2044       0       0 ----- libltdl.so.7.3.0
00007f88c039a000       4       4       4 r---- libltdl.so.7.3.0
00007f88c039b000       4       4       4 rw--- libltdl.so.7.3.0
00007f88c039c000      92      56       0 r-x-- libpthread-2.17.so
00007f88c03b3000    2044       0       0 ----- libpthread-2.17.so
00007f88c05b2000       4       4       4 r---- libpthread-2.17.so
00007f88c05b3000       4       4       4 rw--- libpthread-2.17.so
00007f88c05b4000      16       4       4 rw---   [ anon ]
00007f88c05b8000     132     112       0 r-x-- ld-2.17.so
00007f88c06ed000     912     116     116 rw---   [ anon ]
00007f88c07d8000       4       4       4 rw---   [ anon ]
00007f88c07d9000       4       4       4 r---- ld-2.17.so
00007f88c07da000       4       4       4 rw--- ld-2.17.so
00007f88c07db000       4       4       4 rw---   [ anon ]
00007ffc06572000     132      20      20 rw---   [ stack ]
00007ffc065e9000       8       4       0 r-x--   [ anon ]
ffffffffff600000       4       0       0 r-x--   [ anon ]
---------------- ------- ------- -------
total kB          144708   10740    4992
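A sketch that measures the etcd server process itself rather than the docker run client that launched it: docker inspect's .State.Pid field gives the host PID of the container's main process, and the total line of pmap -x carries the RSS in KB. The helper name container_rss_kb and the default container name etcd1 (taken from the startup script) are assumptions.

```shell
# Resident memory (KB) of a container's main process, measured with pmap -x
# on the container's host PID (hypothetical helper, bash; needs root).
container_rss_kb() {
  local name="${1:-etcd1}" pid
  pid=$(docker inspect --format '{{.State.Pid}}' "$name") || return 1
  # A stopped container reports Pid 0.
  if [ -z "$pid" ] || [ "$pid" -eq 0 ]; then
    echo "container $name is not running" >&2
    return 1
  fi
  # The last line of pmap -x is: total kB <Kbytes> <RSS> <Dirty>
  pmap -x "$pid" | awk '$1 == "total" { print $4 }'
}
```

pmap needs root to read another user's process, so run this from a root shell. As a quick cross-check, docker stats --no-stream etcd1 reports the container's memory usage against its limit.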

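Be careful to run pmap against the right PID. In the transcript above, ps lists three processes: the /bin/bash wrapper script (29955), the docker run client (29957) and the etcd server itself (29981). The pmap run targets 29957, so its total RSS of 10740 KB describes the Docker client, not etcd; for etcd, ps already reports an RSS of 162500 KB (about 159 MiB) for PID 29981.

Keep in mind that memory use may spike when the process starts and then come down afterward, so set the limit high enough to accommodate the peak. It is also possible to remove the memory limit by setting --memory=0 or by removing that directive from the startup script.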
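Because the limit has to cover the peak rather than the current value, the kernel's high-water mark is more useful than a one-off reading: the VmHWM field in /proc/<pid>/status records a process's peak resident set size. The sketch below, with hypothetical helper names, reads it and checks it against a Docker limit in bytes (0 meaning no limit). It only covers the current process's lifetime, and the cgroup limit can also count memory such as page cache, so treat the result as a lower bound.

```shell
# Peak resident memory (KB) of a process from VmHWM in /proc/<pid>/status.
# The optional second argument overrides the status file path (hypothetical helper).
peak_rss_kb() {
  local status_file="${2:-/proc/$1/status}"
  awk '$1 == "VmHWM:" { print $2 }' "$status_file"
}

# Succeed if a peak in KB fits under a Docker memory limit in bytes; 0 = no limit.
fits_limit() {
  local peak_kb="$1" limit_bytes="$2"
  [ "$limit_bytes" -eq 0 ] || [ $((peak_kb * 1024)) -lt "$limit_bytes" ]
}
```

For example, fits_limit "$(peak_rss_kb 29981)" 1073741824 && echo "fits under 1 GiB", run as root while etcd is up.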

Tools

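auger (https://github.com/jpbetz/auger) decodes the binary-encoded Kubernetes objects that the API server stores in etcd, which makes them readable when you inspect etcd directly.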
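auger works on stored values; for a quick look at what is in etcd, the keys alone are often enough. With the v3 API, Kubernetes objects live under the /registry prefix by default (the API server's --etcd-prefix), so etcdctl get /registry --prefix --keys-only lists them. A minimal sketch, with a hypothetical count_registry_keys helper, that tallies those keys per resource type:

```shell
# Count Kubernetes objects per resource type from a v3 key listing read on
# stdin, e.g. from: etcdctl get /registry --prefix --keys-only
# Keys look like /registry/<resource>/...; blank lines are skipped.
count_registry_keys() {
  awk -F/ '$2 == "registry" && NF >= 3 { n[$3]++ }
           END { for (r in n) print r, n[r] }' | sort
}
```

Usage: etcdctl get /registry --prefix --keys-only | count_registry_keys, with the v3 variables set and --endpoints given.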
