Using and troubleshooting etcd in Kubernetes
etcd (https://coreos.com/etcd/) is a distributed key/value store. Kubernetes keeps all of its cluster data in etcd, including every resource and its state.
How etcd is installed
I install Kubernetes, and etcd along with it, using kubespray (https://github.com/kubernetes-incubator/kubespray).
Interacting with etcd
etcd runs as a Docker container. The systemd service starts it through the script /usr/local/bin/etcd, which has the contents below:
#!/bin/bash
/usr/bin/docker run \
  --restart=on-failure:5 \
  --env-file=/etc/etcd.env \
  --net=host \
  -v /etc/ssl/certs:/etc/ssl/certs:ro \
  -v /etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro \
  -v /var/lib/etcd:/var/lib/etcd:rw \
  --memory=512M \
  --blkio-weight=1000 \
  --name=etcd1 \
  quay.io/coreos/etcd:v3.2.18 \
  /usr/local/bin/etcd \
  "$@"
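The script passes --env-file=/etc/etcd.env, so most of etcd's own configuration lives in that file as ETCD_* variables (the container logs further down show etcd picking them up, and docker inspect lists ETCD_DATA_DIR=/var/lib/etcd). A minimal sketch for reading one value back, assuming the file uses plain KEY=value lines; etcd_env_value is a hypothetical helper name, not part of etcd or kubespray:

```shell
#!/usr/bin/env bash
# Print the value of one KEY=value entry from a docker --env-file.
# Assumption: plain KEY=value lines (no quoting, no "export" prefix).
# etcd_env_value is a hypothetical helper, not part of etcd or kubespray.
etcd_env_value() {
  local key=$1 file=${2:-/etc/etcd.env}
  # Match lines that start with exactly "KEY=", print everything after the "="
  awk -v k="$key" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$file"
}
```

Once the function is defined in a root shell, etcd_env_value ETCD_DATA_DIR prints the data directory. To see the unit that calls the script, systemctl cat etcd should work, assuming the service is named etcd.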
Interacting with the running etcd process is done through the docker CLI. On a healthy node, the container running etcd shows up like this:
[centos@k8s-node-0 ~]$ sudo docker ps|grep etcd
CONTAINER ID   IMAGE                         COMMAND                 CREATED      STATUS      PORTS   NAMES
936ad3f72b68   quay.io/coreos/etcd:v3.2.18   "/usr/local/bin/etcd"   6 days ago   Up 6 days           etcd3
When the etcd process is unhealthy, add the -a option to docker ps so that exited containers are listed as well.
[centos@k8s-master-0 ~]$ sudo docker ps -a|grep etcd
CONTAINER ID   IMAGE                         COMMAND                 CREATED          STATUS                        PORTS   NAMES
0589d69d5892   quay.io/coreos/etcd:v3.2.18   "/usr/local/bin/etcd"   16 seconds ago   Exited (137) 10 seconds ago           etcd1
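Rather than grepping, docker ps can filter by name and print only the interesting columns. A sketch, assuming the container name contains etcd as it does here (etcd1, etcd3); the docker flags are standard, the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# List etcd containers in every state (-a) with ID, status and name.
list_etcd_containers() {
  docker ps -a --filter 'name=etcd' --format 'table {{.ID}}\t{{.Status}}\t{{.Names}}'
}

# Extract the exit code from a docker ps status such as
# "Exited (137) 10 seconds ago"; prints nothing for running containers.
status_exit_code() {
  printf '%s\n' "$1" | sed -n 's/^Exited (\([0-9]*\)).*/\1/p'
}

# Exit codes above 128 mean the process was killed by signal (code - 128).
exit_signal() {
  local code=$1
  if [ "$code" -gt 128 ]; then echo $((code - 128)); fi
}
```

status_exit_code 'Exited (137) 10 seconds ago' prints 137, and exit_signal 137 prints 9, i.e. SIGKILL, the signal the kernel's out-of-memory killer sends.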
To see the container logs, you have to be quick: run docker logs on the exited container before it is replaced by a new etcd container.
[centos@k8s-master-0 ~]$ sudo docker logs 0589d69d5892
2018-11-01 12:09:54.717392 I | pkg/flags: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=https://10.0.40.11:2379
...
2018-11-01 12:09:54.718334 I | pkg/flags: recognized and used environment variable ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
2018-11-01 12:09:54.718470 I | etcdmain: etcd Version: 3.2.18
2018-11-01 12:09:54.718500 I | etcdmain: Git SHA: eddf599c6
2018-11-01 12:09:54.718516 I | etcdmain: Go Version: go1.8.7
...
2018-11-01 12:09:59.131586 I | rafthttp: established a TCP streaming connection with peer d30849eeaaa1ae4e (stream Message reader)
2018-11-01 12:09:59.133832 I | rafthttp: established a TCP streaming connection with peer d30849eeaaa1ae4e (stream MsgApp v2 reader)
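Since the exited container does not stick around, it helps to save its logs to a file as soon as it appears and then skim for anything that is not an info-level line. A sketch, assuming the etcd 3.2 log format shown above (date, time, one-letter level, "|", package); both function names are hypothetical:

```shell
#!/usr/bin/env bash
# Save the logs of the most recent exited etcd container to /tmp.
# docker ps lists the newest container first, hence head -n 1.
# etcd logs to stderr, so 2>&1 is needed to capture it in the file.
save_exited_etcd_logs() {
  local id
  id=$(docker ps -a --filter 'name=etcd' --filter 'status=exited' --quiet | head -n 1)
  [ -n "$id" ] || { echo "no exited etcd container" >&2; return 1; }
  docker logs "$id" > "/tmp/etcd-$id.log" 2>&1
  echo "/tmp/etcd-$id.log"
}

# In lines like "2018-11-01 12:09:54.718470 I | etcdmain: ...", field 3 is
# the level (I = info). Print every non-info line (warnings, errors, ...).
etcd_log_problems() {
  awk '$4 == "|" && $3 != "I"' "$@"
}
```

For example: etcd_log_problems "$(save_exited_etcd_logs)".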
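It’s also possible to get more details about the container with docker inspect. In the State section below, "OOMKilled": true together with "ExitCode": 137 shows that etcd was killed for exceeding the container's memory limit: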
[centos@k8s-master-0 ~]$ sudo docker inspect ee9c45650aa4
[
    {
        "Id": "ee9c45650aa43b1223bae726adb0752a6df2a9f19e78837f2e8bcb787eef6a73",
        "Created": "2018-11-01T12:13:57.285721355Z",
        "Path": "/usr/local/bin/etcd",
        "Args": [],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": true,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 137,
            "Error": "",
            "StartedAt": "2018-11-01T12:14:00.279517793Z",
            "FinishedAt": "2018-11-01T12:14:02.154188978Z"
        },
        "Image": "sha256:e21fb69683f3f754d40128ba1981244c3679fe92e5f692ee67a2b94f65564fb0",
        ...
        "HostConfig": {
            "Binds": [
                "/var/lib/etcd:/var/lib/etcd:rw",
                "/etc/ssl/certs:/etc/ssl/certs:ro",
                "/etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro"
            ],
            "Memory": 1073741824,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 2147483648,
            "MemorySwappiness": -1,
            ...
        },
        "GraphDriver": {
            "Name": "overlay",
            "Data": {
                "LowerDir": "/var/lib/docker/overlay/9cb650dec14206e33078dda3e4ada08f311809d21ecee6d4cfb53b65dbbd2297/root",
                "MergedDir": "/var/lib/docker/overlay/3a003be9e7fe8567644a084f1eb3306f05128ac63c544bf3dda5dc9f2a94613f/merged",
                "UpperDir": "/var/lib/docker/overlay/3a003be9e7fe8567644a084f1eb3306f05128ac63c544bf3dda5dc9f2a94613f/upper",
                "WorkDir": "/var/lib/docker/overlay/3a003be9e7fe8567644a084f1eb3306f05128ac63c544bf3dda5dc9f2a94613f/work"
            }
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/var/lib/etcd",
                "Destination": "/var/lib/etcd",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            },
            ...
        ],
        "Config": {
            "Hostname": "k8s-master-0",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": true,
            "AttachStderr": true,
            "ExposedPorts": {
                "2379/tcp": {},
                "2380/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "ETCD_DATA_DIR=/var/lib/etcd",
                ...
            ],
            "Cmd": [
                "/usr/local/bin/etcd"
            ],
            "Image": "quay.io/coreos/etcd:v3.2.18",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {}
        },
        "NetworkSettings": {
            ...
        }
    }
]
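When only the cause of a crash matters, docker inspect --format can print the relevant fields instead of the whole JSON. A small sketch; the field paths match the keys in the output above, the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Print OOM flag, exit code and memory limit (bytes) for one container.
etcd_crash_summary() {
  docker inspect --format \
    'oom={{.State.OOMKilled}} exit={{.State.ExitCode}} limit={{.HostConfig.Memory}}' "$1"
}

# Convert a byte count such as HostConfig.Memory to MiB (integer division).
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}
```

Against the container above, etcd_crash_summary ee9c45650aa4 should print oom=true exit=137 limit=1073741824, and bytes_to_mib 1073741824 prints 1024.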
Using etcdctl
etcdctl is the command-line client for etcd. To use it, some environment variables must be set. For API V2 (the default in etcdctl 3.2 when ETCDCTL_API is not set), these are:
export ETCDCTL_CERT_FILE=/etc/ssl/etcd/ssl/member-k8s-master-0.pem
export ETCDCTL_KEY_FILE=/etc/ssl/etcd/ssl/member-k8s-master-0-key.pem
export ETCDCTL_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
export ETCDCTL_ENDPOINT=https://10.0.40.11:2379
For API V3, the variables have different names, and ETCDCTL_API=3 selects the v3 API:
export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
export ETCDCTL_CERT=/etc/ssl/etcd/ssl/member-k8s-master-1.pem
export ETCDCTL_KEY=/etc/ssl/etcd/ssl/member-k8s-master-1-key.pem
export ETCDCTL_API=3
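Switching between members means repeating these exports with a different certificate name. A sketch of a helper, assuming the certificates follow the member-<hostname>.pem and member-<hostname>-key.pem pattern seen on this cluster (other installers may lay them out differently); use_etcd_v3 is a hypothetical name:

```shell
#!/usr/bin/env bash
# Export the etcdctl v3 TLS variables for one member.
# Assumption: kubespray-style names member-<host>.pem / member-<host>-key.pem.
# use_etcd_v3 is a hypothetical helper; defaults to the short hostname.
use_etcd_v3() {
  local host=${1:-$(hostname -s)}
  local dir=/etc/ssl/etcd/ssl
  export ETCDCTL_API=3
  export ETCDCTL_CACERT=$dir/ca.pem
  export ETCDCTL_CERT=$dir/member-$host.pem
  export ETCDCTL_KEY=$dir/member-$host-key.pem
}
```

Run use_etcd_v3 on a master to pick up its own hostname, or use_etcd_v3 k8s-master-1 to name the member explicitly. Call it in your interactive shell rather than in a subshell, otherwise the exports do not persist.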
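Note that the ETCDCTL_ENDPOINT value does not have to point at the host where you run etcdctl; any reachable member works. With the V2 variables set, the cluster health can be checked as shown below. For API V3, the endpoints are passed on the command line with --endpoints (or through the ETCDCTL_ENDPOINTS variable, note the plural).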
etcdctl cluster-health
member 7185bcfb262c9d54 is healthy: got healthy result from https://10.0.40.11:2379
failed to check the health of member c3764310de29ca27 on https://10.0.40.10:2379: Get https://10.0.40.10:2379/health: read tcp 10.0.40.11:60848->10.0.40.10:2379: read: connection reset by peer
member c3764310de29ca27 is unreachable: [https://10.0.40.10:2379] are all unreachable
member d30849eeaaa1ae4e is healthy: got healthy result from https://10.0.40.8:2379
cluster is healthy
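Notice from the above output that the cluster reports itself as healthy even though one of its three members is unreachable: the two remaining members still form a majority (quorum), so the cluster can keep committing writes.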
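The arithmetic behind that verdict: a cluster of n members needs floor(n/2) + 1 of them to agree and can lose the rest. A sketch in shell integer arithmetic (function names are hypothetical):

```shell
#!/usr/bin/env bash
# Raft commits only with a majority: quorum = floor(n/2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }

# Members that can fail while a quorum remains: n - quorum.
failure_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }
```

quorum 3 prints 2 and failure_tolerance 3 prints 1, which is exactly the situation above. A fourth member would not help: quorum 4 is 3, so four members still tolerate only one failure.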
For API V3, the equivalent check is endpoint health:
[root@k8s-master-1 ~]# /usr/local/bin/etcdctl endpoint health --endpoints=[https://10.0.40.10:2379,https://10.0.40.11:2379]
https://10.0.40.11:2379 is healthy: successfully committed proposal: took = 2.897768ms
https://10.0.40.10:2379 is healthy: successfully committed proposal: took = 2.308368ms
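With more members, building the --endpoints list by hand gets tedious, and endpoint status adds detail that endpoint health lacks (member ID, version, DB size, leader, raft term and index). A sketch, assuming every member serves clients over https on port 2379 as here and that the v3 TLS variables above are exported; the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Build a comma-separated --endpoints value from member IPs.
# Assumption: clients connect over https on port 2379.
join_endpoints() {
  local ip out=
  for ip in "$@"; do
    out=${out:+$out,}https://$ip:2379
  done
  echo "$out"
}

# Health plus a status table for the given member IPs (etcdctl v3).
check_etcd_v3() {
  local eps
  eps=$(join_endpoints "$@")
  ETCDCTL_API=3 etcdctl --endpoints="$eps" endpoint health
  ETCDCTL_API=3 etcdctl --endpoints="$eps" endpoint status --write-out=table
}
```

check_etcd_v3 10.0.40.8 10.0.40.10 10.0.40.11 would check all three members of this cluster.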
Memory issues
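The startup script above sets a memory limit with --memory=512M (in the ps and docker inspect output on this page the limit had already been raised to 1024M, i.e. 1073741824 bytes). If the container goes above that limit, the kernel's out-of-memory killer terminates etcd, which shows up as exit code 137 and "OOMKilled": true in docker inspect.
To see how much memory etcd is actually consuming, first find the PID of the etcd server process on the host (the /usr/local/bin/etcd line in ps, not the /usr/bin/docker run client), then run pmap -x on that PID; the RSS column of the total line is the resident memory in kB. Note that the example below runs pmap against PID 29957, which is the docker run client, so its total of about 10 MB is not etcd's usage; the etcd server is PID 29981, whose RSS in the ps output is 162500 kB.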
[centos@k8s-master-0 ~]$ ps -aux|grep etcd
root     29955  0.0  0.0   113128    1188 ?      Ss   12:18   0:00 /bin/bash /usr/local/bin/etcd
root     29957  0.0  0.2   144704   10360 ?      Sl   12:18   0:00 /usr/bin/docker run --restart=on-failure:5 --env-file=/etc/etcd.env --net=host -v /etc/ssl/certs:/etc/ssl/certs:ro -v /etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro -v /var/lib/etcd:/var/lib/etcd:rw --memory=1024M --blkio-weight=1000 --name=etcd1 quay.io/coreos/etcd:v3.2.18 /usr/local/bin/etcd
root     29981  5.2  4.1 11214816  162500 ?      Ssl  12:18   0:59 /usr/local/bin/etcd
centos   32690  0.0  0.0   112664     972 pts/0  R+   12:37   0:00 grep --color=auto etcd
[centos@k8s-master-0 ~]$ sudo pmap -x 29957
29957:   /usr/bin/docker run --restart=on-failure:5 --env-file=/etc/etcd.env --net=host -v /etc/ssl/certs:/etc/ssl/certs:ro -v /etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro -v /var/lib/etcd:/var/lib/etcd:rw --memory=1024M --blkio-weight=1000 --name=etcd1 quay.io/coreos/etcd:v3.2.18 /usr/local/bin/etcd
Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000400000   11372    5280       0 r-x-- docker
000000000111b000       4       4       4 r---- docker
000000000111c000     272     168     136 rw--- docker
0000000001160000     152      68      68 rw---   [ anon ]
0000000003048000     132       4       4 rw---   [ anon ]
000000c000000000       8       8       8 rw---   [ anon ]
000000c41ffd8000    4256    4240    4240 rw---   [ anon ]
000000c420400000    1024     260     260 rw---   [ anon ]
00007f88b8000000     132       4       4 rw---   [ anon ]
00007f88b8021000   65404       0       0 -----   [ anon ]
00007f88bca65000       4       0       0 -----   [ anon ]
00007f88bca66000    8192       8       8 rw---   [ anon ]
00007f88bd266000       4       0       0 -----   [ anon ]
00007f88bd267000    9600      12      12 rw---   [ anon ]
00007f88bdbc7000       4       0       0 -----   [ anon ]
00007f88bdbc8000    8192       8       8 rw---   [ anon ]
00007f88be3c8000       4       0       0 -----   [ anon ]
00007f88be3c9000    8192       8       8 rw---   [ anon ]
00007f88bebc9000       4       0       0 -----   [ anon ]
00007f88bebca000    8192       8       8 rw---   [ anon ]
00007f88bf3ca000       4       0       0 -----   [ anon ]
00007f88bf3cb000    8192       8       8 rw---   [ anon ]
00007f88bfbcb000       8       8       0 r-x-- libdl-2.17.so
00007f88bfbcd000    2048       0       0 ----- libdl-2.17.so
00007f88bfdcd000       4       4       4 r---- libdl-2.17.so
00007f88bfdce000       4       4       4 rw--- libdl-2.17.so
00007f88bfdcf000    1760     244       0 r-x-- libc-2.17.so
00007f88bff87000    2048       0       0 ----- libc-2.17.so
00007f88c0187000      16      16      16 r---- libc-2.17.so
00007f88c018b000       8       8       8 rw--- libc-2.17.so
00007f88c018d000      20      12      12 rw---   [ anon ]
00007f88c0192000      36      12       0 r-x-- libltdl.so.7.3.0
00007f88c019b000    2044       0       0 ----- libltdl.so.7.3.0
00007f88c039a000       4       4       4 r---- libltdl.so.7.3.0
00007f88c039b000       4       4       4 rw--- libltdl.so.7.3.0
00007f88c039c000      92      56       0 r-x-- libpthread-2.17.so
00007f88c03b3000    2044       0       0 ----- libpthread-2.17.so
00007f88c05b2000       4       4       4 r---- libpthread-2.17.so
00007f88c05b3000       4       4       4 rw--- libpthread-2.17.so
00007f88c05b4000      16       4       4 rw---   [ anon ]
00007f88c05b8000     132     112       0 r-x-- ld-2.17.so
00007f88c06ed000     912     116     116 rw---   [ anon ]
00007f88c07d8000       4       4       4 rw---   [ anon ]
00007f88c07d9000       4       4       4 r---- ld-2.17.so
00007f88c07da000       4       4       4 rw--- ld-2.17.so
00007f88c07db000       4       4       4 rw---   [ anon ]
00007ffc06572000     132      20      20 rw---   [ stack ]
00007ffc065e9000       8       4       0 r-x--   [ anon ]
ffffffffff600000       4       0       0 r-x--   [ anon ]
---------------- ------- ------- -------
total kB          144708   10740    4992
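To measure the etcd server rather than the docker client, docker inspect can report the host PID of the container's main process (.State.Pid; it is 0 once the container has exited, as in the inspect output earlier). A sketch that feeds that PID to pmap and keeps only the RSS total; the function names are hypothetical, and the container name etcd1 is taken from the startup script:

```shell
#!/usr/bin/env bash
# Host PID of the container's main process (/usr/local/bin/etcd here),
# as opposed to the "docker run" client process.
etcd_pid() {
  docker inspect --format '{{.State.Pid}}' "${1:-etcd1}"
}

# pmap -x ends with "total kB <Kbytes> <RSS> <Dirty>"; print the RSS (kB).
pmap_total_rss() {
  awk '$1 == "total" { print $4 }'
}
```

As root, pmap -x "$(etcd_pid)" | pmap_total_rss prints the resident size in kB; ps -o rss= -p "$(etcd_pid)" gives the same kind of figure with less output.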
Keep in mind that memory use may spike when the process starts and then come down afterward, so the limit needs to accommodate the highest point, not the steady state. It is also possible to remove the memory limit by setting --memory=0 or by removing that option from the startup script.
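To find that highest point, the resident size can be sampled while etcd starts and the maximum kept. A rough sketch, assuming one-second sampling is fine-grained enough (short spikes between samples will be missed); peak_rss and max_value are hypothetical names:

```shell
#!/usr/bin/env bash
# Print the largest number read from stdin (first field of each line).
max_value() {
  awk '{ v = $1 + 0; if (NR == 1 || v > max) max = v } END { if (NR) print max }'
}

# Sample the RSS (kB) of a PID once per second for N seconds (default 60)
# and print the peak; stops early if the process disappears.
peak_rss() {
  local pid=$1 seconds=${2:-60} i
  for ((i = 0; i < seconds; i++)); do
    ps -o rss= -p "$pid" || break
    sleep 1
  done | max_value
}
```

For example, peak_rss 29981 120 samples the etcd PID from the ps output above for two minutes. When choosing a limit, remember that ps reports kB while docker's --memory=1024M means 1024 MiB (1073741824 bytes, as docker inspect shows).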
Tools
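auger (https://github.com/jpbetz/auger) decodes the binary encoding Kubernetes uses for objects stored in etcd, which makes it possible to inspect those objects directly.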
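Before reaching for auger, it can help to see what Kubernetes actually stores. Objects live under the /registry prefix, for example /registry/pods/<namespace>/<name>, and most values are binary (protobuf) encoded, which is what auger decodes. A sketch, assuming the v3 TLS variables from above are exported; registry_key is a hypothetical helper, and the auger decode invocation in the comment is an assumption to check against the auger README:

```shell
#!/usr/bin/env bash
# List the keys Kubernetes keeps in etcd (keys only; values are binary).
# Extra arguments such as --endpoints=... are passed through to etcdctl.
list_registry_keys() {
  ETCDCTL_API=3 etcdctl get /registry/ --prefix --keys-only "$@"
}

# Build the key of a namespaced object, e.g. pods/default/my-pod.
# registry_key is a hypothetical helper.
registry_key() {
  echo "/registry/$1/$2/$3"
}

# Decoding one value (assumption: auger is on PATH and offers "decode"):
#   ETCDCTL_API=3 etcdctl get "$(registry_key pods default my-pod)" --print-value-only | auger decode
```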