Friday, March 30, 2018

Kubernetes: Setup Flannel High Availability


Kubernetes HA Config Setting

Flannel

[Unit]
Description=Flanneld
Documentation=https://github.com/coreos/flannel
After=network.target
Before=docker.service

[Service]
User=root
ExecStart=/opt/bin/flanneld \
  --etcd-endpoints="http://192.168.51.140:2379,http://192.168.51.145:2379" \
  --iface=192.168.51.149 \
  --ip-masq
Restart=on-failure
Type=notify
LimitNOFILE=65536

here we set

 --etcd-endpoints="http://192.168.51.140:2379,http://192.168.51.145:2379"

with multiple endpoints.

It works: flanneld accepts the comma-separated list of etcd servers.
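To confirm that both endpoints are reachable from the node, a quick health check with etcdctl (v2 syntax; assuming etcdctl is installed on the node):

```shell
etcdctl --endpoints "http://192.168.51.140:2379,http://192.168.51.145:2379" cluster-health
```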

Kubelet

[Unit]
Description=Kubernetes Kubelet
After=docker.service
Requires=docker.service

[Service]
ExecStart=/opt/bin/kubelet \
  --hostname-override=k8stargetnode \
  --api-servers="http://192.168.51.139:8080, http://192.168.51.145:8080" \
  --register-node=true \
  --logtostderr=false \
  --log-dir="/var/log/kubernetes" \
  --v=6 \
  --cluster_dns=172.18.0.5 \
  --cluster_domain=cluster.local
Restart=on-failure
KillMode=process

[Install]
WantedBy=multi-user.target
Here we set

  --api-servers="http://192.168.51.139:8080, http://192.168.51.145:8080" \

with multiple endpoints, and get this error message:

E0929 09:44:45.084951   58581 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: Get http://192.168.51.139:8080/api/v1/pods?fieldSelector=spec.nodeName%3Dk8stargetnode&resourceVersion=0: dial tcp 192.168.51.139:8080: getsockopt: no route to host

It failed: kubelet always uses only the first element of the list.

Kube-proxy

Setting multiple API servers the same way for kube-proxy also fails, with this error message:

E0929 09:40:44.597567   58456 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:49: Failed to list *api.Endpoints: Get http://192.168.51.139:8080,http/192.168.51.145:8080/api/v1/endpoints?resourceVersion=0: dial tcp: unknown port tcp/8080,http

It failed: kube-proxy parsed the whole comma-separated list as a single URL.

Result

A load balancer is needed in front of the API servers for the minion components (kubelet and kube-proxy), but flanneld accepts multiple etcd endpoints directly, without needing a load balancer.
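A minimal sketch of such a load balancer, assuming HAProxy in TCP mode in front of the two insecure apiserver ports (the addresses come from the units above; kubelet and kube-proxy would then point --api-servers at this single frontend address):

```
# /etc/haproxy/haproxy.cfg (sketch only, not from the original setup)
frontend k8s-api
    bind *:8080
    mode tcp
    default_backend k8s-api-nodes

backend k8s-api-nodes
    mode tcp
    balance roundrobin
    server master1 192.168.51.139:8080 check
    server master2 192.168.51.145:8080 check
```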

Kubernetes: Change Flannel TTL


Change Flannel TTL

etcdctl set -ttl 0 /coreos.com/network/subnets/10.5.1.0-24 $(etcdctl get /coreos.com/network/subnets/10.5.1.0-24)
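Broken into steps (etcdctl v2 syntax; the subnet key is the example from above), with a verification read at the end:

```shell
# Refresh the flannel subnet lease so it no longer expires (etcdctl v2).
KEY=/coreos.com/network/subnets/10.5.1.0-24
VALUE=$(etcdctl get "$KEY")          # read the current lease value
etcdctl set -ttl 0 "$KEY" "$VALUE"   # rewrite it with TTL 0 (no expiry)
etcdctl -o extended get "$KEY"       # extended output shows the TTL field
```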

Kubernetes Rolling Upgrade


Kubernetes Rolling Upgrade (Rollout & Rollback)

Thanks to these blogs:

https://tachingchen.com/tw/blog/Kubernetes-Rolling-Update-with-Deployment/
https://www.linux.com/learn/rolling-updates-and-rollbacks-using-kubernetes-deployments

Yaml file

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginxstor
spec:
  #serviceName: "nginxstor"
  replicas: 2
  template:
    metadata:
      labels:
        app: nginxstor
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - nginxstor
              topologyKey: kubernetes.io/hostname
      containers:
      - name: nginxstor
        image: 192.168.51.130:5000/uwebserverv6
        ports:
          - containerPort: 8000
  minReadySeconds: 5
  strategy:
    # indicate which strategy we want for rolling update
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1

Add a rolling update strategy for the rolling upgrade:

  minReadySeconds: 5
  strategy:
    # indicate which strategy we want for rolling update
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1

Kubernetes waits minReadySeconds after a new container is up and ready before it starts replacing the next one.
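With the values above (replicas: 2, maxSurge: 1, maxUnavailable: 1), the rollout bounds work out as follows:

```shell
# Bounds Kubernetes enforces during the RollingUpdate, using the Deployment's values.
replicas=2; maxSurge=1; maxUnavailable=1
echo "at most $((replicas + maxSurge)) pods exist during the rollout"   # 2 + 1
echo "at least $((replicas - maxUnavailable)) pod stays available"      # 2 - 1
```

So the rollout may briefly run three pods, and may drop to a single available pod.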

Rollout (upgrade)

kubectl set image deployment nginxstor nginxstor=192.168.51.130:5000/uwebserverv7 --record

It will start the rolling upgrade now.

There are several ways to upgrade a container, but I prefer this one since it has the fewest dependencies.

--record tells Kubernetes to record the command in the rollout history; you will see it later.

To check the status

kubectl rollout status deployment nginxstor

Rollback (downgrade)

To check the version

kubectl rollout history deployment nginxstor

deployments "nginxstor"
REVISION    CHANGE-CAUSE
1       <none>
2       kubectl set image deployment nginxstor nginxstor=192.168.51.130:5000/uwebserverv7 --record=true

Let's go back to revision 1

kubectl rollout undo deployment nginxstor --to-revision=1

Try it; it works amazingly well.

Kubernetes Readiness


Readiness

Method

If the readiness probe is not ready, Kubernetes removes the pod's endpoint, so the pod is unreachable through the Service even though its name is registered in DNS. If the probe is ready, the endpoint is enabled and the pod becomes reachable.

Readiness Setting

        image: 192.168.51.130:5000/uwebserverv6
        readinessProbe:
          tcpSocket:
            port: 8000
          initialDelaySeconds: 60
          periodSeconds: 5

where initialDelaySeconds is the delay before probing begins; after the initial delay, kubelet starts probing port 8000.

NAME                          READY     STATUS    RESTARTS   AGE
dep-c-2945203518-dcjjm        0/1       Running   0          1m

Check the endpoints; none is set yet:

root@kubecontext:~/k8sdeployex/dependency/test2# kubectl get ep
NAME          ENDPOINTS                             AGE
dep-c                                               1m

Although dep-c's status shows Running, the pod is actually not reachable over the network.

Do not check with nslookup, since the name is already registered in SkyDNS; nslookup dep-c will succeed regardless of readiness.

After the readiness probe is ready:

root@kubecontext:~/k8sdeployex/dependency/test2# kubectl get po
NAME                          READY     STATUS    RESTARTS   AGE
dep-c-3125362030-4tnjk        1/1       Running   0          1m
root@kubecontext:~/k8sdeployex/dependency/test2# kubectl get ep
NAME          ENDPOINTS                             AGE
dep-c         192.168.15.6:8000                     1m

Note that

  1. initialDelaySeconds is the delay before probing starts. We set it to 60 because we assume the container needs about 60 seconds to load its data after launch.
  2. Do not use nslookup, since DNS registers the Service name regardless of readiness:

  root@kubecontext:~/k8sdeployex/dependency# kubectl get svc
  NAME      CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
  dep-c     172.18.254.42   <none>        8000/TCP   1m

  So nslookup will resolve dep-c even while the readiness probe is not ready.
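A tcpSocket readiness probe is essentially a TCP connect attempt. A local stand-in (a sketch, not kubelet's actual code; port 65000 is assumed to be closed here, and /dev/tcp is a bash feature):

```shell
# Approximate what kubelet's tcpSocket probe does: try to connect, report readiness.
probe_port=65000   # assumed closed locally, so this should print "not ready"
if timeout 1 bash -c "exec 3<>/dev/tcp/127.0.0.1/$probe_port" 2>/dev/null; then
  echo "ready"
else
  echo "not ready"
fi
```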

initContainer Setting

      initContainers:
      - name: init-myservice
        image: 192.168.51.130:5000/emotibot-k8s/busybox:latest
        command: ['sh', '-c', 'until nc -v -z -w 1 dep-c 8000 ; do echo waiting for myservice; sleep 2; done;']
        #command: ['sh', '-c', 'until nslookup dep-c ; do echo waiting for myservice; sleep 2; done;']

Do not use nslookup, since the DNS name is registered regardless of readiness; using nc to check both the domain name and the port is better.
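The until-loop pattern can be tried locally; here a temp file stands in for dep-c answering on port 8000:

```shell
# Local sketch of the initContainer's wait loop: retry until the check succeeds.
ready_flag=$(mktemp -u)                 # stand-in for the service being up
( sleep 1; touch "$ready_flag" ) &      # the "service" comes up after 1 second
until [ -f "$ready_flag" ]; do echo "waiting for myservice"; sleep 0.2; done
echo "myservice is up"
rm -f "$ready_flag"
```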

Kubernetes Credential


K8S Credential

[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes
After=etcd.service

[Service]
User=root
ExecStart=/opt/bin/kube-apiserver \
 --insecure-bind-address=0.0.0.0 \
 --insecure-port=8080 \
 --etcd-servers=http://192.168.51.131:2379\
 --logtostderr=false \
 --allow-privileged=true \
 --service-cluster-ip-range=172.18.0.0/16 \
 --admission-control=NamespaceLifecycle,ServiceAccount,LimitRanger,SecurityContextDeny,ResourceQuota \
 --service-node-port-range=30000-32767 \
 --advertise-address=192.168.51.131 \
 --v=6 \
 --storage-backend="etcd2" \
 --log-dir="/var/log/kubernetes" \
 --client-ca-file=/srv/kubernetes/ca.crt \
 --tls-private-key-file=/srv/kubernetes/server.key \
 --tls-cert-file=/srv/kubernetes/server.cert \
 --service_account_key_file=/srv/kubernetes/server.key \
 --runtime-config=batch/v2alpha1=true \
 --apiserver-count=2 \
 --authorization-mode=Node,RBAC \
 --secure-port=6443 \
 --token-auth-file=/etc/kubernetes/pki/tokens.csv \
 --basic-auth-file=/etc/kubernetes/basic_auth

Restart=on-failure
Type=notify
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

The four lines added above are:

 --authorization-mode=Node,RBAC \
 --secure-port=6443 \
 --token-auth-file=/etc/kubernetes/pki/tokens.csv \
 --basic-auth-file=/etc/kubernetes/basic_auth

where

/etc/kubernetes/pki/tokens.csv

792c62a1b5f2b07b,admin,ab47c6cb-f403-11e6-95a3-0800279704c8,system:kubelet-bootstrap

and

/etc/kubernetes/basic_auth

1234,admin,1
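For reference, each tokens.csv line is token,user,uid[,group,...], and each basic_auth line is password,user,uid (the legacy static token and basic-auth file formats). Parsing the example lines:

```shell
# Split the tokens.csv example line into its fields.
IFS=, read -r token user uid groups <<'EOF'
792c62a1b5f2b07b,admin,ab47c6cb-f403-11e6-95a3-0800279704c8,system:kubelet-bootstrap
EOF
echo "user=$user uid=$uid groups=$groups"

# basic_auth format: password,user,uid
IFS=, read -r password buser buid <<'EOF'
1234,admin,1
EOF
echo "basic-auth user=$buser password=$password"
```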

Binding the role and namespace

kubectl create rolebinding bob-admin-binding --clusterrole=admin --user=admin --namespace=ebot

Once we bind the admin user to ebot, it cannot be used in default, as shown below:

root@kubecontext:~# kubectl --token=792c62a1b5f2b07b --server=https://192.168.51.131:6443   get po --namespace=default
Error from server (Forbidden): pods is forbidden: User "admin" cannot list pods in the namespace "default"

It works in the ebot namespace; you can use the token as follows:

kubectl --token=792c62a1b5f2b07b --server=https://192.168.51.131:6443  get po --namespace=ebot

Or use a username and password as follows:

kubectl --username=admin --password=1234 --server=https://192.168.51.131:6443 get pod  --namespace=ebot

To grant access across all namespaces, create a cluster-wide binding instead (a rolebinding without --namespace only applies to the current namespace):

kubectl create clusterrolebinding bob-admin-binding --clusterrole=admin --user=admin

Set Up Kube Config

kubectl config set-cluster seccluster --server=https://192.168.51.131:6443 --insecure-skip-tls-verify=true
kubectl config set-credentials dev-user1 --username=admin --password=1234

or just using the token is fine:

kubectl config set-credentials dev-user1 --token=792c62a1b5f2b07b
kubectl config set-context secctx2 --cluster=seccluster --user=dev-user1 --namespace=kube-system
kubectl config use-context  secctx2

You will see the result

root@kuberm:~# kubectl get po
NAME                       READY     STATUS    RESTARTS   AGE
kube-dns-846480609-v3sn1   3/3       Running   18         54d

However, you first need to grant the admin user the privilege in the kube-system namespace:

kubectl create rolebinding bob-admin-binding --clusterrole=admin --user=admin --namespace=kube-system

K8S Flannel Latency Measured by QPERF


Qperf

How to Get it

apt-get install -y make gcc libc-dev
wget https://www.openfabrics.org/downloads/qperf/qperf-0.4.9.tar.gz
tar zxvf qperf-0.4.9.tar.gz
cd qperf-0.4.9/
./configure
make

The built binary is at

src/qperf

How to use it

Qperf Server

./qperf

Qperf Client

root@mariadbcluster-1:/# ./qperf -v mariadbcluster-0.mariadbcluster tcp_bw tcp_lat
tcp_bw:
    bw              =  91.5 MB/sec
    msg_rate        =   1.4 K/sec
    send_cost       =  32.6 sec/GB
    recv_cost       =  49.6 sec/GB
    send_cpus_used  =   301 % cpus
    recv_cpus_used  =   454 % cpus
tcp_lat:
    latency        =  63.8 us
    msg_rate       =  15.7 K/sec
    loc_cpus_used  =   294 % cpus
    rem_cpus_used  =   546 % cpus

The latency here is measured across the Flannel overlay network.

Message sizes can also be swept (note the -oo option):

./qperf -v -oo msg_size:1:64K:*2 mariadbcluster-0.mariadbcluster tcp_bw tcp_lat

More command-line examples

Changing message size

root@mariadbcluster-1:/# ./qperf -v -oo msg_size:1:64K:*2  mariadbcluster-0.mariadbcluster tcp_lat
tcp_lat:
    latency        =  72.4 us
    msg_rate       =  13.8 K/sec
    msg_size       =     1 bytes
    loc_cpus_used  =   286 % cpus
    rem_cpus_used  =   422 % cpus
tcp_lat:
    latency        =  77.1 us
    msg_rate       =    13 K/sec
    msg_size       =     2 bytes
    loc_cpus_used  =   297 % cpus
    rem_cpus_used  =   442 % cpus
tcp_lat:
    latency        =  102 us
    msg_rate       =  9.8 K/sec
    msg_size       =    4 bytes
    loc_cpus_used  =  297 % cpus
    rem_cpus_used  =  423 % cpus
tcp_lat:
    latency        =  78.4 us
    msg_rate       =  12.8 K/sec
    msg_size       =     8 bytes
    loc_cpus_used  =   255 % cpus
    rem_cpus_used  =   458 % cpus
tcp_lat:
    latency        =  63.8 us
    msg_rate       =  15.7 K/sec
    msg_size       =    16 bytes
    loc_cpus_used  =   311 % cpus
    rem_cpus_used  =   516 % cpus
tcp_lat:
    latency        =  83.6 us
    msg_rate       =    12 K/sec
    msg_size       =    32 bytes
    loc_cpus_used  =   304 % cpus
    rem_cpus_used  =   520 % cpus
tcp_lat:
    latency        =  80.2 us
    msg_rate       =  12.5 K/sec
    msg_size       =    64 bytes
    loc_cpus_used  =   324 % cpus
    rem_cpus_used  =   414 % cpus
tcp_lat:
    latency        =  61.5 us
    msg_rate       =  16.3 K/sec
    msg_size       =   128 bytes
    loc_cpus_used  =   318 % cpus
    rem_cpus_used  =   329 % cpus
tcp_lat:
    latency        =  76.3 us
    msg_rate       =  13.1 K/sec
    msg_size       =   256 bytes
    loc_cpus_used  =   278 % cpus
    rem_cpus_used  =   358 % cpus
tcp_lat:
    latency        =  83.1 us
    msg_rate       =    12 K/sec
    msg_size       =   512 bytes
    loc_cpus_used  =   268 % cpus
    rem_cpus_used  =   446 % cpus
tcp_lat:
    latency        =    99 us
    msg_rate       =  10.1 K/sec
    msg_size       =     1 KiB (1,024)
    loc_cpus_used  =   270 % cpus
    rem_cpus_used  =   407 % cpus
tcp_lat:
    latency        =  121 us
    msg_rate       =  8.3 K/sec
    msg_size       =    2 KiB (2,048)
    loc_cpus_used  =  320 % cpus
    rem_cpus_used  =  364 % cpus
tcp_lat:
    latency        =   153 us
    msg_rate       =  6.55 K/sec
    msg_size       =     4 KiB (4,096)
    loc_cpus_used  =   260 % cpus
    rem_cpus_used  =   544 % cpus
tcp_lat:
    latency        =   206 us
    msg_rate       =  4.87 K/sec
    msg_size       =     8 KiB (8,192)
    loc_cpus_used  =   359 % cpus
    rem_cpus_used  =   447 % cpus
tcp_lat:
    latency        =  345 us
    msg_rate       =  2.9 K/sec
    msg_size       =   16 KiB (16,384)
    loc_cpus_used  =  289 % cpus
    rem_cpus_used  =  540 % cpus
tcp_lat:
    latency        =   507 us
    msg_rate       =  1.97 K/sec
    msg_size       =    32 KiB (32,768)
    loc_cpus_used  =   278 % cpus
    rem_cpus_used  =   476 % cpus
tcp_lat:
    latency        =  1.32 ms
    msg_rate       =   755 /sec
    msg_size       =    64 KiB (65,536)
    loc_cpus_used  =   312 % cpus
    rem_cpus_used  =   666 % cpus


Fix the message size to a specific value:

./qperf -oo msg_size:1K  mariadbcluster-0.mariadbcluster tcp_lat