I. Scenario
After a node goes NotReady, it takes roughly 5 minutes before its pods are rescheduled. In a high-concurrency production environment, a replica that is out of service for 5 minutes pushes its share of the load onto the other replicas, which can easily cause congestion and, in severe cases, take the whole service down. How can this window be shortened?
II. Solution
Kubernetes has an admission controller for this: DefaultTolerationSeconds. Based on the kube-apiserver flags --default-not-ready-toleration-seconds and --default-unreachable-toleration-seconds, it sets default tolerations on Pods for the node.kubernetes.io/not-ready:NoExecute and node.kubernetes.io/unreachable:NoExecute taints (unless the Pod already tolerates them). Both flags default to 300 seconds, i.e. 5 minutes.
You can change these values to match your actual workload.
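For reference, the tolerations the admission controller injects into a Pod that declares none of its own look like this (visible via kubectl get pod <name> -o yaml; tolerationSeconds reflects the 5-minute default). A Pod that already declares a toleration for these taints keeps its own tolerationSeconds instead:

tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300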
III. Testing
1. With the default parameters
Create a deployment and scale it up; two of its pods land on k8s-node02:
$ kubectl create deployment myapp --image=wangyanglinux/myapp:v1
$ kubectl scale deployment myapp --replicas=20
[root@k8s-master01 ~]# kubectl get pod -o wide | grep node02
myapp-5c9785b6cd-m5c8h 1/1 Running 0 13m 100.16.58.197 k8s-node02 <none> <none>
myapp-5c9785b6cd-qnpx4 1/1 Running 0 13m 100.16.58.198 k8s-node02 <none> <none>
Stop kubelet on k8s-node02 to make the node NotReady:
[root@k8s-node02 ~]# systemctl stop kubelet
[root@k8s-node02 ~]# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Thu 2023-09-21 16:26:06 CST; 10s ago
Docs: https://github.com/kubernetes/kubernetes
Process: 5181 ExecStart=/usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --config=/etc/kubernetes/kubelet-conf.yml --container-runtime-endpoint=unix:///run/cri-dockerd.sock --node-labels=node.kubernetes.io/node= (code=exited, status=0/SUCCESS)
Main PID: 5181 (code=exited, status=0/SUCCESS)
The node now shows NotReady:
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready <none> 83d v1.27.1
k8s-master02 Ready <none> 83d v1.27.1
k8s-master03 Ready <none> 83d v1.27.1
k8s-node01 Ready <none> 83d v1.27.1
k8s-node02 NotReady <none> 83d v1.27.1
After a short wait, the pods are rescheduled and the replacements land on the remaining Ready nodes:
[root@k8s-master01 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
myapp-5c9785b6cd-5q4lb 1/1 Running 0 38m 100.16.122.136 k8s-master02 <none> <none>
myapp-5c9785b6cd-8dbwx 1/1 Running 0 38m 100.16.85.200 k8s-node01 <none> <none>
myapp-5c9785b6cd-995zz 1/1 Running 0 32m 100.16.195.7 k8s-master03 <none> <none>
myapp-5c9785b6cd-9ks7p 1/1 Running 0 38m 100.16.195.5 k8s-master03 <none> <none>
myapp-5c9785b6cd-bdd5c 1/1 Running 0 38m 100.16.122.135 k8s-master02 <none> <none>
myapp-5c9785b6cd-bw4m7 1/1 Running 0 38m 100.16.85.199 k8s-node01 <none> <none>
myapp-5c9785b6cd-dmqv8 1/1 Running 0 38m 100.16.32.133 k8s-master01 <none> <none>
myapp-5c9785b6cd-g454l 1/1 Running 0 38m 100.16.32.134 k8s-master01 <none> <none>
myapp-5c9785b6cd-gzwgs 1/1 Running 0 38m 100.16.195.6 k8s-master03 <none> <none>
myapp-5c9785b6cd-m5c8h 1/1 Terminating 0 38m 100.16.58.197 k8s-node02 <none> <none>
myapp-5c9785b6cd-qnpx4 1/1 Terminating 0 38m 100.16.58.198 k8s-node02 <none> <none>
myapp-5c9785b6cd-ww9j8 1/1 Running 0 32m 100.16.32.135 k8s-master01 <none> <none>
Key point: how do we verify that the pods were rescheduled 5 minutes after the node went NotReady?
Check the time the taints were added to the node:
$ kubectl get node k8s-node02 -o custom-columns=Name:.metadata.name,Taints:.spec.taints
Name Taints
k8s-node02 [map[effect:NoSchedule key:node.kubernetes.io/unreachable timeAdded:2023-09-21T08:29:08Z] map[effect:NoExecute key:node.kubernetes.io/unreachable timeAdded:2023-09-21T08:29:13Z]]
The trailing "Z" marks Coordinated Universal Time (UTC, zero offset), so 2023-09-21T08:29:13Z means 08:29:13 UTC, i.e. 16:29:13 local time (+0800).
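Rather than converting time zones by hand, GNU date can translate the taint timestamp into local time (assuming the shell's time zone matches the cluster's +0800):

$ date -d '2023-09-21T08:29:13Z' '+%F %T %z'
2023-09-21 16:29:13 +0800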
Check the start time of the rebuilt pod:
$ kubectl describe pod myapp-5c9785b6cd-995zz
State: Running
Started: Thu, 21 Sep 2023 16:34:15 +0800
Pod start time - taint-added time: 16:34:15 - 16:29:13, i.e. about 5 minutes.
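You can also confirm that DefaultTolerationSeconds injected the default tolerations into the rebuilt pod. With the unmodified defaults, the output should contain two entries with tolerationSeconds:300, along the lines of:

$ kubectl get pod myapp-5c9785b6cd-995zz -o jsonpath='{.spec.tolerations}'
[{"effect":"NoExecute","key":"node.kubernetes.io/not-ready","operator":"Exists","tolerationSeconds":300},{"effect":"NoExecute","key":"node.kubernetes.io/unreachable","operator":"Exists","tolerationSeconds":300}]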
2. Changing the default parameters to 30
Edit the apiserver manifest:
[root@k8s-master daemonset]# vim /etc/kubernetes/manifests/kube-apiserver.yaml
# append these flags at the end of the command list
- --default-not-ready-toleration-seconds=30
- --default-unreachable-toleration-seconds=30
After you save and exit, the apiserver (a static pod) restarts automatically. If your cluster is highly available with multiple masters, the flags must be added to the apiserver on every master.
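To confirm the new flags are in effect on a master, you can grep the running apiserver process (a quick sketch; expected output shown). Note that the defaults are applied at admission time, so only pods created after the change get the 30-second tolerations; existing pods keep whatever tolerationSeconds they were admitted with:

$ ps -ef | grep kube-apiserver | tr ' ' '\n' | grep toleration-seconds
--default-not-ready-toleration-seconds=30
--default-unreachable-toleration-seconds=30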
Test the rescheduling time after the change.
Start kubelet on k8s-node02 again:
[root@k8s-node02 ~]# systemctl start kubelet
Stop kubelet on k8s-node01 to make that node NotReady:
[root@k8s-node01 ~]# systemctl stop kubelet
Measuring again the same way:
pod creation time - taint-added time = 22m24s - 21m54s = 30 seconds
3. Why are some pods in the Terminating state?
Because kubelet on k8s-node02 is stopped and the node is NotReady, the old pods cannot actually be terminated once their replacements are created; they sit in Terminating indefinitely. You can force-delete them manually, or wait for the node to recover, at which point they are cleaned up automatically.
Force deletion:
[root@k8s-master ~]# kubectl delete pod nginx-854d5f75db-v5p8h -n test --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "nginx-854d5f75db-v5p8h" force deleted
[root@k8s-master ~]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-854d5f75db-4vd77 1/1 Running 0 17m 10.244.36.86 k8s-node1 <none> <none>
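If many pods are stuck, a one-liner like this force-deletes every Terminating pod in a namespace (a sketch; review the pod list before piping it into delete):

$ kubectl get pods -n test | awk '/Terminating/ {print $1}' | xargs -r kubectl delete pod -n test --force --grace-period=0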