K8s Taints and Tolerations Explained
Author: 大新屋
Note: see the official Kubernetes documentation on taints and tolerations: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
Taints and tolerations are two halves of one scheduling mechanism. A taint is placed on any node in a K8s cluster and repels Pods: a Pod that does not tolerate the taint will not be scheduled onto that node.
A toleration, set on a Pod, allows (but does not require) the Pod to be scheduled onto nodes carrying a matching taint. This lets Pods that need special configuration land on tainted nodes that are off-limits to everything else.
I. Taint Parameters (Node Configuration)
Create a taint (a single node can carry multiple taints)
Syntax: kubectl taint nodes NODE_NAME TAINT_KEY=TAINT_VALUE:EFFECT (TAINT_KEY is a custom key, TAINT_VALUE a custom value, and EFFECT one of the three effects: NoSchedule, NoExecute, or PreferNoSchedule)
kubectl taint nodes k8s-master01 ssd=true:PreferNoSchedule
kubectl taint nodes k8s-master01 k8s-master02 k8s-master03 ssd=true:PreferNoSchedule

### Effect descriptions
NoSchedule        # New Pods are not scheduled onto the node; Pods already running there are unaffected
NoExecute         # New Pods are not scheduled onto the node; Pods already running there that do not tolerate the taint are evicted, either immediately or after their toleration period expires
PreferNoSchedule  # The scheduler tries to avoid the node; if no better node is available, Pods may still be placed there
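The three effects differ only in how they treat a Pod that does not tolerate the taint. As a quick mental model, here is a minimal Python sketch (plain dicts standing in for node taints; this is illustrative, not the scheduler's real code), where the strongest effect on the node wins:

```python
# Illustrative only: what each effect means for a Pod that does NOT
# tolerate any of the node's taints. NoExecute is the strongest effect,
# then NoSchedule, then PreferNoSchedule.
def decision_for_intolerant_pod(node_taints):
    effects = {t["effect"] for t in node_taints}
    if "NoExecute" in effects:
        return "reject-and-evict"   # new Pods rejected, running Pods evicted
    if "NoSchedule" in effects:
        return "reject"             # new Pods rejected, running Pods stay
    if "PreferNoSchedule" in effects:
        return "avoid-if-possible"  # soft: used only if no better node exists
    return "schedulable"
```

For example, a node tainted `ssd=true:NoSchedule` still keeps its existing Pods, while the same taint with `NoExecute` would evict them.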
II. Toleration Parameters (Pod Configuration)
1. Method 1: exact match
tolerations:
- key: "TAINT_KEY"
  operator: "Equal"
  value: "TAINT_VALUE"
  effect: "NoSchedule"
2. Method 2: partial match
tolerations:
- key: "TAINT_KEY"
  operator: "Exists"
  effect: "NoSchedule"

# or

tolerations:
- key: "TAINT_KEY"
  operator: "Equal"
  value: "TAINT_VALUE"
3. Method 3: broad match (do not use one of the cluster's built-in taint keys here)
tolerations:
- key: "TAINT_KEY"
  operator: "Exists"

# or

tolerations:
- effect: "NoSchedule"
  operator: "Exists"
4. Method 4: match everything (not recommended)
tolerations:
- operator: "Exists"
5. Method 5: a toleration with effect NoExecute plus the tolerationSeconds parameter makes the Pod leave the tainted node once the given number of seconds has elapsed
- Use case 1: when a cluster node fails, Pods leave it after 300 seconds by default; a smaller tolerationSeconds evacuates Pods from the failed node faster
- Use case 2: under heavy network jitter, increase the toleration period beyond the 300-second default to avoid needless evictions
tolerations:
- key: "TAINT_KEY"
  operator: "Equal"
  value: "TAINT_VALUE"
  effect: "NoExecute"
  tolerationSeconds: 3600
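The five methods above all reduce to one matching rule per toleration/taint pair: the keys must match (an empty key with operator Exists matches every taint, as in method 4), an empty effect matches any effect, and operator Equal additionally compares values. A hedged Python sketch of that rule (dict-based and illustrative; not the scheduler's actual implementation):

```python
# Illustrative sketch of toleration/taint matching. Both arguments are
# plain dicts mimicking the YAML fields:
#   taint:      {"key", "value", "effect"}
#   toleration: {"key", "operator", "value", "effect"}  (fields optional)
def tolerates(toleration, taint):
    op = toleration.get("operator", "Equal")
    # Empty key + operator Exists tolerates every taint (method 4).
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    # An empty effect matches any effect; otherwise effects must be equal.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # Exists ignores the value; Equal compares it.
    if op == "Exists":
        return True
    return toleration.get("value") == taint.get("value")
```

Methods 1 through 4 are then just progressively looser instances of this one rule.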
III. Common Taint Commands
### Create a taint (ssd=true is a custom key/value)
kubectl taint nodes k8s-node01 ssd=true:NoExecute

### View a node's taints
kubectl describe node k8s-node01 | grep -A 3 Taints
kubectl get nodes -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

### Remove a taint by key
kubectl taint nodes k8s-node01 ssd-

### Remove a taint by key + effect
kubectl taint nodes k8s-node01 ssd:NoExecute-

### Remove a taint by key + value + effect
kubectl taint nodes k8s-node01 ssd=true:NoExecute-

### Modify a taint (only the value can be changed)
kubectl taint nodes k8s-node01 ssd=false:NoExecute --overwrite
IV. Taint and Toleration Examples
1. Taint a node with NoSchedule (Pods already running on the node are not evicted)
### Check the Pods already running on k8s-node01
kubectl get pods -A -owide | grep k8s-node01

### Add a NoSchedule taint to k8s-node01
kubectl taint nodes k8s-node01 system=node:NoSchedule

### View the taints on k8s-node01 and k8s-node02
kubectl describe node k8s-node01 k8s-node02 | grep -A 3 Taints
kubectl get nodes -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

### After 300 seconds, check whether the Pods on k8s-node01 were evicted; NoSchedule does not evict Pods already running on the node
kubectl get pods -A -owide | grep k8s-node01

### Create a Deployment whose toleration exactly matches the taint's key-value pair
mkdir -p /data/yaml/taint
cat > /data/yaml/taint/nginx-deploy-noschedule.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-deploy
  name: nginx-deploy
  namespace: default
spec:
  replicas: 6
  selector:
    matchLabels:
      app: nginx-pod
  template:
    metadata:
      labels:
        app: nginx-pod
    spec:
      containers:
      - image: registry.cn-shenzhen.aliyuncs.com/dockerghost/nginx:1.26
        name: nginx
      tolerations:
      - key: "system"
        operator: "Equal"
        value: "node"
        effect: "NoSchedule"
EOF
kubectl create -f /data/yaml/taint/nginx-deploy-noschedule.yaml

### Verify that these Pods can now be scheduled onto k8s-node01
kubectl get pods -n default -owide

### Remove the taint
kubectl taint nodes k8s-node01 system=node:NoSchedule-
kubectl get nodes -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
2. Taint a node with NoExecute (Pods that do not tolerate the taint are evicted)
Note: if the calico-node Pod is not evicted, it is because it carries tolerations for NoExecute taints; you can confirm whether the calico-node Pod sets a matching tolerations entry with kubectl get pods calico-node-rzj4b -n kube-system -oyaml | egrep "effect|operator"
### Check the Pods already running on k8s-node02
kubectl get pods -A -owide | grep k8s-node02

### Add a NoExecute taint to k8s-node02
kubectl taint nodes k8s-node02 disk=ssd:NoExecute

### View the taints on k8s-node02
kubectl describe node k8s-node02 | grep -A 3 Taints
kubectl get nodes -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

### After 300 seconds, check whether the Pods that were running on k8s-node02 have been evicted
kubectl get pods -A -owide | grep k8s-node02

### Create a Deployment whose toleration matches every NoExecute taint
mkdir -p /data/yaml/taint
cat > /data/yaml/taint/nginx-deploy-noexecute.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-deploy
  name: nginx-deploy
  namespace: default
spec:
  replicas: 6
  selector:
    matchLabels:
      app: nginx-pod
  template:
    metadata:
      labels:
        app: nginx-pod
    spec:
      containers:
      - image: registry.cn-shenzhen.aliyuncs.com/dockerghost/nginx:1.26
        name: nginx
      tolerations:
      - effect: "NoExecute"
        operator: "Exists"
EOF
kubectl create -f /data/yaml/taint/nginx-deploy-noexecute.yaml

### Verify that these Pods can now be scheduled onto k8s-node02
kubectl get pods -n default -owide

### Remove the taint
kubectl taint nodes k8s-node02 disk=ssd:NoExecute-
kubectl get nodes -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
3. Taint a node with NoSchedule and label it; the Pod sets both nodeSelector and tolerations
Note: when a Pod sets both nodeSelector and tolerations, these are independent hard constraints, not a priority order: nodeSelector restricts the candidate nodes, and any NoSchedule/NoExecute taint on those nodes must still be tolerated. If the toleration does not match the taint's key-value pair while nodeSelector points only at tainted nodes, the constraints conflict and the Pod cannot be scheduled onto the nodes nodeSelector names; it stays Pending.
### Add a NoSchedule taint to k8s-node01 and k8s-node02
kubectl taint nodes k8s-node01 k8s-node02 ssd=true:NoSchedule

### View the taints on k8s-node01 and k8s-node02
kubectl describe node k8s-node01 k8s-node02 | grep -A 3 Taints
kubectl get nodes -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

### Add a label (disktype=ssd is custom) to k8s-node01, k8s-node02, and k8s-node03
kubectl label node k8s-node01 k8s-node02 k8s-node03 disktype=ssd

### View the node labels
kubectl get nodes --show-labels | grep disktype=ssd
kubectl get nodes --show-labels -l disktype=ssd

### Create a Deployment (the Pod sets nodeSelector plus a toleration that exactly matches the taint's key-value pair)
mkdir -p /data/yaml/taint
cat > /data/yaml/taint/nginx-deploy-podnoschedule.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-deploy
  name: nginx-deploy
spec:
  replicas: 6
  selector:
    matchLabels:
      app: nginx-pod
  template:
    metadata:
      labels:
        app: nginx-pod
    spec:
      containers:
      - image: registry.cn-shenzhen.aliyuncs.com/dockerghost/nginx:1.26
        name: nginx
      nodeSelector:
        disktype: "ssd"
      tolerations:
      - key: "ssd"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
EOF
kubectl create -f /data/yaml/taint/nginx-deploy-podnoschedule.yaml

### Verify that these Pods can now be scheduled onto k8s-node01, k8s-node02, and k8s-node03
kubectl get pods -n default -owide

### Remove the taints
kubectl taint nodes k8s-node01 k8s-node02 ssd=true:NoSchedule-
kubectl get nodes -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

### Remove the labels
kubectl label nodes k8s-node01 k8s-node02 k8s-node03 disktype-
kubectl get nodes --show-labels -l disktype=ssd
4. Taint a node with NoExecute and label it; the Pod sets both nodeSelector and tolerations
Note: as in the previous example, nodeSelector and tolerations are independent hard constraints rather than a priority order; if the toleration does not match the taint's key-value pair while nodeSelector points only at tainted nodes, the Pod cannot be scheduled onto the nodes nodeSelector names and stays Pending.
### Add a NoExecute taint to k8s-node01 and k8s-node02
kubectl taint nodes k8s-node01 k8s-node02 ssd=true:NoExecute

### View the taints on k8s-node01 and k8s-node02
kubectl describe node k8s-node01 k8s-node02 | grep -A 3 Taints
kubectl get nodes -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

### Add a label (disktype=ssd is custom) to k8s-node01, k8s-node02, and k8s-node03
kubectl label node k8s-node01 k8s-node02 k8s-node03 disktype=ssd

### View the node labels
kubectl get nodes --show-labels | grep disktype=ssd
kubectl get nodes --show-labels -l disktype=ssd

### Create a Deployment (the Pod sets nodeSelector plus a toleration that exactly matches the taint's key-value pair)
mkdir -p /data/yaml/taint
cat > /data/yaml/taint/nginx-deploy-podnoexecute.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-deploy
  name: nginx-deploy
spec:
  replicas: 6
  selector:
    matchLabels:
      app: nginx-pod
  template:
    metadata:
      labels:
        app: nginx-pod
    spec:
      containers:
      - image: registry.cn-shenzhen.aliyuncs.com/dockerghost/nginx:1.26
        name: nginx
      nodeSelector:
        disktype: "ssd"
      tolerations:
      - key: "ssd"
        operator: "Equal"
        value: "true"
        effect: "NoExecute"
EOF
kubectl create -f /data/yaml/taint/nginx-deploy-podnoexecute.yaml

### Verify that these Pods can now be scheduled onto k8s-node01, k8s-node02, and k8s-node03
kubectl get pods -n default -owide

### Remove the taints
kubectl taint nodes k8s-node01 k8s-node02 ssd=true:NoExecute-
kubectl get nodes -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

### Remove the labels
kubectl label nodes k8s-node01 k8s-node02 k8s-node03 disktype-
kubectl get nodes --show-labels -l disktype=ssd
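The two examples above show the interplay the notes describe: nodeSelector and tolerations are both hard filters, and a node must pass both before a Pod can land on it. A rough Python sketch of the combined check (Equal-only toleration matching for brevity; illustrative, not scheduler code):

```python
# Illustrative: a node is feasible only if it satisfies BOTH the Pod's
# nodeSelector and its tolerations. Simplified to Equal-only matching.
def node_is_feasible(node_labels, node_taints, pod_node_selector, pod_tolerations):
    # nodeSelector: every requested label must be present on the node.
    if any(node_labels.get(k) != v for k, v in pod_node_selector.items()):
        return False
    # Taints: every hard taint on the node must be tolerated.
    for t in node_taints:
        if t["effect"] == "PreferNoSchedule":
            continue  # soft preference, not a hard filter
        tolerated = any(
            tol.get("key") == t["key"]
            and tol.get("value") == t.get("value")
            and tol.get("effect") in (None, t["effect"])
            for tol in pod_tolerations
        )
        if not tolerated:
            return False
    return True
```

This mirrors the walkthrough: the labeled-and-tainted nodes pass only when the toleration matches, while an unlabeled node fails the nodeSelector check, and an untainted but labeled node (like k8s-node03 above) passes without needing any toleration.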
V. Built-in Cluster Taints and Default Pod Tolerations
1. Built-in cluster taints
node.kubernetes.io/not-ready                    # Node is not ready; equivalent to the node's Ready condition being False
node.kubernetes.io/unreachable                  # The node controller cannot reach the node; equivalent to the Ready condition being Unknown
node.kubernetes.io/out-of-disk                  # Node is out of disk space
node.kubernetes.io/memory-pressure              # Node is under memory pressure
node.kubernetes.io/disk-pressure                # Node is under disk pressure
node.kubernetes.io/network-unavailable          # Node network is unreachable
node.kubernetes.io/unschedulable                # Node is unschedulable
node.cloudprovider.kubernetes.io/uninitialized  # When the kubelet starts with an external cloud provider, this taint marks the node as unusable; after a controller in cloud-controller-manager initializes the node, the kubelet removes it
2. Default tolerations on newly created Pods
### Create a Deployment
kubectl create deploy nginx-deploy --image=registry.cn-shenzhen.aliyuncs.com/dockerghost/nginx:1.26 -n default

### View the Deployment and its Pods
kubectl get deploy -n default
kubectl get pods -n default

### Inspect the Pod's default tolerations
kubectl get pods nginx-deploy-6988f8548f-swkfv -n default -oyaml
..............................
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready      # Node not ready; Ready condition is False
    operator: Exists
    tolerationSeconds: 300                 # By default the Pod is evicted after 5 minutes
  - effect: NoExecute
    key: node.kubernetes.io/unreachable    # Node unreachable; Ready condition is Unknown
    operator: Exists
    tolerationSeconds: 300                 # By default the Pod is evicted after 5 minutes
..............................
VI. Simulating a Node Failure to Migrate Pods Quickly
### How long a failed node takes to move from Ready to Unknown/False is governed by kube-controller-manager's --node-monitor-grace-period flag
[root@k8s-master01 ~]# cat /usr/lib/systemd/system/kube-controller-manager.service | grep "node-monitor-grace-period"
      --node-monitor-grace-period=40s \

### Add a NoExecute taint and a label to k8s-node01 and k8s-node02
kubectl taint nodes k8s-node01 k8s-node02 ssd=true:NoExecute
kubectl label nodes k8s-node01 k8s-node02 disktype=ssd

### View the taints and labels on k8s-node01 and k8s-node02
kubectl get nodes -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
kubectl get nodes --show-labels -l disktype=ssd

### Create a Deployment (the nginx Pod tolerates a node failure for only 10 seconds before being migrated to another node)
mkdir -p /data/yaml/taint
cat > /data/yaml/taint/nginx-deploy.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-deploy
  name: nginx-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-pod
  template:
    metadata:
      labels:
        app: nginx-pod
    spec:
      containers:
      - image: registry.cn-shenzhen.aliyuncs.com/dockerghost/nginx:1.26
        name: nginx
      nodeSelector:
        disktype: "ssd"
      tolerations:
      - key: "ssd"
        operator: "Equal"
        value: "true"
        effect: "NoExecute"
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 10
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 10
EOF
kubectl create -f /data/yaml/taint/nginx-deploy.yaml
kubectl get pods -owide -n default

### Create a Deployment (the redis Pod tolerates the taint but relies on the built-in default toleration period of 300 seconds)
cat > /data/yaml/taint/redis-deploy.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: redis-deploy
  name: redis-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-pod
  template:
    metadata:
      labels:
        app: redis-pod
    spec:
      containers:
      - image: registry.cn-shenzhen.aliyuncs.com/dockerghost/redis:latest
        name: redis
      nodeSelector:
        disktype: "ssd"
      tolerations:
      - key: "ssd"
        operator: "Equal"
        value: "true"
        effect: "NoExecute"
EOF
kubectl create -f /data/yaml/taint/redis-deploy.yaml
kubectl get pods -owide -n default

### At this point both the nginx and redis Pods happen to be running on k8s-node02
[root@k8s-master01 ~]# kubectl get pods -owide
NAME                            READY   STATUS    RESTARTS   AGE     IP              NODE         NOMINATED NODE   READINESS GATES
nginx-deploy-67745bdcf8-jfsst   1/1     Running   0          2m31s   172.30.58.244   k8s-node02   <none>           <none>
redis-deploy-6d549cd6bd-xdb5x   1/1     Running   0          50s     172.30.58.245   k8s-node02   <none>           <none>

### Log in to k8s-node02 and shut it down to simulate an unexpected outage
init 0

### On a master node, check the node status; k8s-node02 is now NotReady
[root@k8s-master01 ~]# kubectl get nodes
NAME           STATUS     ROLES    AGE   VERSION
k8s-master01   Ready      <none>   14d   v1.28.15
k8s-master02   Ready      <none>   14d   v1.28.15
k8s-master03   Ready      <none>   14d   v1.28.15
k8s-node01     Ready      <none>   14d   v1.28.15
k8s-node02     NotReady   <none>   14d   v1.28.15

### Check the Pods: redis tolerates the failure longer than nginx (the 300-second default versus 10 seconds), so nginx has already been rescheduled onto k8s-node01 while redis is still waiting out its toleration period on k8s-node02
[root@k8s-master01 ~]# kubectl get pods -owide -n default
NAME                            READY   STATUS        RESTARTS   AGE     IP              NODE         NOMINATED NODE   READINESS GATES
nginx-deploy-67745bdcf8-jfsst   1/1     Terminating   0          4m20s   172.30.58.244   k8s-node02   <none>           <none>
nginx-deploy-67745bdcf8-np7hl   1/1     Running       0          19s     172.30.85.203   k8s-node01   <none>           <none>
redis-deploy-6d549cd6bd-xdb5x   1/1     Running       0          2m39s   172.30.58.245   k8s-node02   <none>           <none>

### Remove the taints
kubectl taint nodes k8s-node01 k8s-node02 ssd=true:NoExecute-
kubectl get nodes -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

### Remove the labels
kubectl label nodes k8s-node01 k8s-node02 disktype-
kubectl get nodes --show-labels -l disktype=ssd
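The timings in this walkthrough can be sketched with a rough back-of-the-envelope model (simplified: real latency also includes controller sync intervals and API propagation). The node must first be declared failed after node-monitor-grace-period (40s here), and only then does each Pod's tolerationSeconds for the node.kubernetes.io/unreachable NoExecute taint start counting down:

```python
# Simplified timing model: lower bound on the time from actual node
# failure until a Pod is evicted. Real clusters add controller sync and
# API propagation delays on top of this.
def seconds_until_eviction(node_monitor_grace_period: int,
                           toleration_seconds: int) -> int:
    # 1) the node controller marks the node NotReady/Unknown and applies
    #    the unreachable NoExecute taint after the grace period;
    # 2) the Pod then survives on the tainted node for tolerationSeconds.
    return node_monitor_grace_period + toleration_seconds

# nginx (tolerationSeconds: 10) vs redis (default 300), grace period 40s:
nginx_eviction = seconds_until_eviction(40, 10)    # about 50 seconds
redis_eviction = seconds_until_eviction(40, 300)   # about 340 seconds
```

This is why nginx reappears on k8s-node01 within a minute of the outage while redis lingers on the dead node for over five minutes.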
Summary
The above reflects my personal experience; I hope it serves as a useful reference.