How Kubernetes Implements Pod Scheduling

Author: Hud.

Kubernetes controls which node a Pod runs on through directed scheduling (nodeName/nodeSelector), affinity scheduling (NodeAffinity/PodAffinity/PodAntiAffinity), and taints with tolerations (Taints/Toleration). These are used, respectively, to force a specific node, to optimize placement, and to flexibly manage node admission, covering the scheduling needs of different scenarios.

By default, which Node a Pod runs on is computed by the Scheduler component with its scheduling algorithms, and the process is not under manual control.

In practice, however, that is often not enough: in many cases we want to steer certain Pods onto certain nodes. How can we do that?

This requires understanding Kubernetes' Pod scheduling rules. Kubernetes provides four broad categories of scheduling:

1. Directed Scheduling

Directed scheduling means declaring nodeName or nodeSelector on a Pod in order to schedule it onto the desired node. Note that this scheduling is forced: even if the target Node does not exist, the Pod is still assigned to it; it simply fails to run there.

1.1 NodeName

nodeName forcibly binds a Pod to the Node with the specified name. This approach skips the Scheduler's logic entirely and writes the Pod directly into the node's Pod list.

Next, let's create pod-nodename.yaml for this directed-scheduling example:

apiVersion: v1
kind: Pod
metadata:
  name: pod-nodename
  namespace: dev
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  nodeName: k8s-node1   # schedule onto node k8s-node1

# Create the Pod
[root@master ~]# vi pod-nodename.yaml
[root@master ~]# kubectl create -f pod-nodename.yaml
pod/pod-nodename created
# Modify the file above to create a second Pod targeted at node2 and see the effect
[root@master ~]# kubectl get pod -n dev -o wide
NAME            READY   STATUS    RESTARTS   AGE     IP               NODE        NOMINATED NODE   READINESS GATES
pod-nodename    1/1     Running   0          2m25s   10.244.36.91     k8s-node1   <none>           <none>
pod-nodename2   1/1     Running   0          5s      10.244.169.163   k8s-node2   <none>           <none>

By the way, a quick aside: to view the Node names, use:

[root@master ~]# kubectl get node
NAME        STATUS   ROLES                  AGE   VERSION
k8s-node1   Ready    <none>                 12d   v1.23.5
k8s-node2   Ready    <none>                 12d   v1.23.5
master      Ready    control-plane,master   12d   v1.23.5

Will the Pod still be created if we change nodeName in the yaml to the nonexistent k8s-node3? Let's try:

[root@master ~]# kubectl create -f pod-nodename.yaml
pod/pod-nodename2 created
[root@master ~]# kubectl get pod -n dev
NAME            READY   STATUS    RESTARTS   AGE
pod-nodename2   0/1     Pending   0          14s

1.2 NodeSelector

nodeSelector schedules a Pod onto nodes that carry the specified labels. It is built on Kubernetes' label-selector mechanism: before the Pod is created, the scheduler uses the MatchNodeSelector predicate to match labels, finds the target node, and schedules the Pod there. The matching rule is a hard constraint.

Let's get familiar with the operations through a small example:

① First, add a label to each worker node:

[root@master ~]# kubectl label nodes k8s-node1 nodeenv=first
node/k8s-node1 labeled
[root@master ~]# kubectl label nodes k8s-node2 nodeenv=second
node/k8s-node2 labeled

② Create a pod-nodeselector.yaml file and use it to create the Pod:

apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeselector
  namespace: dev
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  nodeSelector:
    nodeenv: first   # schedule the Pod onto a node labeled nodeenv=first

# Create the Pod and check whether it was placed on the designated node
[root@master ~]# vi pod-nodeselector.yaml
[root@master ~]# kubectl create -f pod-nodeselector.yaml
pod/pod-nodeselector created
[root@master ~]# kubectl get pod -n dev -o wide
NAME               READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
pod-nodeselector   1/1     Running   0          19s   10.244.36.92   k8s-node1   <none>           <none>

2. Affinity Scheduling

Directed scheduling is very convenient to use, but it has a problem: if no node satisfies the condition, the Pod will not run, even though usable nodes remain in the cluster. That limits the scenarios where it can be applied.

To address this, Kubernetes also provides affinity scheduling (Affinity). It extends nodeSelector so that, through configuration, the scheduler prefers nodes that satisfy the conditions but can still fall back to nodes that do not, which makes scheduling more flexible:

Affinity falls into three main kinds:

- nodeAffinity: targets Nodes, and solves the problem of which nodes a Pod may be scheduled onto
- podAffinity: targets Pods, and solves which already-running Pods a new Pod should share a topology domain with
- podAntiAffinity: targets Pods, and solves which already-running Pods a new Pod must not share a topology domain with

On when to use affinity versus anti-affinity: if two applications interact frequently, affinity places them as close together as possible to reduce network overhead; if an application runs with multiple replicas, anti-affinity spreads the instances across nodes to improve availability.

2.1 Configuration options for NodeAffinity

[root@master ~]# kubectl explain pod.spec.affinity.nodeAffinity
KIND:     Pod
VERSION:  v1
   # Prefer to schedule onto nodes satisfying the rules; a soft constraint (preference)
   preferredDuringSchedulingIgnoredDuringExecution	<[]Object>
    preference    # a node selector term, paired with a weight
      matchFields        # node selector requirements by node field
      matchExpressions   # node selector requirements by node label (recommended)
        key        # key
        values     # values
        operator   # operator
    weight        # preference weight, in the range 1-100
   # The node must satisfy all of the rules; a hard constraint
   requiredDuringSchedulingIgnoredDuringExecution	<Object>
    nodeSelectorTerms    # list of node selector terms
      matchFields        # node selector requirements by node field
      matchExpressions   # node selector requirements by node label (recommended)
        key        # key
        values     # values
        operator   # operator

How the operators are used:

- matchExpressions:
  - key: nodeenv        # match nodes that have a label whose key is nodeenv
    operator: Exists
  - key: nodeenv        # match nodes whose nodeenv label value is "xxx" or "yyy"
    operator: In
    values: ["xxx","yyy"]
  - key: nodeenv        # match nodes whose nodeenv label value is greater than "xxx"
    operator: Gt
    values: ["xxx"]

The full set of operators is In, NotIn, Exists, DoesNotExist, Gt and Lt.

Let's first demonstrate requiredDuringSchedulingIgnoredDuringExecution. Create pod-nodeaffinity-required.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeaffinity-required
  namespace: dev
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  affinity:        # affinity settings
    nodeAffinity:     # node affinity
      requiredDuringSchedulingIgnoredDuringExecution:   # hard constraint
        nodeSelectorTerms:
        - matchExpressions:   # match nodes whose nodeenv value is in ["xxx","yyy"]
          - key: nodeenv
            operator: In
            values: ["xxx","yyy"]

# Create the Pod
[root@master ~]# kubectl create -f pod-nodeaffinity-required.yaml 
pod/pod-affinity-required created
# First look at the node labels
[root@master ~]# kubectl get node --show-labels
NAME        STATUS   ROLES                  AGE   VERSION   LABELS
k8s-node1   Ready    <none>                 12d   v1.23.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node1,kubernetes.io/os=linux,nodeenv=first
k8s-node2   Ready    <none>                 12d   v1.23.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node2,kubernetes.io/os=linux,nodeenv=second
master      Ready    control-plane,master   12d   v1.23.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=

Among the current nodes, neither node1's nor node2's nodeenv label is xxx or yyy, which means this Pod should fail to run:

[root@master ~]# kubectl get pod -n dev
NAME                    READY   STATUS    RESTARTS   AGE
pod-affinity-required   0/1     Pending   0          5m4s

# Use describe to see why scheduling failed
[root@master ~]# kubectl describe pod pod-nodeaffinity-required -n dev
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  22s (x10 over 10m)  default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity/selector.

It fails, as shown. Let's change xxx to first and create it again:

[root@master ~]# vi pod-nodeaffinity-required.yaml 
[root@master ~]# kubectl create -f pod-nodeaffinity-required.yaml 
pod/pod-nodeaffinity-required created
[root@master ~]# kubectl get pod -n dev -o wide
NAME                        READY   STATUS    RESTARTS   AGE     IP             NODE        NOMINATED NODE   READINESS GATES
pod-nodeaffinity-required   1/1     Running   0          12s     10.244.36.93   k8s-node1   <none>           <none>

Next, let's look at the soft constraint, preferredDuringSchedulingIgnoredDuringExecution.

Create pod-nodeaffinity-preferred.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeaffinity-preferred
  namespace: dev
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  affinity:        # affinity settings
    nodeAffinity:     # node affinity
      preferredDuringSchedulingIgnoredDuringExecution:   # soft constraint
        - weight: 1
          preference:      # prefer nodes whose nodeenv value is in ["xxx","yyy"] (none exist in this cluster)
            matchExpressions:
            - key: nodeenv
              operator: In
              values: ["xxx","yyy"]

# Create the Pod and check
[root@master ~]# kubectl create -f pod-nodeaffinity-preferred.yaml 
pod/pod-nodeaffinity-preferred created
[root@master ~]# kubectl get pod -n dev -o wide
NAME                         READY   STATUS    RESTARTS   AGE    IP               NODE        NOMINATED NODE   READINESS GATES
pod-nodeaffinity-preferred   1/1     Running   0          119s   10.244.169.164   k8s-node2   <none>           <none>                                           

Scheduling affinity only takes effect while the Pod is being scheduled; once scheduling is done, the Pod stays put even if the node's labels change (that is the "IgnoredDuringExecution" part of the name).
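The two variants can also be combined in one spec. The sketch below is illustrative (the Pod name pod-nodeaffinity-mixed is hypothetical; the nodeenv labels are the ones set earlier): it requires a node to carry some nodeenv label (hard) while preferring nodeenv=first (soft).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeaffinity-mixed   # hypothetical name, not part of the demos above
  namespace: dev
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard: node must have a nodeenv label
        nodeSelectorTerms:
        - matchExpressions:
          - key: nodeenv
            operator: Exists
      preferredDuringSchedulingIgnoredDuringExecution:  # soft: among those, prefer nodeenv=first
      - weight: 50
        preference:
          matchExpressions:
          - key: nodeenv
            operator: In
            values: ["first"]
```

With this manifest, the Pod refuses nodes lacking any nodeenv label, but still schedules onto nodeenv=second nodes when no nodeenv=first node is available.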

2.2 Configuration options for PodAffinity

[root@master ~]# kubectl explain pod.spec.affinity.podAffinity
FIELDS:
  # hard constraint
   requiredDuringSchedulingIgnoredDuringExecution	<[]Object>
    namespaces       # namespaces of the Pods used as reference
    topologyKey      # scope that the scheduling rule applies to
    labelSelector    # label selector
      matchExpressions:
        key       # key
        values    # values
        operator  # operator
      matchLabels    # map form, equivalent to a set of matchExpressions
  # soft constraint
   preferredDuringSchedulingIgnoredDuringExecution	<[]Object>
    podAffinityTerm    # an affinity term with the same fields as above
      namespaces       # namespaces of the Pods used as reference
      topologyKey      # scope that the scheduling rule applies to
      labelSelector    # label selector
        matchExpressions:
          key       # key
          values    # values
          operator  # operator
        matchLabels    # map form, equivalent to a set of matchExpressions
    weight     # preference weight, in the range 1-100

topologyKey specifies the scope that the scheduling rule applies to. For example:

- kubernetes.io/hostname distinguishes by individual Node
- topology.kubernetes.io/zone distinguishes by availability zone
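To see what a wider scope looks like, here is a hedged affinity fragment (it assumes nodes carry the standard topology.kubernetes.io/zone label, which this demo cluster may not have; podenv=target is the reference-Pod label used in the example below). It only requires the Pods to share a zone, not a node:

```yaml
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: podenv
            operator: In
            values: ["target"]
        topologyKey: topology.kubernetes.io/zone  # co-locate by zone rather than by node
```

Any node in the same zone as a podenv=target Pod satisfies this rule, which is looser than pinning to the exact node.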

1) First, create a reference Pod, pod-podaffinity-target.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: pod-podaffinity-target
  namespace: dev
  labels:        # set a label
    podenv: target
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  nodeName: k8s-node1   # place it on node1

# Create and inspect the Pod
[root@master ~]# vi pod-podaffinity-target.yaml
[root@master ~]# kubectl create -f pod-podaffinity-target.yaml 
pod/pod-podaffinity-target created
[root@master ~]# kubectl get pod -n dev
NAME                     READY   STATUS    RESTARTS   AGE
pod-podaffinity-target   1/1     Running   0          9s

2) Create pod-podaffinity-required.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: pod-podaffinity-required
  namespace: dev
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  affinity:        # affinity settings
    podAffinity:     # pod affinity
      requiredDuringSchedulingIgnoredDuringExecution:   # hard constraint
        - labelSelector:
            matchExpressions:      # match Pods whose nodeenv label value is in ["xxx","yyy"]
            - key: nodeenv
              operator: In
              values: ["xxx","yyy"]
          topologyKey: kubernetes.io/hostname  # schedule onto the same node as a matching Pod

# Create and check
[root@master ~]# kubectl create -f pod-podaffinity-required.yaml 
pod/pod-podaffinity-required created
[root@master ~]# kubectl get pod -n dev -o wide
NAME                       READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
pod-podaffinity-required   0/1     Pending   0          8s    <none>         <none>      <none>           <none>
pod-podaffinity-target     1/1     Running   0          14m   10.244.36.94   k8s-node1   <none>           <none>
[root@master ~]# kubectl describe pod pod-podaffinity-required -n dev
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  13s (x2 over 101s)  default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match pod affinity rules.

One node carries a taint: the master. The two workers hold no Pod whose nodeenv label is in ["xxx","yyy"], so the hard pod-affinity rule cannot be met and the Pod stays Pending. Once the selector points at a label an existing Pod actually carries (for example podenv=target on the reference Pod), it schedules next to that Pod on node1, as the later listing shows.

2.3 A podAntiAffinity (anti-affinity) example

Still using the target Pod above as the reference: with anti-affinity, this Pod should be scheduled onto node2.

apiVersion: v1
kind: Pod
metadata:
  name: pod-podantiaffinity-required
  namespace: dev
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  affinity:        # affinity settings
    podAntiAffinity:     # pod anti-affinity
      requiredDuringSchedulingIgnoredDuringExecution:   # hard constraint
        - labelSelector:
            matchExpressions:      # match Pods whose podenv label value is in ["target"]
            - key: podenv
              operator: In
              values: ["target"]
          topologyKey: kubernetes.io/hostname  # avoid any node that runs a matching Pod

# Create and inspect the Pod
[root@master ~]# kubectl create -f pod-podantiaffinity-required.yaml 
pod/pod-podantiaffinity-required created
[root@master ~]# kubectl get pod -n dev -o wide  
NAME                           READY   STATUS    RESTARTS   AGE   IP               NODE        NOMINATED NODE   READINESS GATES
pod-podaffinity-required       1/1     Running   0          10m   10.244.36.95     k8s-node1   <none>           <none>
pod-podaffinity-target         1/1     Running   0          28m   10.244.36.94     k8s-node1   <none>           <none>
pod-podantiaffinity-required   1/1     Running   0          43s   10.244.169.165   k8s-node2   <none>           <none>
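A classic real-world use of anti-affinity is spreading the replicas of a single workload across nodes for availability. The sketch below is illustrative, not part of the demo above (the Deployment name web and the app=web label are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web            # hypothetical workload name
  namespace: dev
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.17.1
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web                         # repel other replicas of this same workload
            topologyKey: kubernetes.io/hostname  # at most one replica per node
```

Because the rule is a hard constraint, a third replica would stay Pending in a two-worker cluster; preferredDuringSchedulingIgnoredDuringExecution is the softer alternative.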

3. Taint Scheduling: from the Node's Point of View

3.1 Taints

The scheduling methods so far all take the Pod's point of view: properties added to the Pod decide whether it is scheduled onto a given Node. We can also take the Node's point of view and add taint properties to the Node to decide whether Pods are allowed to be scheduled there.

Once a taint is set on a Node, a repelling relationship exists between the Node and Pods: the Node refuses new Pods and can even evict Pods that are already running on it.

A taint has the format key=value:effect, where key and value are the taint's label and effect describes what the taint does. Three effects are supported:

- PreferNoSchedule: the scheduler tries to avoid placing Pods on the node unless there is nowhere else to go
- NoSchedule: the scheduler will not place new Pods on the node, but Pods already running there are left alone
- NoExecute: new Pods are not scheduled, and already-running Pods that do not tolerate the taint are evicted

# Set a taint
kubectl taint nodes k8s-node1 key=value:effect

# Remove a taint (by key and effect)
kubectl taint nodes k8s-node1 key:effect-

# Remove every taint with the given key
kubectl taint nodes k8s-node1 key-

Next, let's demonstrate the effect of taints (note that k8s-node2 is NotReady at this point, so k8s-node1 is the only schedulable worker):

[root@master ~]# kubectl get node
NAME        STATUS     ROLES                  AGE   VERSION
k8s-node1   Ready      <none>                 13d   v1.23.5
k8s-node2   NotReady   <none>                 13d   v1.23.5
master      Ready      control-plane,master   13d   v1.23.5
# Set a PreferNoSchedule taint on node1
[root@master ~]# kubectl taint nodes k8s-node1 tag=qty:PreferNoSchedule
node/k8s-node1 tainted
# Create pod1
[root@master ~]# kubectl run taint1 --image=nginx:1.17.1 -n dev
pod/taint1 created
[root@master ~]# kubectl get pod -n dev
NAME                           READY   STATUS        RESTARTS   AGE
taint1                         1/1     Running       0          53s
# Remove the PreferNoSchedule taint from node1 and set NoSchedule instead
[root@master ~]# kubectl taint nodes k8s-node1 tag:PreferNoSchedule-
node/k8s-node1 untainted
[root@master ~]# kubectl taint nodes k8s-node1 tag=qty:NoSchedule
node/k8s-node1 tainted

# Create pod2
[root@master ~]# kubectl run taint2 --image=nginx -n dev
pod/taint2 created
[root@master ~]# kubectl get pod -n dev
NAME                           READY   STATUS        RESTARTS   AGE
taint1                         1/1     Running       0          7m14s
taint2                         0/1     Pending       0          11s

# Replace the NoSchedule taint on node1 with NoExecute
[root@master ~]# kubectl taint nodes k8s-node1 tag:NoSchedule-
node/k8s-node1 untainted
[root@master ~]# kubectl taint nodes k8s-node1 tag=qty:NoExecute
node/k8s-node1 tainted

# Create pod3 (note that the NoExecute taint has also evicted taint1 and taint2)
[root@master ~]# kubectl run taint3 --image=nginx -n dev
pod/taint3 created
[root@master ~]# kubectl get pod -n dev
NAME                           READY   STATUS        RESTARTS   AGE
taint3                         0/1     Pending       0          4s

# Remove the remaining taint
[root@master ~]# kubectl taint nodes k8s-node1 tag-
node/k8s-node1 untainted

3.2 Tolerations

A taint means refusal; a toleration means ignoring it. A Node uses taints to reject Pods, and a Pod uses tolerations to ignore that rejection and be scheduled anyway.

Create pod-toleration.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: pod-toleration
  namespace: dev
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  tolerations:        # add a toleration
  - key: "tag"          # key of the taint to tolerate
    operator: "Equal"   # operator
    value: "qty"        # value of the taint to tolerate
    effect: "NoExecute"   # the tolerated effect must match the taint's effect

# Create the Pod and check
[root@master ~]# kubectl create -f pod-toleration.yaml 
pod/pod-toleration created
[root@master ~]# kubectl get pod -n dev
NAME                           READY   STATUS        RESTARTS   AGE
pod-toleration                 1/1     Running       0          39s
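When the taint's value does not matter, the toleration's operator can be switched from Equal to Exists. A hedged fragment (same tag key as the taints above):

```yaml
  tolerations:
  - key: "tag"
    operator: "Exists"   # tolerate any taint whose key is tag, regardless of its value
    effect: "NoExecute"
  # An empty toleration with only operator: "Exists" (no key, no effect)
  # tolerates every taint; use that form sparingly.
```

With operator: "Exists" the value field must be omitted, so this one toleration would cover tag=qty:NoExecute as well as any other value under the same key.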


Summary

The above is my personal experience; I hope it can serve as a reference, and I hope everyone will continue to support 脚本之家.
