概述
在 Kubernetes 中,Pod的调度是通过kube-schedule来实现的,Pod的调度会经过一系列算法来进行完成。
在实际生产过程中,我们想让Pod调度到我们想要的节点上,往往通过kube-schedule默认的调度策略无法实现,这个时候我们需要指定一些策略来帮助我们实现。
将Pod调度到指定的节点上有三种方式:
- nodeName:通过节点名称完成调度
- nodeSelector:通过节点标签完成调度
- nodeAffinity:通过节点亲和性完成调度
nodeName
nodeName 是 Pod.spec 中的一个字段,用于显式指定 Pod 应该运行的节点名称。当 Kubernetes 创建 Pod 时,调度器会跳过常规调度逻辑,直接将 Pod 分配到指定的节点上。
工作原理
- 跳过调度器:使用 nodeName 时,Pod 不会经过 Kubernetes 默认调度器(kube-scheduler)的资源评估和节点选择流程。因此不受污点的影响
- 直接绑定:API Server 会直接将 Pod 绑定到指定节点,节点上的 kubelet 负责创建容器。
- 节点必须存在:指定的节点名称必须是集群中已存在且状态正常的节点,否则 Pod 会处于Pending状态。
实战案例
指定Pod调度到node01节点上- [root@master ~/schedule]# cat node.yaml
- apiVersion: v1
- kind: Pod
- metadata:
- name: nginx-pod
- spec:
- # 直接指定节点名称
- nodeName: node01
- containers:
- - name: nginx
- image: nginx
- [root@master ~/schedule]# kubectl apply -f node.yaml
- pod/nginx-pod created
- # 查看Pod,发现在node01节点上
- [root@master ~/schedule]# kubectl get po -o wide
- NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
- nginx-pod 1/1 Running 0 11s 100.117.144.185 node01 <none> <none>
复制代码 验证在有污点的情况下是否能调度成功
示例:
给node02节点添加一个污点- [root@master ~/schedule]# kubectl taint nodes node02 app=node:NoSchedule
- node/node02 tainted
- # 检查污点是否创建成功
- [root@master ~/schedule]# kubectl describe node node02 | grep Taint
- Taints: app=node:NoSchedule
复制代码 创建Pod指定nodeName为node02- [root@master ~/schedule]# cat node.yaml
- apiVersion: v1
- kind: Pod
- metadata:
- name: nginx-pod
- spec:
- # 直接指定节点名称
- nodeName: node02
- containers:
- - name: nginx
- image: nginx
- [root@master ~/schedule]# kubectl apply -f node.yaml
- pod/nginx-pod created
- # 查看Pod已经调度到node02节点上
- [root@master ~/schedule]# kubectl get po -o wide
- NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
- nginx-pod 1/1 Running 0 10s 100.95.185.251 node02 <none> <none>
复制代码总结:nodeName 是一种静态、强制的调度方式,适用于对特定节点有明确依赖的场景,但缺乏弹性和自动恢复能力。在生产环境中,建议优先使用 NodeAffinity、Taints/Tolerations 等更灵活的调度机制,仅在必要时(如紧急修复)使用 nodeName。
nodeSelector
nodeSelector是 Pod.spec 中的一个字段,用于定义节点选择规则。通过在 Pod 定义中设置nodeSelector,可以确保 Pod 只被调度到具有匹配标签的节点上。
nodeSelector会受到污点的影响
工作原理
- 标签(Labels):节点可以被打上标签(例如:disktype=ssd、env=prod)。
- 选择器(Selector):Pod 通过nodeSelector指定需要匹配的标签。
- 调度器(Scheduler):Kubernetes 调度器会根据nodeSelector的规则,将 Pod 调度到符合条件的节点上。
实战案例
给node01节点创建标签env=prod- [root@master ~/schedule]# kubectl label node node01 env=prod
复制代码 创建Pod- [root@master ~/schedule]# cat selector.yaml
- apiVersion: v1
- kind: Pod
- metadata:
- name: nginx-pod
- spec:
- # 指定标签选择
- nodeSelector:
- # 直接指定标签
- env: prod
- # 可以指定多个标签
- # diskType: ssd
- containers:
- - name: nginx
- image: nginx
- [root@master ~/schedule]# kubectl apply -f
- node.yaml selector.yaml
- [root@master ~/schedule]# kubectl apply -f selector.yaml
- pod/nginx-pod created
- # 发现Pod已经调度到node01节点上了
- [root@master ~/schedule]# kubectl get po -o wide
- NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
- nginx-pod 1/1 Running 0 6s 100.117.144.186 node01 <none> <none>
复制代码 验证在有污点的情况下是否能调度成功
给node02节点创建一个污点及标签- # 创建污点[root@master ~/schedule]# kubectl taint nodes node02 app=node:NoSchedule
- node/node02 tainted
- # 检查污点是否创建成功
- [root@master ~/schedule]# kubectl describe node node02 | grep Taint
- Taints: app=node:NoSchedule# 创建标签[root@master ~/schedule]# kubectl label node node02 disk-type=ssd# 查看标签是否创建成功[root@master ~/schedule]# kubectl describe node node02 | grep Label -A 3Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/os=linux disk-type=ssd kubernetes.io/arch=amd64
复制代码 创建Pod
发现Pod处于Pending状态
- [root@master ~/schedule]# cat selector.yaml
- apiVersion: v1
- kind: Pod
- metadata:
- name: nginx-pod
- spec:
- # 指定标签选择
- nodeSelector:
- # 直接指定标签
- # env: prod
- # 可以指定多个标签
- diskType: ssd
- containers:
- - name: nginx
- image: nginx
- [root@master ~/schedule]# kubectl apply -f selector.yaml
- pod/nginx-pod created
- # Pod处于Pending状态
- [root@master ~/schedule]# kubectl get po -o wide
- NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
- nginx-pod 0/1 Pending 0 6s <none> <none> <none> <none>
复制代码 查看一下详细信息
发现受污点的影响
- [root@master ~/schedule]# kubectl describe po nginx-pod
- Name: nginx-pod
- #...省略部分内容
- Events:
- Type Reason Age From Message
- ---- ------ ---- ---- -------
- Warning FailedScheduling 75s default-scheduler 0/3 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had untolerated taint {app: node}, 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
复制代码 如果想要让nginx-pod调度到node02节点上,需要添加污点容忍,污点相关的内容可以阅读这篇文章:K8s中的污点和容忍
节点亲和性
在 Kubernetes 中,节点亲和性(Node Affinity) 是一种更灵活的调度机制,用于控制 Pod 如何被分配到特定节点。与nodeSelector相比,它支持更复杂的匹配规则和软策略,让调度策略更具弹性。
节点亲和性的作用
精准匹配节点标签
节点亲和性通过定义 标签匹配规则(如节点必须包含 / 不包含某个标签、标签值在指定范围内等),将 Pod 调度到符合条件的节点。
- 硬性规则(requiredDuringScheduling):强制 Pod 只能调度到满足条件的节点,否则保持 Pending 状态。
例:将数据库 Pod 强制调度到标记为 role=database 且 storage=ssd 的节点。
- 软性规则(preferredDuringScheduling):优先调度到满足条件的节点,若不满足则尝试其他节点(通过权重配置优先级)。
例:优先将 Web 服务 Pod 调度到 region=east 的节点(权重 100),其次调度到 zone=az1 的节点(权重 70)。
精细化控制
替代简单的 nodeSelector,支持复杂逻辑(如 “节点必须包含 A 标签且不包含 B 标签”)。
灵活容错
软性规则允许调度器在条件不满足时 “退而求其次”,避免 Pod 长时间处于 Pending。
资源利用率优化
通过标签分层(如环境、硬件、地域),实现集群资源的合理分配与负载均衡。
节点亲和性实战-硬性规则匹配
必须满足条件才能调度,否则 Pod 将处于 Pending 状态。调度成功后,即使节点标签变更导致规则不再满足,Pod 也不会被驱逐。
将 Pod 调度到同时满足以下条件的节点:
- 标签 env=prod
- 标签 disk-type=ssd
- 标签 cpu-cores > 2
示例:
给node01节点打上标签- [root@master ~]# kubectl label node node01 env=prod disk-type=ssd cpu-cores=3
- node/node01 labeled
- # 查看标签
- [root@master ~]# kubectl describe node node01 | grep Labels -A 5
- Labels: beta.kubernetes.io/arch=amd64
- beta.kubernetes.io/os=linux
- cpu-cores=3
- disk-type=ssd
- env=prod
- kubernetes.io/arch=amd64
复制代码 创建Pod- [root@master ~/affinity]# cat affinity-deploy.yaml
- apiVersion: apps/v1
- kind: Deployment
- metadata:
- name: nginx-affinity
- spec:
- replicas: 5
- selector:
- matchLabels:
- app: nginx
- template:
- metadata:
- labels:
- app: nginx
- spec:
- affinity:
- # 节点亲和性
- nodeAffinity:
- # 指定硬性规则
- requiredDuringSchedulingIgnoredDuringExecution:
- # 匹配规则
- nodeSelectorTerms:
- - matchExpressions:
- - key: env
- operator: In
- values: ["prod"]
- - key: disk-type
- operator: In
- values: ["ssd"]
- - key: cpu-cores
- operator: Gt
- values: ["2"] # 注意:values 是字符串列表,内部会转为数字比较
- containers:
- - name: nginx
- image: nginx
- #创建
- [root@master ~/affinity]# kubectl apply -f affinity-deploy.yaml
- deployment.apps/nginx-affinity created
复制代码 查看Pod调度到哪个节点上了?
发现Pod全部调度到node01节点上- [root@master ~/affinity]# kubectl get po -o wide
- NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
- nginx-affinity-556d5d5987-5bnjm 1/1 Running 0 94s 100.117.144.175 node01 <none> <none>
- nginx-affinity-556d5d5987-bvmvh 1/1 Running 0 94s 100.117.144.173 node01 <none> <none>
- nginx-affinity-556d5d5987-d7tkg 1/1 Running 0 94s 100.117.144.174 node01 <none> <none>
- nginx-affinity-556d5d5987-frkkm 1/1 Running 0 94s 100.117.144.172 node01 <none> <none>
- nginx-affinity-556d5d5987-ttlcr 1/1 Running 0 94s 100.117.144.176 node01 <none> <none>
复制代码 如果标签不匹配,Pod会出现什么情况呢?- # 将node01节点上的标签删除一个
- [root@master ~/affinity]# kubectl label node node01 cpu-cores-
- node/node01 unlabeled
- # 查看标签
- [root@master ~/affinity]# kubectl describe node node01 | grep Labels -A 5
- Labels: beta.kubernetes.io/arch=amd64
- beta.kubernetes.io/os=linux
- disk-type=ssd
- env=prod
- kubernetes.io/arch=amd64
- kubernetes.io/hostname=node01
- # 删除Pod让其重建
- [root@master ~/affinity]# kubectl get po | awk '{print $1}' | xargs kubectl delete po
- pod "nginx-affinity-556d5d5987-5bnjm" deleted
- pod "nginx-affinity-556d5d5987-bvmvh" deleted
- pod "nginx-affinity-556d5d5987-d7tkg" deleted
- pod "nginx-affinity-556d5d5987-frkkm" deleted
- pod "nginx-affinity-556d5d5987-ttlcr" deleted
复制代码 查看Pod,发现状态都是Pending状态- [root@master ~/affinity]# kubectl get po
- NAME READY STATUS RESTARTS AGE
- nginx-affinity-556d5d5987-44qmc 0/1 Pending 0 27s
- nginx-affinity-556d5d5987-4qtth 0/1 Pending 0 27s
- nginx-affinity-556d5d5987-4tws5 0/1 Pending 0 27s
- nginx-affinity-556d5d5987-bgm7n 0/1 Pending 0 27s
- nginx-affinity-556d5d5987-kh555 0/1 Pending 0 27s
复制代码 查看一下详细信息,发现是标签不匹配- [root@master ~/affinity]# kubectl describe po nginx-affinity-556d5d5987-44qmc
- Name: nginx-affinity-556d5d5987-44qmc
- Events:
- Type Reason Age From Message
- ---- ------ ---- ---- -------
- Warning FailedScheduling 71s default-scheduler 0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
- Warning FailedScheduling 70s default-scheduler 0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
复制代码总结一下,节点亲和性可以更加细腻指定Pod调度到某一个节点上,如果指定的标签不匹配,那么Pod会处于Pending状态
节点亲和性实战-软性规则匹配
软性规则匹配优先满足条件,但不强制。调度器会为每个满足条件的节点打分,选择分数最高的节点。
评分机制
当存在多个满足软性规则的节点时,调度器会计算每个节点的得分:
- 基础分:所有节点初始分为 0。
- 权重叠加:对每个 preferredDuringSchedulingIgnoredDuringExecution 规则:
- 若节点满足规则,得分为 weight 值。
- 若不满足,得分为 0。
- 总分计算:节点最终得分是所有匹配规则的 weight 之和。
示例:
优先调度到以下节点:
- 首选 region=east 的节点(权重 100)
- 其次 disk-type=ssd 的节点(权重 70)
给node01节点打上region=east标签- [root@master ~/affinity]# kubectl label node node01 region=east
- node/node01 labeled
复制代码 给node02节点打上disk-type=ssd标签- [root@master ~/affinity]# kubectl label node node02 disk-type=ssd
- node/node02 labeled
复制代码 创建deploy- [root@master ~/affinity]# cat affinity-deploy.yaml
- apiVersion: apps/v1
- kind: Deployment
- metadata:
- name: nginx-affinity
- spec:
- replicas: 10
- selector:
- matchLabels:
- app: nginx
- template:
- metadata:
- labels:
- app: nginx
- spec:
- affinity:
- # 节点亲和性
- nodeAffinity:
- # 指定软性匹配规则
- preferredDuringSchedulingIgnoredDuringExecution:
- # 权重范围 1-100,值越高优先级越高
- - weight: 100
- preference:
- matchExpressions:
- - key: region
- operator: In
- values: ["east"]
- - weight: 70
- preference:
- matchExpressions:
- - key: disk-type
- operator: In
- values: ["ssd"]
- containers:
- - name: nginx
- image: nginx
- [root@master ~/affinity]# kubectl apply -f affinity-deploy.yaml
- deployment.apps/nginx-affinity created
复制代码 查看Pod调度到哪一个节点上
发现调度到node01节点上的Pod居多,因为node01的权重是100,而node02节点上略少,权重为70
- [root@master ~/affinity]# kubectl get po -o wide
- NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
- nginx-affinity-65587946bd-84mv5 1/1 Running 0 30s 100.117.144.180 node01 <none> <none>
- nginx-affinity-65587946bd-8lfwl 1/1 Running 0 30s 100.117.144.183 node01 <none> <none>
- nginx-affinity-65587946bd-9fnhb 1/1 Running 0 30s 100.117.144.179 node01 <none> <none>
- nginx-affinity-65587946bd-9rqt9 1/1 Running 0 30s 100.117.144.177 node01 <none> <none>
- nginx-affinity-65587946bd-gsq4c 1/1 Running 0 30s 100.117.144.182 node01 <none> <none>
- nginx-affinity-65587946bd-pf845 1/1 Running 0 30s 100.95.185.238 node02 <none> <none>
- nginx-affinity-65587946bd-pvwps 1/1 Running 0 30s 100.95.185.237 node02 <none> <none>
- nginx-affinity-65587946bd-qhhh7 1/1 Running 0 30s 100.95.185.239 node02 <none> <none>
- nginx-affinity-65587946bd-tn54h 1/1 Running 0 30s 100.117.144.178 node01 <none> <none>
- nginx-affinity-65587946bd-x64qs 1/1 Running 0 30s 100.117.144.181 node01 <none> <none>
复制代码 节点亲和性-混合使用硬性和软性规则
示例:- apiVersion: v1
- kind: Pod
- metadata:
- name: mixed-affinity-pod
- spec:
- affinity:
- nodeAffinity:
- # 硬性限制
- requiredDuringSchedulingIgnoredDuringExecution:
- nodeSelectorTerms:
- - matchExpressions:
- - key: env
- operator: In
- values: ["prod"]
- # 软性限制
- preferredDuringSchedulingIgnoredDuringExecution:
- - weight: 80
- preference:
- matchExpressions:
- - key: gpu
- operator: Exists
- containers:
- - name: nginx
- image: nginx
复制代码 nodeName、nodeSelector和节点亲和性的区别
特性nodeNamenodeSelector节点亲和性(Node Affinity)调度方式直接指定节点名称通过标签匹配节点通过灵活的规则表达式匹配节点匹配规则精确匹配节点名称精确匹配标签键值对支持多种操作符(In, NotIn, Exists, Gt, Lt等)策略类型硬约束(必须满足)硬约束(必须满足)支持硬约束(requiredDuringScheduling)和软约束(preferredDuringScheduling)灵活性最低(需手动指定节点)中等(仅支持精确匹配)最高(支持复杂逻辑和优先级)语法复杂度最简单(单字段配置)较低(标签键值对)较高(嵌套表达式)应用场景调试、特殊需求简单的标签选择复杂的调度策略(如跨区域部署、资源优化)
来源:程序园用户自行投稿发布,如果侵权,请联系站长删除
免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作! |