Cloud-Native PaaS Platform Infrastructure Based on K8s - Kubernetes Networking: Topology-Aware Service Routing with Service Topology
2023-07-04 09:37:47
Erda open source: github.com/erda-project

Erda Cloud website: Erda Cloud

Introduction

We know that in Kubernetes, a Service load-balances traffic across a group of Pods; by default, requests to a Service are distributed evenly among the backend Pods. In real business scenarios, however, nodes may be spread across different locations, for example in multi-AZ or edge deployments, and we would rather reach the Pods closest to us. In the cloud, cross-AZ traffic is billed, so keeping traffic within a single AZ is a reasonable requirement. Kubernetes 1.17 introduced the Service Topology feature to provide topology-aware routing at the Service level.

How It Works

A Service is actually implemented by kube-proxy, which watches the Service's endpoint resources and programs the corresponding load-balancing rules on each node via IPVS or iptables. For a node to decide where to send a request based on its own topology, the watched endpoints must carry topology information that kube-proxy can compare against the local node's labels to program differentiated load-balancing rules. To this end, Kubernetes 1.17 introduced the EndpointSlice resource, which describes a Service's topology; an example is shown below.

As the example shows, this object attaches topology information to each endpoint; the rest is left to kube-proxy, which filters out the endpoints it needs.

addressType: IPv4
apiVersion: discovery.k8s.io/v1beta1
endpoints:
- addresses:
  - 10.244.2.2
  conditions:
    ready: true
  targetRef:
    kind: Pod
    name: ud-test-beijing-btwbn-78db8b7786-58wmx
    namespace: default
    resourceVersion: "3112"
    uid: 29b6d9c1-4e31-4290-9136-604038ec2b36
  topology:
    kubernetes.io/hostname: kind-worker2
    topology.kubernetes.io/zone: beijing
- addresses:
  - 10.244.1.2
  conditions:
    ready: true
  targetRef:
    kind: Pod
    name: ud-test-hangzhou-btwbn-5c7b5f545d-mggbd
    namespace: default
    resourceVersion: "3047"
    uid: 35391bde-a948-4625-a692-891a0cd3a353
  topology:
    kubernetes.io/hostname: kind-worker
    topology.kubernetes.io/zone: hangzhou
kind: EndpointSlice

Prerequisites

  • The Kubernetes version is 1.17 or later
  • kube-proxy runs in IPVS or iptables mode
  • The Kubernetes components are started with --feature-gates=ServiceTopology=true,EndpointSlice=true (see the sketch after this list)
  • Only the following topology labels are currently supported

kubernetes.io/hostname

topology.kubernetes.io/zone

topology.kubernetes.io/region
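
As a minimal sketch, assuming a kubeadm-style cluster where kube-proxy reads its configuration from the kube-system/kube-proxy ConfigMap and kube-apiserver runs as a static Pod (the file path below is an assumption), the feature gates could be enabled roughly as follows; adjust to however your cluster manages these components.

# config.conf fragment of the kube-system/kube-proxy ConfigMap
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"              # or "iptables"
featureGates:
  ServiceTopology: true
  EndpointSlice: true

# /etc/kubernetes/manifests/kube-apiserver.yaml fragment (assumed path)
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --feature-gates=ServiceTopology=true,EndpointSlice=true
    # ...the remaining flags stay unchanged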

Usage Example

We create two Deployments and one Service with topologyKeys, then check whether the IPVS rules are created according to the topology we specified.

Creating the Deployments

beijing_deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: ud-test
    apps.openyurt.io/pool-name: beijing
  name: ud-test-beijing-btwbn
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 5
  selector:
    matchLabels:
      app: ud-test
      apps.openyurt.io/pool-name: beijing
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: ud-test
        apps.openyurt.io/pool-name: beijing
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: apps.openyurt.io/nodepool
                operator: In
                values:
                - beijing
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/dice-third-party/nginx:1.14.0
          imagePullPolicy: Always
          name: nginx
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

hangzhou_deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: ud-test
    apps.openyurt.io/pool-name: hangzhou
  name: ud-test-hangzhou-btwbn
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 5
  selector:
    matchLabels:
      app: ud-test
      apps.openyurt.io/pool-name: hangzhou
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: ud-test
        apps.openyurt.io/pool-name: hangzhou
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: apps.openyurt.io/nodepool
                operator: In
                values:
                - hangzhou
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/dice-third-party/nginx:1.14.0
          imagePullPolicy: Always
          name: nginx
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

Creating the Service

A topologyKeys field is added for kube-proxy's topology awareness: kube-proxy uses its own node's value of each listed label to filter the endpoints in the EndpointSlice when programming its load-balancing rules, so the corresponding nodes must also carry the topology.kubernetes.io/zone label (see the node sketch after the Service manifest).

apiVersion: v1
kind: Service
metadata:
  name: ud-test
  labels:
    app: ud-test
spec:
  ports:
  - name: ud-test
    port: 80
    targetPort: 80
  selector:
    app: ud-test
  topologyKeys:
    - "topology.kubernetes.io/zone"
    # - "*"
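
For kube-proxy's filtering to work, the worker nodes themselves must carry the labels referenced above. A minimal sketch of the expected node metadata, using the node names and zones from the EndpointSlice example (apply the labels however your nodes are provisioned):

apiVersion: v1
kind: Node
metadata:
  name: kind-worker                          # runs the hangzhou Pod in the example above
  labels:
    kubernetes.io/hostname: kind-worker
    apps.openyurt.io/nodepool: hangzhou      # matched by hangzhou_deploy.yaml's node affinity
    topology.kubernetes.io/zone: hangzhou    # matched by the Service's topologyKeys

The kind-worker2 node would carry the same labels with the beijing values.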

Observing the Result

After everything is created, we can see one Service backed by two Pods, together with a generated EndpointSlice resource.

Then we log in to a node labeled with topology.kubernetes.io/zone and inspect the corresponding IPVS rules.

We can see that the backend Pod IPs in the IPVS rules are only the endpoints filtered out by that node's own label.

What if a node has no topology.kubernetes.io/zone label at all? Then it would have no backends, which is not reasonable either, so Kubernetes provides a "*" option: if none of the keys match, the backend falls back to the full set of endpoints.
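
As a variant of the Service above, "*" can be appended as the last entry in topologyKeys (this is the commented-out line in the earlier manifest), so that same-zone endpoints are preferred but any endpoint is accepted when the zone does not match:

apiVersion: v1
kind: Service
metadata:
  name: ud-test
  labels:
    app: ud-test
spec:
  ports:
  - name: ud-test
    port: 80
    targetPort: 80
  selector:
    app: ud-test
  topologyKeys:
    - "topology.kubernetes.io/zone"   # prefer endpoints in the same zone as the client node
    - "*"                             # otherwise fall back to the full endpoint set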

Appendix: the Service Topology Code Implementation

// FilterTopologyEndpoint returns the appropriate endpoints based on the cluster
// topology.
// This uses the current node's labels, which contain topology information, and
// the required topologyKeys to find appropriate endpoints. If both the endpoint's
// topology and the current node have matching values for topologyKeys[0], the
// endpoint will be chosen.  If no endpoints are chosen, topologyKeys[1] will be
// considered, and so on.  If either the node or the endpoint do not have values
// for a key, it is considered to not match.
//
// If topologyKeys is specified, but no endpoints are chosen for any key, the
// service has no viable endpoints for clients on this node, and connections
// should fail.
//
// The special key "*" may be used as the last entry in topologyKeys to indicate
// "any endpoint" is acceptable.
//
// If topologyKeys is not specified or empty, no topology constraints will be
// applied and this will return all endpoints.
func FilterTopologyEndpoint(nodeLabels map[string]string, topologyKeys []string, endpoints []Endpoint) []Endpoint {
	// Do not filter endpoints if service has no topology keys.
	if len(topologyKeys) == 0 {
		return endpoints
	}

	filteredEndpoint := []Endpoint{}

	if len(nodeLabels) == 0 {
		if topologyKeys[len(topologyKeys)-1] == v1.TopologyKeyAny {
			// edge case: include all endpoints if topology key "Any" specified
			// when we cannot determine current node's topology.
			return endpoints
		}
		// edge case: do not include any endpoints if topology key "Any" is
		// not specified when we cannot determine current node's topology.
		return filteredEndpoint
	}

	for _, key := range topologyKeys {
		if key == v1.TopologyKeyAny {
			return endpoints
		}
		topologyValue, found := nodeLabels[key]
		if !found {
			continue
		}

		for _, ep := range endpoints {
			topology := ep.GetTopology()
			if value, found := topology[key]; found && value == topologyValue {
				filteredEndpoint = append(filteredEndpoint, ep)
			}
		}
		if len(filteredEndpoint) > 0 {
			return filteredEndpoint
		}
	}
	return filteredEndpoint
}

