The CNI Pod IP Address Allocation Process

One of the core requirements of the Kubernetes networking model is that every pod should get its own IP address and that every pod in the cluster should be able to talk to it using this IP address. There are several network providers (flannel, calico, canal, etc.) that implement this networking model.

As I started working on Kubernetes, it wasn’t completely clear to me how every pod is assigned an IP address. I understood how various components worked independently, however, it wasn’t clear how these components fit together. For instance, I understood what CNI plugins were, however, I didn’t know how they were invoked. So, I wanted to write this post to share what I have learned about various networking components and how they are stitched together in a kubernetes cluster for every pod to receive an IP address.

There are various ways of setting up networking in kubernetes and various options for a container runtime. For this post, I will use Flannel as the network provider and Containerd as the container runtime. Also, I am going to assume that you know how container networking works and only share a very brief overview below for context.

Container Networking: A Very Brief Overview

There are some really good posts explaining how container networking works. For context, I will go over a very high level overview here with a single approach that involves linux bridge networking and packet encapsulation. I am skipping details here as container networking deserves a blog post of itself. Some of the posts that I have found to be very educational in this space are linked in the references below.

Containers on the same host

One of the ways containers running on the same host can talk to each other via their IP addresses is through a linux bridge. In the kubernetes (and docker) world, a veth (virtual ethernet) device is created to achieve this. One end of this veth device is inserted into the container network namespace and the other end is connected to a linux bridge on the host network. All containers on the same host have one end of this veth pair connected to the linux bridge and they can talk to each other using their IP addresses via the bridge. The linux bridge is also assigned an IP address and it acts as a gateway for egress traffic from pods destined to different nodes.
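
To make the idea concrete, here is a minimal, hand-rolled sketch of that wiring using iproute2. This is not what the CNI plugins literally execute, and the names (ns1, veth0, ceth0, cni0) and addresses are purely illustrative:

# a network namespace standing in for a container
ip netns add ns1
# create a veth pair and move one end into the "container" namespace
ip link add veth0 type veth peer name ceth0
ip link set ceth0 netns ns1
# create the linux bridge and give it an IP; it doubles as the pods' gateway
ip link add cni0 type bridge
ip addr add 10.244.0.1/24 dev cni0
# attach the host end of the veth pair to the bridge and bring everything up
ip link set veth0 master cni0
ip link set cni0 up
ip link set veth0 up
# address the container end and point its default route at the bridge
ip netns exec ns1 ip addr add 10.244.0.10/24 dev ceth0
ip netns exec ns1 ip link set ceth0 up
ip netns exec ns1 ip route add default via 10.244.0.1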

Containers on different hosts

One of the ways containers running on different hosts can talk to each other via their IP addresses is by using packet encapsulation. Flannel supports this through vxlan which wraps the original packet inside a UDP packet and sends it to the destination.

In a kubernetes cluster, flannel creates a vxlan device and some route table entries on each of the nodes. Every packet that’s destined for a container on a different host goes through the vxlan device and is encapsulated in a UDP packet. On the destination node, the encapsulated packet is unwrapped and routed through to the destination pod.
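
On a node running flannel, this wiring is visible with standard iproute2 commands. The device name flannel.1 is the conventional name for flannel’s vxlan interface, and the routes in the comments are only examples of what such entries look like:

# the vxlan device that flannel creates (commonly named flannel.1)
ip -d link show flannel.1
# routes that send other nodes' pod subnets through the vxlan device, e.g.
#   10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
#   10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
ip route
# forwarding database entries mapping remote vxlan endpoints to node IPs
bridge fdb show dev flannel.1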

NOTE: This is just one of the ways in which networking between containers can be configured.

What Is CRI?

CRI (Container Runtime Interface) is a plugin interface that allows kubelet to use different container runtimes. Various container runtimes implement the CRI API and this allows users to use the container runtime of their choice in their kubernetes installation.
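
As a quick way to see this in practice, crictl speaks CRI to whichever runtime is configured and can list the pod sandboxes and containers that the runtime manages. A rough example, assuming containerd’s default socket path:

# list pod sandboxes and containers via the CRI API (socket path assumes containerd defaults)
crictl --runtime-endpoint unix:///run/containerd/containerd.sock pods
crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps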

What Is CNI?

The CNI project includes a spec to provide a generic, plugin-based networking solution for linux containers. It also consists of various plugins which perform different functions in configuring the pod network. A CNI plugin is an executable that follows the CNI spec, and we’ll discuss some of these plugins in the post below.
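
Per the CNI spec, a plugin is an executable that receives a handful of CNI_* environment variables plus the network configuration on stdin and writes a JSON result to stdout. A rough manual invocation, with a made-up container ID, netns path and config file name, looks like this:

# ADD is one of the operations defined by the CNI spec (along with DEL, CHECK and VERSION)
CNI_COMMAND=ADD \
CNI_CONTAINERID=example-container-id \
CNI_NETNS=/var/run/netns/example-ns \
CNI_IFNAME=eth0 \
CNI_PATH=/opt/cni/bin \
/opt/cni/bin/bridge < /etc/cni/net.d/10-example.conf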

Assigning Subnets To Nodes For Pod IP Addresses

If all pods are required to have an IP address, it’s important to ensure that all pods across the entire cluster have a unique IP address. This is achieved by assigning each node a unique subnet from which pods are assigned IP addresses on that node.

Node IPAM Controller

When nodeipam is passed as an option to the kube-controller-manager’s --controllers command line flag, it allocates each node a dedicated subnet (podCIDR) from the cluster CIDR (IP range for the cluster network). Since these podCIDRs are disjoint subnets, it allows assigning each pod a unique IP address.
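
On a kubeadm-style cluster, these settings usually appear as flags in the kube-controller-manager static pod manifest. The values below are a sketch that matches the 10.244.0.0/16 cluster CIDR used in the examples that follow; the exact flags vary by setup:

# nodeipam is part of the default "*" controller set; --allocate-node-cidrs enables podCIDR allocation
kube-controller-manager \
  --controllers='*,bootstrapsigner,tokencleaner' \
  --allocate-node-cidrs=true \
  --cluster-cidr=10.244.0.0/16 \
  --node-cidr-mask-size=24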

[root@k8s-master01 ~]# kubectl get no
NAME           STATUS   ROLES                  AGE     VERSION
k8s-master01   Ready    control-plane,master   3d      v1.23.5
k8s-node01     Ready    <none>                 2d22h   v1.23.5
k8s-node02     Ready    <none>                 2d22h   v1.23.5
[root@k8s-master01 ~]# kubectl get no k8s-node01 -o json | jq .spec.podCIDR
"10.244.2.0/24"
[root@k8s-master01 ~]# kubectl get no k8s-node02 -o json | jq .spec.podCIDR
"10.244.1.0/24"
[root@k8s-master01 ~]# kubectl get no k8s-master01 -o json | jq .spec.podCIDR
"10.244.0.0/24"

Kubelet, Container Runtime and CNI Plugins - how it’s all stitched together

When a pod is scheduled on a node, a lot of things happen to start up a pod. In this section, I’ll only focus on the interactions that relate to configuring network for the pod.

Once a pod is scheduled on the node, the following interactions result in configuring the network and starting the application container.

Interactions between Container Runtime and CNI Plugins

Every network provider has a CNI plugin which is invoked by the container runtime to configure network for a pod as it’s started. With containerd as the container runtime, the Containerd CRI plugin invokes the CNI plugin. Every network provider also has an agent that’s installed on each of the kubernetes nodes to configure pod networking. When the network provider agent is installed, it either ships with the CNI config or creates one on the node, which is then used by the CRI plugin to figure out which CNI plugin to call.

The location for the CNI config file is configurable and the default value is /etc/cni/net.d/<config-file>. CNI plugins need to be shipped on every node by the cluster administrators. The location for CNI plugins is configurable as well and the default value is /opt/cni/bin.
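
On a typical node you can simply list both locations; which binaries show up under /opt/cni/bin depends on what the administrator and the network provider have installed:

ls /etc/cni/net.d/
# e.g. 10-flannel.conflist
ls /opt/cni/bin/
# e.g. bridge  flannel  host-local  loopback  portmap  ...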

With containerd as the container runtime, the paths for the CNI configuration and CNI plugin binaries can be specified under the [plugins."io.containerd.grpc.v1.cri".cni] section of the containerd config.
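
For reference, the corresponding section of /etc/containerd/config.toml with the default paths looks roughly like this (key names can differ slightly across containerd versions):

[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/opt/cni/bin"
  conf_dir = "/etc/cni/net.d"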

Since we are referring to Flannel as the network provider here, I’ll talk a little about how Flannel is set up. Flanneld is the Flannel daemon and is typically installed on a kubernetes cluster as a daemonset with install-cni as an init container. The install-cni container creates the CNI configuration file - /etc/cni/net.d/10-flannel.conflist - on each node. Flanneld creates a vxlan device, fetches networking metadata from the apiserver and watches for updates on pods. As pods are created, it distributes routes for all pods across the entire cluster and these routes allow pods to connect to each other via their IP addresses. For details on how flannel works, I recommend the linked references below.
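
You can check how flannel is deployed on a cluster with a couple of kubectl queries; the namespace (kube-flannel or kube-system) and the pod labels depend on the version of the manifest used to install it:

# the flannel daemonset and its per-node pods
kubectl get daemonset,pods --all-namespaces -o wide | grep -i flannel
# the init containers of one flannel pod (install-cni is the one that writes the CNI config file)
kubectl -n kube-flannel get pod -l app=flannel -o jsonpath='{.items[0].spec.initContainers[*].name}'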

The interactions between the Containerd CRI plugin and the CNI plugins can be visualized as follows:

As described above, kubelet calls the Containerd CRI plugin in order to create a pod and Containerd CRI plugin calls the CNI plugin to configure network for the pod. The network provider CNI plugin calls other base CNI plugins to configure the network. The interactions between CNI plugins are described below.

Interactions Between CNI Plugins

There are various CNI plugins that help configure networking between containers on a host. For this post, we will refer to 3 plugins.

Flannel CNI Plugin

When using Flannel as the network provider, the Containerd CRI plugin invokes the Flannel CNI plugin using the CNI configuration file - /etc/cni/net.d/10-flannel.conflist.

$ cat /etc/cni/net.d/10-flannel.conflist
{
  "name": "cni0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "ipMasq": false,
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    }
  ]
}

The Flannel CNI plugin works in conjunction with Flanneld. When Flanneld starts up, it fetches the podCIDR and other network related details from the apiserver and stores them in a file - /run/flannel/subnet.env.

FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=false

The Flannel CNI plugin uses the information in /run/flannel/subnet.env to configure and invoke the bridge CNI plugin.

Bridge CNI Plugin

Flannel CNI plugin calls the Bridge CNI plugin with the following configuration:

{"name": "cni0","type": "bridge","mtu": 1450,"ipMasq": false,"isGateway": true,"ipam": {"type": "host-local","subnet": "10.244.0.0/24"}}

When Bridge CNI plugin is invoked for the first time, it creates a linux bridge with the "name": "cni0" specified in the config file. For every pod, it then creates a veth pair - one end of the pair is in the container’s network namespace and the other end is connected to the linux bridge on the host network. With Bridge CNI plugin, all containers on a host are connected to the linux bridge on the host network.
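
Once a few pods are running on a node, the result is easy to inspect with iproute2; how many veth interfaces you see, and their names, depends on the pods scheduled there:

# the bridge created by the plugin, carrying the gateway IP from the node's podCIDR
ip addr show cni0
# the host-side veth ends attached to the bridge, one per pod
ip link show master cni0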

After configuring the veth pair, the Bridge plugin invokes the host-local IPAM CNI plugin. Which IPAM plugin to use can be configured in the CNI config that the CRI plugin uses to call the Flannel CNI plugin.

Host-local IPAM CNI Plugin

The Bridge CNI plugin calls the host-local IPAM CNI plugin with the following configuration:

{"name": "cni0","ipam": {"type": "host-local","subnet": "10.244.0.0/24","dataDir": "/var/lib/cni/networks"}}

The host-local IPAM (IP Address Management) plugin returns an IP address for the container from the subnet and records the allocation locally on the host, under the directory specified by dataDir, as /var/lib/cni/networks/<network-name=cni0>/<ip>. This file contains the container ID to which the IP is assigned.
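
You can see these allocations directly on a node. The network name cni0 matches the config above, but the exact IP files present (10.244.0.2 here is only an example) depend on which pods are running:

# one file per allocated IP, named after the IP address
ls /var/lib/cni/networks/cni0/
# each file records the ID of the container (pod sandbox) the IP was handed to
cat /var/lib/cni/networks/cni0/10.244.0.2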

When invoked, the host-local IPAM plugin returns the following payload:

{"ip4": {"ip": "10.244.4.2","gateway": "10.244.4.3"},"dns": {}}

Summary

Kube-controller-manager assigns a podCIDR to each node. Pods on a node are assigned an IP address from the subnet value in podCIDR. Because podCIDRs across all nodes are disjoint subnets, it allows assigning each pod a unique IP address.

The Kubernetes cluster administrator configures and installs kubelet, the container runtime, and the network provider agent, and distributes the CNI plugins on each node. When the network provider agent starts, it generates a CNI config. When a pod is scheduled on a node, kubelet calls the CRI plugin to create the pod. In containerd’s case, the Containerd CRI plugin then calls the CNI plugin specified in the CNI config to configure the pod network. And all of this results in a pod getting an IP address.

References

https://jvns.ca/blog/2016/12/22/container-networking/

https://msazure.club/flannel-networking-demystify/

https://medium.com/@anilkreddyr/kubernetes-with-flannel-understanding-the-networking-part-2-78b53e5364c7

https://medium.com/@anilkreddyr/kubernetes-with-flannel-understanding-the-networking-part-1-7e1fe51820e4

https://mooon.top/2019/03/08/blog/Calico%20IP%20%E5%88%86%E9%85%8D%E7%AD%96%E7%95%A5/

https://ronaknathani.com/blog/2020/08/how-a-kubernetes-pod-gets-an-ip-address/

https://arthurchiao.art/blog/what-happens-when-k8s-creates-pods-5-zh/#64-cni-%E5%90%8E%E5%8D%8A%E9%83%A8%E5%88%86cni-plugin-%E5%AE%9E%E7%8E%B0