Skip to main content

Kubernetes – init containers, CNI and more

Certain questions about Kubernetes seem to come up again and again:

  • What’s up with this init container stuff?
  • What’s a CNI plugin?
  • Why is Kubernetes complaining about pods not finishing initialisation?

Kubernetes is a complex system with a simple overall purpose: run user workloads in a way that permits the authors of the workloads to not care (much) about the messy details of the hardware underneath. The workload authors are supposed to be able to just focus on Pods and Services; in turn, Kubernetes is meant to arrange things such that workloads get mapped to Pods, Pods get deployed on Nodes, and the network in between looks flat and transparent.

This is simple to state, but extremely complex to implement in practice. (This is an area where Kubernetes is doing a great job of making things complex for the Kubernetes implementors so that they can be easier for the users – nicely done!) Under the hood, Kubernetes is leaning heavily on a number of technologies to make all this happen.

Clusters and cgroups and Pods...

The first major area that Kubernetes has to manage is actually running the workloads within the cluster. It relies heavily on OS-level isolation mechanisms for this:

  • Clusters are composed of one of more Nodes, which are (possibly virtualised) machines. For this article, we’ll be talking about Linux Nodes.

  • Since different Nodes are different machines (virtual or physical), everything on one Node is isolated from all other Nodes.

  • Pods are composed of one or more containers, all of which are isolated from one another within the same Node using Linux cgroups and namespaces.

  • It’s worth noting that Linux itself runs at the Node level. Pods and containers don’t have distinct copies of the operating system, which is why isolation between them is such a big deal.

This multi-layer approach gives Kubernetes a way to orchestrate which workloads run where within the cluster, and to keep track of resource availability and consumption: workload containers are mapped to Pods, Pods are scheduled onto Nodes, and Nodes are all connected to a network.

Deployments, ReplicaSets, DaemonSets, etc., are all bookkeeping mechanisms for figuring out exactly which Pods gets scheduled onto which Nodes, but the fundamental scheduling mechanism is the same across all of them.

Kubernetes Networking

The other major area that Kubernetes has to manage is the network. Kubernetes requires that Pods see a network that is flat and transparent: every Pod must be able to directly communicate with every other Pod, whether on the same Node or not. This implies that each Pod has to have its own IP address, which I’ll call a Pod IP in a fit of originality.

(Technically, any container within a Pod must be able to talk to containers in other Pods – but these IP addresses exist at the Pod level, not the container level. Multiple containers in one Pod share the same IP address.)

You could write a workload to use Pod IPs directly to talk to other workloads, but it’s not a good idea: Pod IPs change as Pods go up and down. Instead, we generally refer to workloads using a Kubernetes Service. Services are actually fairly complex (even though I’m glossing over headless Services and such here!):

  • A Service causes a DNS entry to be allocated, so that workloads can refer to the Service using a name.

  • The Service also allocates a Cluster IP address for the Service, which refers only to this Service and is distinct from any other IP address in the cluster. (I’m calling it a Cluster IP to reinforce the idea that it is not tied to a single Pod.)

  • The Service also defines a selector, which defines which Pods will be matched with the Service.

  • Finally, the Service collects the Pod IP addresses of all the Pods it matches, and keeps track of them as its endpoints.

When a workload tries to connect to the Cluster IP belonging to a Service, by default Kubernetes will pick one of the Service’s endpoints, and route the connection there (remember that the Service endpoints are Pod IP addresses). In this way, Services do simple load balancing at the connection level.

It should be fairly apparent from all this that there is a lot of networking magic happening in Kubernetes. This is all handled by the low-level firewall built into the Linux kernel.


IPTables

The Linux kernel contains a fairly powerful mechanism to examine network traffic at the packet level and make decisions about what to do with each packet. This might involve letting the packet continue on unchanged, altering the packet, redirecting the packet, or even dropping the packet entirely. I’m going to refer to the whole of this mechanism as IPTables – technically, that’s the name of an early implementation, but we tend to use it to refer to the whole mechanism around Buoyant, so I’ll stick with it. (And if you think this sounds like something you might do with eBPF, you’re right! This is an area where eBPF shines, although many implementations of this mechanism predate eBPF and don’t depend on it.)

What this means is that Kubernetes can - and does - use IPTables to handle the complex, dynamic routing that it needs for network traffic within the cluster. For example, IPTables can catch a packet being sent to a Service’s Cluster IP address and rewrite it to instead go to a specific Pod IP address; likewise, it can know the difference between a Pod IP address on the same Node as the sender and a Pod IP address on a different Node, and manage routing to make everything work out. In turn, this requires Kubernetes to constantly update the IPTables rules as Pods are created and destroyed.

The Container Networking Interface

The specific way that a given Kubernetes implementation needs to update the network configuration depends on the details of the implementation. The Kubernetes Container Networking Interface, or CNI, is a standard that tries to provide a uniform interface for implementors to request network-configuration changes, to make Kubernetes easier to port.

A critical aspect of the CNI is that it allows for CNI plugins, which - in turn - can permit swapping out the network layer even while keeping the rest of the Kubernetes implementation the same. For example, k3d uses Flannel as its networking layer by default, but it’s easy to switch it to use Calico instead.

Kubernetes Pod Startup

Putting it all together, here’s how Kubernetes handles things when a Pod starts.

  1. Find a Node to run the new Pod.
  2. Execute any CNI plugins defined by the Node, in the context of the new Pod. Fail if any don’t work.
  3. Execute any init containers defined for the new Pod, in order. Fail if any don’t work.
  4. Start all the containers defined by the Pod.

When starting the Pod’s containers, it’s important to note that they will be started in the order defined by the Pod’s spec, but that normally Kubernetes will not wait for a given container before proceeding to the next container. However, if the container defines a postStartHook, Kubernetes will start the container, then run the postStartHook to completion, before starting the next container.


Comments

Popular posts from this blog

What Why How SDN..???????

What is SDN?   If you follow any number of news feeds or vendor accounts on Twitter, you've no doubt noticed the term "software-defined networking" or SDN popping up more and more lately. Depending on whom you believe, SDN is either the most important industry revolution since Ethernet or merely the latest marketing buzzword (the truth, of course, probably falls somewhere in between). Few people from either camp, however, take the time to explain what SDN actually means. This is chiefly because the term is so new and different parties have been stretching it to encompass varying definitions which serve their own agendas. The phrase "software-defined networking" only became popular over roughly the past eighteen months or so. So what the hell is it? Before we can appreciate the concept of SDN, we must first examine how current networks function. Each of the many processes of a router or switch can be assigned to one of three conceptual planes of operatio...

NETWORKING BASICS

This article is referred from windowsnetworking.com In this article series, I will start with the absolute basics, and work toward building a functional network. In this article I will begin by discussing some of the various networking components and what they do. If you would like to read the other parts in this article series please go to: Networking Basics: Part 2 - Routers Networking Basics: Part 3 - DNS Servers Networking Basics: Part 4 - Workstations and Servers Networking Basics: Part 5 - Domain Controllers Networking Basics: Part 6 - Windows Domain Networking Basics: Part 7 - Introduction to FSMO Roles Networking Basics: Part 8 - FSMO Roles continued Networking Basics: Part 9 – Active Directory Information Networking Basics: Part 10 - Distinguished Names Networking Basics, Part 11: The Active Directory Users and Computers Console Networking Basics: Part 12 - User Account Management Networking Basics: Part 13 - Creating ...

How to install and setup Docker on RHEL 7/CentOS 7

H ow do I install and setup Docker container on an RHEL 7 (Red Hat Enterprise Linux) server? How can I setup Docker on a CentOS 7? How to install and use Docker CE on a CentOS Linux 7 server? Docker is free and open-source software. It automates the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere. Typically you develop software on your laptop/desktop. You can build a container with your app, and it can test run on your computer. It will scale in cloud, VM, VPS, bare-metal and more. There are two versions of docker. The first one bundled with RHEL/CentOS 7 distro and can be installed with the yum. The second version distributed by the Docker project called docker-ce (community free version) and can be installed by the official Docker project repo. The third version distributed by the Docker project called docker-ee (Enterprise paid version) and can be installed by the official Docker project repo.  This page shows...