Kubernetes Kubelet: Notes and Key Concepts

Overview

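The kubelet runs on every node. It receives pod create, update, and delete operations from the API server (more precisely, it watches the API server for the pods scheduled to its node), monitors node and pod status, manages images, and more.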

With that in mind, let's look at which ports the kubelet listens on.

netstat -anp | grep kubelet

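The output shows that the kubelet listens on ports 10250 and 10248. Port 10250 serves the kubelet's authenticated HTTPS API, and port 10248 serves a plain-HTTP /healthz endpoint that is bound to 127.0.0.1 by default.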
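A quick way to confirm the health port is working is to send it a GET request. The sketch below assumes the defaults (healthzBindAddress 127.0.0.1, healthzPort 10248) and assumes a healthy kubelet replies 200 with the body "ok"; the helper names isHealthy and checkHealthz are made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// isHealthy decides whether a healthz reply means "healthy":
// HTTP 200 and a body of "ok" (surrounding whitespace ignored).
func isHealthy(status int, body []byte) bool {
	return status == http.StatusOK && strings.TrimSpace(string(body)) == "ok"
}

// checkHealthz GETs url and applies isHealthy to the response.
func checkHealthz(url string) (bool, error) {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	// The body is expected to be tiny; cap the read defensively.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1024))
	if err != nil {
		return false, err
	}
	return isHealthy(resp.StatusCode, body), nil
}

func main() {
	// Assumed default address of the kubelet healthz endpoint.
	ok, err := checkHealthz("http://127.0.0.1:10248/healthz")
	if err != nil {
		fmt.Println("kubelet healthz unreachable:", err)
		return
	}
	fmt.Println("kubelet healthy:", ok)
}
```

Run it on the node itself: because the endpoint listens only on loopback by default, the same request from another machine would fail.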

Diagram

The figure below shows the objects that the kubelet manages.

ProbeManager

The descriptions below are taken from the official documentation: Configure Liveness, Readiness and Startup Probes | Kubernetes

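The kubelet uses liveness probes to know when to restart a container. For example, a liveness probe can catch a deadlock, where the application is running but unable to make progress. Restarting a container in that state can help make the application more available despite bugs.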

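The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready only when all of its containers are ready. One use of this signal is to control which Pods are used as backends for a Service: a Pod that is not ready is removed from the Service's load balancers.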

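The kubelet uses startup probes to know when a container application has started. If such a probe is configured, liveness and readiness checks do not start until it succeeds, which keeps those probes from interfering with application startup. This lets you apply liveness checks to slow-starting containers without the kubelet killing them before they are up and running.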
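Two behaviors described above are easy to misread: a single failed probe does not immediately change the result, and a configured startup probe gates the other probes. The sketch below models both with hypothetical names (probeState, livenessAllowed). It assumes the documented defaults failureThreshold=3 and successThreshold=1, and that liveness starts out as healthy while readiness starts out as not ready. It is not the kubelet's prober code.

```go
package main

import "fmt"

// probeState is a simplified model: the reported result only flips after a
// run of consecutive identical raw results, mirroring the
// failureThreshold and successThreshold fields.
type probeState struct {
	failureThreshold int
	successThreshold int
	healthy          bool // the state the kubelet would act on
	last             bool // most recent raw probe result
	streak           int  // how many times in a row `last` has been seen
}

func newProbeState(failureThreshold, successThreshold int, initial bool) *probeState {
	return &probeState{
		failureThreshold: failureThreshold,
		successThreshold: successThreshold,
		healthy:          initial,
		last:             initial,
	}
}

// observe feeds one raw probe result and returns the reported state.
func (p *probeState) observe(ok bool) bool {
	if ok == p.last {
		p.streak++
	} else {
		p.last, p.streak = ok, 1
	}
	switch {
	case ok && !p.healthy && p.streak >= p.successThreshold:
		p.healthy = true
	case !ok && p.healthy && p.streak >= p.failureThreshold:
		p.healthy = false
	}
	return p.healthy
}

// livenessAllowed models startup-probe gating: liveness (and readiness)
// checks only run once a configured startup probe has succeeded.
func livenessAllowed(hasStartupProbe, startupSucceeded bool) bool {
	return !hasStartupProbe || startupSucceeded
}

func main() {
	// Liveness with defaults: failureThreshold=3, successThreshold=1.
	p := newProbeState(3, 1, true)
	for _, r := range []bool{false, false, false, true} {
		fmt.Println("raw:", r, "reported:", p.observe(r))
	}
	fmt.Println("liveness may run:", livenessAllowed(true, false))
}
```

For readiness you would construct the state with initial=false, so the Pod stays out of Service endpoints until the first successful probe.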

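1. Health checks over HTTP (httpGet)

2. Health checks by running a command (exec)

3. Health checks over TCP (tcpSocket)

4. gRPC liveness probes (beta as of Kubernetes 1.24)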

The example below uses an HTTP probe:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 5

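In this manifest the Pod has a single container. The periodSeconds field tells the kubelet to run the liveness probe every 5 seconds, and the initialDelaySeconds field tells it to wait 3 seconds before the first probe. To probe, the kubelet sends an HTTP GET request to the server running in the container, which listens on port 8080. If the handler for the /healthz path returns a success code (any code at least 200 and below 400), the kubelet considers the container alive and healthy. If the handler returns a failure code (failureThreshold times in a row, 3 by default), the kubelet kills the container and restarts it.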

I won't go into more detail here; the official documentation explains the usage clearly.

OOMWatcher

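OOMWatcher listens for events from cAdvisor and records a Kubernetes event whenever an OOM kill occurs. For an individual container's OOM status, Kubernetes relies on what the container runtime reports: with Docker this was the State.OOMKilled field, and the container's termination reason then shows up as OOMKilled.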

cAdvisor

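A container monitoring tool built into the kubelet that collects resource usage for the node and its containers.

As the figure below shows, if you look at the Kubernetes source, cAdvisor is a Google package (github.com/google/cadvisor).

The container metrics that Prometheus scrapes from the kubelet (the /metrics/cadvisor endpoint) are also collected by cAdvisor.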

DiskSpaceManager

Monitors the node's disk capacity.
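Checking disk capacity boils down to asking the kernel for filesystem statistics. The sketch below uses statfs(2) through Go's syscall package on Linux; diskUsage is a hypothetical helper, and it measures "/" only for illustration (kubelet data usually lives under /var/lib/kubelet). It does not reproduce how the kubelet itself gathers these numbers.

```go
package main

import (
	"fmt"
	"syscall"
)

// diskUsage returns total and available bytes for the filesystem holding
// path. Available excludes blocks reserved for root.
func diskUsage(path string) (total, avail uint64, err error) {
	var st syscall.Statfs_t
	if err = syscall.Statfs(path, &st); err != nil {
		return 0, 0, err
	}
	bsize := uint64(st.Bsize)
	return st.Blocks * bsize, st.Bavail * bsize, nil
}

func main() {
	total, avail, err := diskUsage("/")
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	used := float64(total-avail) / float64(total) * 100
	fmt.Printf("total=%d avail=%d used=%.1f%%\n", total, avail, used)
}
```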

StatusManager

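The main job of the StatusManager is to sync pod status to the API server. It does not actively monitor pod status itself; instead it exposes methods that other managers call, such as the probeManager. The probeManager periodically checks the health of the containers in each pod and, as soon as it sees a state change, calls the statusManager's methods to update the pod's status.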

EvictionManager

The EvictionManager monitors resource usage (memory, disk, and PIDs) and evicts pods when the node comes under resource pressure.

For the detailed rules, see the official documentation: Node-pressure Eviction | Kubernetes

VolumeManager

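Attaches and mounts storage volumes for pods on the node. It looks up the PV and PVC objects that a pod references; binding a PVC to a PV, including dynamic provisioning through a StorageClass, is done by the control plane rather than by the kubelet. The volumeManager is created together with the kubelet: NewMainKubelet, the function that constructs the kubelet, calls NewVolumeManager to create the manager.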

// volumeManager implements the VolumeManager interface
type volumeManager struct {
	// kubeClient is the kube API client used by DesiredStateOfWorldPopulator to
	// communicate with the API server to fetch PV and PVC objects
	kubeClient clientset.Interface

	// volumePluginMgr is the volume plugin manager used to access volume
	// plugins. It must be pre-initialized.
	volumePluginMgr *volume.VolumePluginMgr

	// desiredStateOfWorld is a data structure containing the desired state of
	// the world according to the volume manager: i.e. what volumes should be
	// attached and which pods are referencing the volumes).
	// The data structure is populated by the desired state of the world
	// populator using the kubelet pod manager.
	desiredStateOfWorld cache.DesiredStateOfWorld

	// actualStateOfWorld is a data structure containing the actual state of
	// the world according to the manager: i.e. which volumes are attached to
	// this node and what pods the volumes are mounted to.
	// The data structure is populated upon successful completion of attach,
	// detach, mount, and unmount actions triggered by the reconciler.
	actualStateOfWorld cache.ActualStateOfWorld

	// operationExecutor is used to start asynchronous attach, detach, mount,
	// and unmount operations.
	operationExecutor operationexecutor.OperationExecutor

	// reconciler runs an asynchronous periodic loop to reconcile the
	// desiredStateOfWorld with the actualStateOfWorld by triggering attach,
	// detach, mount, and unmount operations using the operationExecutor.
	reconciler reconciler.Reconciler

	// desiredStateOfWorldPopulator runs an asynchronous periodic loop to
	// populate the desiredStateOfWorld using the kubelet PodManager.
	desiredStateOfWorldPopulator populator.DesiredStateOfWorldPopulator

	// csiMigratedPluginManager keeps track of CSI migration status of plugins
	csiMigratedPluginManager csimigration.PluginManager

	// intreeToCSITranslator translates in-tree volume specs to CSI
	intreeToCSITranslator csimigration.InTreeToCSITranslator
}

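kubeClient clientset.Interface     // kube API client that DesiredStateOfWorldPopulator uses to talk to the API server and fetch PV and PVC objects

volumePluginMgr *volume.VolumePluginMgr     // the volume plugin manager; must be initialized beforehand

desiredStateOfWorld cache.DesiredStateOfWorld      // the state the volume manager wants to reach: which volumes should be attached and which pods reference them. Populated from the podManager

actualStateOfWorld cache.ActualStateOfWorld      // the actual state: which volumes are attached to this node and which pods they are mounted into. Updated as attach, detach, mount, and unmount operations complete

operationExecutor operationexecutor.OperationExecutor      // runs attach, detach, mount, and unmount operations asynchronously

reconciler reconciler.Reconciler         // runs an asynchronous periodic loop that reconciles desiredStateOfWorld with actualStateOfWorld using the operationExecutor

desiredStateOfWorldPopulator populator.DesiredStateOfWorldPopulator          // runs an asynchronous periodic loop that populates desiredStateOfWorld from the kubelet PodManager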

ImageGC

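Garbage collection for images.

The kubelet runs image garbage collection every 5 minutes. It removes unused images only when disk usage reaches imageGCHighThresholdPercent (85% by default), deleting the least recently used images until usage drops to imageGCLowThresholdPercent (80% by default).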

ContainerGC

By default the kubelet runs container garbage collection once a minute, removing dead (exited) containers that are no longer needed.

CertificateManager

Certificate management: obtains and rotates the kubelet's client and serving certificates.

With all of the managed objects covered, what remains is

SyncLoop and podWorkers

First, let's look at PLEG (Pod Lifecycle Event Generator).

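PLEG maintains a cache of pod information. It periodically fetches container state from the container runtime, compares it with the previous snapshot, and generates the corresponding PodLifecycleEvents. These are sent through eventChannel to the kubelet's syncLoop, and kubelet's syncPod then brings the pod in line with the state the user asked for.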

The core pod workflow is shown in the figure below:

Why Dockershim Was Deprecated

The following is taken from the Dockershim Deprecation FAQ | Kubernetes

Why is dockershim being deprecated?

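Maintaining dockershim has become a heavy burden on the Kubernetes maintainers. The CRI standard was created to reduce this burden and to allow smooth interoperability between different container runtimes. Docker itself, however, still does not implement CRI, and that is the problem.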

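Dockershim was always intended to be a temporary solution (hence the name: shim). You can read more about the community discussion and planning in the Dockershim Removal Kubernetes Enhancement Proposal.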

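Additionally, features that are incompatible with dockershim, such as cgroups v2 and user namespaces, are being implemented in the newer CRI runtimes. Removing support for dockershim allows faster development in those areas.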

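In Kubernetes 1.24, the dockershim component was officially removed from the kubelet, so starting with 1.24 Docker Engine can no longer be used as the container runtime out of the box. As the architecture diagram below shows, dockershim sat in an awkward position: it acted as a translator or proxy between the kubelet's CRI, Docker, and containerd. Newer versions of containerd implement the CRI interface themselves, however, so the kubelet can bypass Docker and talk to containerd directly over CRI.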