
【Prometheus】Deploying Prometheus Monitoring on a Kubernetes Cluster

1. Why use Prometheus to monitor a k8s cluster

  • Deep k8s integration: integrates tightly with the k8s components to collect metrics about Pods, containers, and Services
  • Convenient querying: supports PromQL, a flexible multi-dimensional query language that keeps queries concise (see the example query below)
  • Pull model: collects time series by pulling over HTTP, which puts less load on targets than a push model
  • Strong scalability: Prometheus scales horizontally; more instances can be added to handle large volumes of monitoring data, which makes it a good fit for large k8s clusters
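For example, PromQL can aggregate per-Pod CPU usage across the whole cluster in a single line. A minimal sketch, assuming the cAdvisor metrics that section 7 of this post wires up:

sum(rate(container_cpu_usage_seconds_total{image!=""}[5m])) by (namespace, pod)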

2. Deploying Prometheus

Versions used:
Prometheus: 3.1.0

node-exporter: 1.8.2

Kubernetes: 1.28.14

Pull the Prometheus Docker image

docker pull prom/prometheus:v3.1.0

Push the image to the Harbor registry

docker tag prom/prometheus:v3.1.0 harbor.com/k8s_repository/prometheus:3.1.0

docker push harbor.com/k8s_repository/prometheus:3.1.0
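To confirm the push succeeded, the image can be pulled back from Harbor (assuming the registry is reachable from this host and docker login has been run against it):

docker pull harbor.com/k8s_repository/prometheus:3.1.0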

Write the Prometheus deployment YAML, creating the ServiceAccount, ClusterRole, ClusterRoleBinding, Service, Ingress, Deployment, and ConfigMap that Prometheus needs

# ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor

---
# ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extenstions"
  resources:
    - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get

---
# ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitor

---
# Service
apiVersion: v1
kind: Service
metadata:
  name: prometheus-svc
  namespace: monitor
  labels:
    app: prometheus
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
    - name: web
      nodePort: 32224
      port: 9090
      targetPort: http

---
# Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitor
spec:
  ingressClassName: nginx
  rules:
  - host: www.myprometheus.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name:  prometheus-svc
            port:
              number: 9090
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      initContainers:
      - name: "change-permission-of-directory"
        image: 192.168.119.150/myrepo/busybox:stable-glibc
        command: ["/bin/sh"]
        args: ["-c","chown -R 65534:65534 /prometheus"]
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: "/etc/prometheus"
          name: config-volume
        - mountPath: "/prometheus"
          name: data
      containers:
      - image: 192.168.119.150/k8s_repository/prometheus:3.1.0
        name: prometheus
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"#指定prometheus配置文件路径
        - "--storage.tsdb.path=/prometheus"#指定tsdb数据库存储路径
        - "--web.enable-lifecycle"#允许热更新,curl localhost:9090/-/reload 进行热更新
        - "--web.console.libraries=/usr/share/prometheus/console_libraries"
        - "--web.console.templates=/usr/share/prometheus/consoles"
        ports:
        - containerPort: 9090
          name: http
        volumeMounts:
        - mountPath: "/etc/prometheus"
          name: config-volume
        - mountPath: "/prometheus"
          name: data
        resources:
          requests:
            cpu: 100m
            memory: 512Mi
          limits:
            cpu: 100m
            memory: 512Mi
      volumes:
      - name: data
        hostPath:
          path: /data/prometheus
      - configMap:
          name: prometheus-config
        name: config-volume
---
# ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']

Apply the YAML to deploy Prometheus:

kubectl apply -f prometheus.yaml
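All of the manifests above target the monitor namespace, so it must exist before the apply; afterwards, check that the Pod comes up:

kubectl create namespace monitor   # run before the apply above if the namespace does not exist yet
kubectl get pods -n monitor        # the prometheus Pod should reach Running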

3. Metrics

The Prometheus Pod exposes its own metrics endpoint: curl {pod IP}:9090/metrics returns the metrics of the Prometheus Pod itself.

Every metric follows the same exposition format: metric_name{labels} value

For example, take the following metric:

promhttp_metric_handler_requests_total{code="200", instance="localhost:9090", job="prometheus"} 834

  • Metric name: promhttp_metric_handler_requests_total
  • Labels: {code="200", instance="localhost:9090", job="prometheus"}
  • Value: 834
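Because this metric is a counter that only ever increases, it is normally queried as a rate rather than as a raw value; a minimal PromQL sketch:

rate(promhttp_metric_handler_requests_total{code="200"}[5m])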

The k8s core components, such as kube-apiserver, scheduler, controller-manager, etcd, and CoreDNS, all expose metrics endpoints as well; CoreDNS, for instance, can be queried directly, as shown below.
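CoreDNS serves metrics on port 9153 of its Service (ClusterIP 10.96.0.10 in this cluster, the same address the scrape config below uses):

curl http://10.96.0.10:9153/metrics | head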

4. Accessing the Prometheus web UI

Because the YAML above defines an ingress for Prometheus, the web UI can be reached at the custom domain www.myprometheus.com.
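If www.myprometheus.com is not in real DNS, a common shortcut is an /etc/hosts entry on the client machine pointing at a node that runs the nginx ingress controller (the IP below is a placeholder; substitute your own):

echo "<ingress-node-IP> www.myprometheus.com" | sudo tee -a /etc/hosts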

If Prometheus started correctly, visiting that URL opens the query UI.

In the web UI, enter a metric name to query its value and click the Execute button on the right.

For example, the metric promhttp_metric_handler_requests_total below has the value 834:

promhttp_metric_handler_requests_total{code="200", instance="localhost:9090", job="prometheus"} 834

5. Monitoring the k8s core components with Prometheus

We already know that curl {pod IP}:9090/metrics returns the metrics of the Prometheus Pod. In the same way, the Pods of the k8s core components kube-apiserver, scheduler, controller-manager, etcd, and CoreDNS provide metrics endpoints; for example, kube-apiserver's metrics can be fetched with curl:

curl -k --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt --key /etc/kubernetes/pki/apiserver-kubelet-client.key --cacert /etc/kubernetes/pki/ca.crt https://10.96.0.1/metrics

Running the command above returns kube-apiserver's metrics.
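Alternatively, kubectl can fetch the same endpoint using the credentials from your kubeconfig, which avoids pointing at certificate files by hand (assuming those credentials may read /metrics):

kubectl get --raw /metrics | head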

Next, edit the Prometheus ConfigMap and add scrape jobs for CoreDNS and kube-apiserver:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'coredns'  # CoreDNS scrape job
      static_configs:
      - targets: ['10.96.0.10:9153']
    - job_name: 'kube-apiserver'  # kube-apiserver scrape job
      scheme: https
      tls_config:
        insecure_skip_verify: true
        ca_file: '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
      bearer_token_file: '/var/run/secrets/kubernetes.io/serviceaccount/token'
      static_configs:
      - targets: ['10.96.0.1:443']

Apply the ConfigMap YAML:

kubectl apply -f prometheus-configmap.yaml

Tell Prometheus to reload its configuration (possible because --web.enable-lifecycle was enabled above):

curl -XPOST 10.105.14.197:9090/-/reload
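The address above is the Prometheus Service IP, which changes if the Service is recreated; from inside the cluster, the DNS name derived from the Service defined earlier is more stable:

curl -XPOST http://prometheus-svc.monitor.svc.cluster.local:9090/-/reload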

Refresh the Prometheus web UI: both coredns and kube-apiserver now show up as scraped targets.

6. Monitoring nodes with Prometheus

We use node-exporter to collect node-level metrics.

Create a node-exporter DaemonSet so that an exporter Pod runs on every node and collects its metrics:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
      containers:
      - name: node-exporter
        image: 192.168.119.150/k8s_repository/node-exporter:1.8.2
        args:
        - --web.listen-address=$(HOSTIP):9100
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+)($|/)
        - --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
        ports:
        - containerPort: 9100
        env:
        - name: HOSTIP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        resources:
          requests:
            cpu: 150m
            memory: 180Mi
          limits:
            cpu: 150m
            memory: 180Mi
        securityContext:
          runAsNonRoot: true
          runAsUser: 65534
        volumeMounts:
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: root
          mountPath: /host/root
          mountPropagation: HostToContainer
          readOnly: true
      tolerations:
      - operator: "Exists"
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: root
        hostPath:
          path: /
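After applying the DaemonSet (assuming the manifest was saved as node-exporter.yaml), verify that one node-exporter Pod runs per node and that the metrics endpoint answers; substitute a real node IP for {nodeIP}:

kubectl apply -f node-exporter.yaml
kubectl get pods -n monitor -l app=node-exporter -o wide
curl http://{nodeIP}:9100/metrics | head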

We first need to understand how Prometheus scrapes data. The flow is: discover targets, relabel them, then scrape.

Before a target's data is scraped, there are two steps: target discovery and relabeling. After a target is discovered we may want to modify it, and relabeling is the mechanism for doing so.

Modify the Prometheus ConfigMap, adding the following three lines to the config file to auto-discover node-exporter targets:

- job_name: 'node-exporter'
  kubernetes_sd_configs:
    - role: node

After reloading Prometheus, node-exporter has been auto-discovered, but the targets are not in the up state. This is because the node role discovers each node at the kubelet port (10250) rather than node-exporter's port, so we rewrite the port through Prometheus's relabeling mechanism.

Add a relabel_configs section to the job in the Prometheus config:

    - job_name: 'node-exporter'
      kubernetes_sd_configs:
        - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace

As the Targets page shows, relabeling rewrote port 10250 to 9100, and the node-exporter targets are now up.
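A quick PromQL check in the web UI confirms this; the query below should return 1 for every node:

up{job="node-exporter"}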

7. Monitoring Pods with Prometheus

Pod-level monitoring is provided by cAdvisor, which is built into the kubelet.

Pod-level metrics are available at {nodeIP}:10250/metrics/cadvisor over the kubelet's HTTPS port:

curl -k --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt --key /etc/kubernetes/pki/apiserver-kubelet-client.key --cacert /etc/kubernetes/pki/ca.crt https://{nodeIP}:10250/metrics/cadvisor
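Equivalently, kubectl can proxy the request through the API server, which avoids handling the kubelet's certificates; replace {nodeName} with a name from kubectl get nodes:

kubectl get --raw "/api/v1/nodes/{nodeName}/proxy/metrics/cadvisor" | head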

Add the following job to the Prometheus ConfigMap, then apply the ConfigMap:

- job_name: 'cadvisor'
  kubernetes_sd_configs:
    - role: node
  scheme: https
  tls_config:
    insecure_skip_verify: true
    ca_file: '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
  bearer_token_file: '/var/run/secrets/kubernetes.io/serviceaccount/token'
  relabel_configs:
  - target_label: __metrics_path__
    replacement: /metrics/cadvisor

After reloading Prometheus, the cadvisor targets are also being scraped.
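With cAdvisor data flowing, Pod-level resource queries become possible; for example, a sketch of the per-Pod memory working set across all namespaces:

sum(container_memory_working_set_bytes{image!=""}) by (namespace, pod)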
