# Kubernetes
https://iothub.org.cn/docs/kubernetes/
https://iothub.org.cn/docs/kubernetes/monitoring/prometheus/
I. Overview
1. K8s Monitoring Metrics
Monitoring Kubernetes itself
- Node resource utilization: a production environment typically has dozens or even hundreds of nodes to monitor.
- Node count: once you can monitor a node you can also count the nodes. How many workloads a node can carry also needs to be evaluated, based on the resource utilization of the projects already running, so you can estimate how many resources the next project will need.
- Pod count (per node): likewise, how many pods run on each node. By default a node can run up to 110 pods, but in practice far fewer. For example, on a 128 GB / 32-core machine running Java services with 2 GB each, that is roughly 50-60 pods; most machines run only a few dozen pods and very rarely exceed 100.
- Resource object status: statistics on the state of resources such as Pods, Services, Deployments, and Jobs.
Pod monitoring
- Pod count (per project): how many pods a project runs and roughly how much resource they use, so you can evaluate the total and per-pod resource consumption of the project.
- Container resource utilization: how much CPU and memory each container consumes.
- Application metrics: metrics from the application itself. Operations teams usually cannot obtain these on their own, so before monitoring, developers need to expose them. Prometheus offers client libraries for many languages; developers have to do some integration work to expose the application metrics you care about, which are then scraped. If developers do not cooperate, operations can rarely cover this area, unless you write your own client program (for example in shell/python) that reads the application's state from the outside; if the application exposes an API this is easy to do.
Prometheus monitoring architecture for K8s
To monitor node resources, deploy node_exporter, the host-level collector for Linux; once it runs on a node it collects that node's CPU, memory, network I/O, and similar metrics.
To monitor containers, Kubernetes ships the cAdvisor collector built into the kubelet; pod and container metrics are available without deploying anything extra, you only need to know how to reach cAdvisor.
To monitor Kubernetes resource objects, deploy the kube-state-metrics service, which periodically reads these objects from the API server and exposes them as metrics that Prometheus scrapes and stores. Alerts are sent to receivers through Alertmanager, and Grafana provides the dashboards.
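As a quick way to see those built-in cAdvisor metrics without deploying anything, you can go through the API server's node proxy (a minimal sketch; replace <node-name> with one of your node names):
# list node names, then dump the cAdvisor metrics served by that node's kubelet
kubectl get nodes
kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics/cadvisor | head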
2.Prometheus Operator
https://github.com/prometheus-operator/prometheus-operator
1.Kubernetes Operator
With Kubernetes, managing and scaling web applications, mobile backends, and API services has become fairly simple. Because these applications are generally stateless, basic Kubernetes API objects such as Deployment can scale them and recover from failures without any extra work.
Managing stateful applications such as databases, caches, or monitoring systems is the real challenge. These systems need application-domain knowledge to scale and upgrade correctly, and to reconfigure themselves effectively when data is lost or unavailable. We want this application-specific operational knowledge encoded in software, so that complex applications can be run and managed correctly on top of Kubernetes' capabilities.
An Operator is software that extends the Kubernetes API through the TPR mechanism (Third Party Resources, since replaced by CRDs), embedding application-specific knowledge so that users can create, configure, and manage the application. Like Kubernetes built-in resources, an Operator manages not a single application instance but multiple instances across the cluster.
2.Prometheus Operator
The Prometheus Operator for Kubernetes provides simple monitoring definitions for Kubernetes services and for deploying and managing Prometheus instances.
Once installed, the Prometheus Operator provides the following features:
- Create/destroy. Easily launch a Prometheus instance in a Kubernetes namespace for a specific application or team through the Operator.
- Simple configuration. Configure the fundamentals of Prometheus, such as versions, persistence, retention policies, and replicas, as native Kubernetes resources.
- Target services via labels. Automatically generate monitoring target configuration from familiar Kubernetes label queries; there is no need to learn a Prometheus-specific configuration language.
The Prometheus Operator architecture is shown in Figure 1.
The components of the architecture run in the Kubernetes cluster as different kinds of resources, each with its own role.
- **Operator:** the Operator deploys and manages Prometheus Server according to custom resources (Custom Resource Definitions, CRDs) and watches these custom resources for changes, acting on them accordingly; it is the control center of the whole system.
- **Prometheus:** the Prometheus resource declaratively describes the desired state of a Prometheus deployment.
- **Prometheus Server:** the Prometheus Server cluster that the Operator deploys from the Prometheus custom resource; the custom resource can be thought of as managing the Prometheus Server cluster through StatefulSets.
- **ServiceMonitor:** ServiceMonitor is also a custom resource; it describes a list of targets to be monitored by Prometheus. It selects Service endpoints by label, and Prometheus Server scrapes metrics through the selected Services (see the minimal sketch after this list).
- **Service:** the Service resource fronts the metrics-exposing Pods in the cluster and is what a ServiceMonitor selects so that Prometheus Server can pull metrics. Put simply, it is the object Prometheus monitors, for example a Node Exporter Service or a MySQL Exporter Service.
- **Alertmanager:** Alertmanager is also a custom resource type; the Operator deploys an Alertmanager cluster according to the resource description.
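A minimal sketch of how these pieces reference each other through labels (all names and labels below are made up for illustration): the Prometheus resource selects ServiceMonitors via serviceMonitorSelector, and each ServiceMonitor selects Services via spec.selector.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app           # hypothetical name
  labels:
    team: frontend            # a Prometheus resource whose serviceMonitorSelector matches this label picks it up
spec:
  selector:
    matchLabels:
      app: example-app        # scrape Services carrying this label
  endpoints:
  - port: web                 # named port on the selected Service
    interval: 30s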
3. The traditional way to deploy Prometheus on Kubernetes
This section walks through deploying Prometheus on Kubernetes with plain YAML files, i.e. deploying Prometheus, kube-state-metrics, node-exporter, and Grafana in order. The figure shows how the components call each other.
Node exporter is deployed on each Kubernetes node to collect metrics from the physical or virtual machine, and kube-state-metrics is deployed on the Kubernetes master to collect the cluster state. Everything is aggregated in Prometheus for processing and storage, and then displayed through Grafana.
3.kube-prometheus
kube-prometheus is a complete monitoring solution that uses Prometheus to collect cluster metrics and Grafana for dashboards. It includes the following components:
- The Prometheus Operator
- Highly available Prometheus
- Highly available Alertmanager
- Prometheus node-exporter
- Prometheus Adapter for Kubernetes Metrics APIs (k8s-prometheus-adapter)
- kube-state-metrics
- Grafana
# Addresses
https://github.com/prometheus-operator
https://github.com/prometheus-operator/kube-prometheus
https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.8.0.tar.gz
https://github.com/kubernetes-monitoring/kubernetes-mixin
kube-prometheus was packaged by CoreOS specifically for Kubernetes as a highly available monitoring and alerting stack that users can install and use directly.
kube-prometheus targets Kubernetes cluster monitoring, so it comes preconfigured to collect metrics from all Kubernetes components. On top of that it ships a default set of dashboards and alerting rules. Many of the useful dashboards and alerts come from the kubernetes-mixin project which, like kube-prometheus itself, provides jsonnet so users can customize it to their own needs.
1. Compatibility
The following versions are supported and work as we test against these versions in their respective branches. But note that other versions might work!
| kube-prometheus stack | Kubernetes 1.18 | Kubernetes 1.19 | Kubernetes 1.20 | Kubernetes 1.21 | Kubernetes 1.22 |
| --- | --- | --- | --- | --- | --- |
| release-0.6 | ✗ | ✔ | ✗ | ✗ | ✗ |
| release-0.7 | ✗ | ✔ | ✔ | ✗ | ✗ |
| release-0.8 | ✗ | ✗ | ✔ | ✔ | ✗ |
| release-0.9 | ✗ | ✗ | ✗ | ✔ | ✔ |
| main | ✗ | ✗ | ✗ | ✔ | ✔ |
2. Quick start
Create the monitoring stack from the configuration in the manifests directory:
# Create the namespace and CRDs, and then wait for them to be available before creating the remaining resources
kubectl create -f manifests/setup
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
kubectl create -f manifests/
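To remove the stack again, the upstream kube-prometheus README documents the matching teardown command:
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup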
II. Basics
1. kube-prometheus Configuration
1.1. The manifests configuration files
[root@k8s-master manifests]# pwd
/k8s/prometheus/kube-prometheus/manifests
[root@k8s-master manifests]#
[root@k8s-master manifests]# ll
total 1756
-rw-r--r-- 1 root root 875 Nov 24 11:43 alertmanager-alertmanager.yaml
-rw-r--r-- 1 root root 515 Nov 24 11:43 alertmanager-podDisruptionBudget.yaml
-rw-r--r-- 1 root root 6831 Nov 24 11:43 alertmanager-prometheusRule.yaml
-rw-r--r-- 1 root root 1169 Nov 24 11:43 alertmanager-secret.yaml
-rw-r--r-- 1 root root 301 Nov 24 11:43 alertmanager-serviceAccount.yaml
-rw-r--r-- 1 root root 540 Nov 24 11:43 alertmanager-serviceMonitor.yaml
-rw-r--r-- 1 root root 577 Nov 24 11:43 alertmanager-service.yaml
-rw-r--r-- 1 root root 278 Nov 24 11:43 blackbox-exporter-clusterRoleBinding.yaml
-rw-r--r-- 1 root root 287 Nov 24 11:43 blackbox-exporter-clusterRole.yaml
-rw-r--r-- 1 root root 1392 Nov 24 11:43 blackbox-exporter-configuration.yaml
-rw-r--r-- 1 root root 3080 Nov 24 11:43 blackbox-exporter-deployment.yaml
-rw-r--r-- 1 root root 96 Nov 24 11:43 blackbox-exporter-serviceAccount.yaml
-rw-r--r-- 1 root root 680 Nov 24 11:43 blackbox-exporter-serviceMonitor.yaml
-rw-r--r-- 1 root root 540 Nov 24 11:43 blackbox-exporter-service.yaml
-rw-r--r-- 1 root root 721 Nov 24 11:43 grafana-dashboardDatasources.yaml
-rw-r--r-- 1 root root 1406527 Nov 24 11:43 grafana-dashboardDefinitions.yaml
-rw-r--r-- 1 root root 625 Nov 24 11:43 grafana-dashboardSources.yaml
-rw-r--r-- 1 root root 8159 Nov 25 13:12 grafana-deployment.yaml
-rw-r--r-- 1 root root 86 Nov 24 11:43 grafana-serviceAccount.yaml
-rw-r--r-- 1 root root 398 Nov 24 11:43 grafana-serviceMonitor.yaml
-rw-r--r-- 1 root root 452 Nov 24 11:43 grafana-service.yaml
-rw-r--r-- 1 root root 3319 Nov 24 11:43 kube-prometheus-prometheusRule.yaml
-rw-r--r-- 1 root root 63571 Nov 24 11:43 kubernetes-prometheusRule.yaml
-rw-r--r-- 1 root root 6836 Nov 24 11:43 kubernetes-serviceMonitorApiserver.yaml
-rw-r--r-- 1 root root 440 Nov 24 11:43 kubernetes-serviceMonitorCoreDNS.yaml
-rw-r--r-- 1 root root 6355 Nov 24 11:43 kubernetes-serviceMonitorKubeControllerManager.yaml
-rw-r--r-- 1 root root 7171 Nov 24 11:43 kubernetes-serviceMonitorKubelet.yaml
-rw-r--r-- 1 root root 530 Nov 24 11:43 kubernetes-serviceMonitorKubeScheduler.yaml
-rw-r--r-- 1 root root 464 Nov 24 11:43 kube-state-metrics-clusterRoleBinding.yaml
-rw-r--r-- 1 root root 1712 Nov 24 11:43 kube-state-metrics-clusterRole.yaml
-rw-r--r-- 1 root root 2957 Nov 24 11:43 kube-state-metrics-deployment.yaml
-rw-r--r-- 1 root root 1864 Nov 24 11:43 kube-state-metrics-prometheusRule.yaml
-rw-r--r-- 1 root root 280 Nov 24 11:43 kube-state-metrics-serviceAccount.yaml
-rw-r--r-- 1 root root 1011 Nov 24 11:43 kube-state-metrics-serviceMonitor.yaml
-rw-r--r-- 1 root root 580 Nov 24 11:43 kube-state-metrics-service.yaml
-rw-r--r-- 1 root root 444 Nov 24 11:43 node-exporter-clusterRoleBinding.yaml
-rw-r--r-- 1 root root 461 Nov 24 11:43 node-exporter-clusterRole.yaml
-rw-r--r-- 1 root root 3020 Nov 24 11:43 node-exporter-daemonset.yaml
-rw-r--r-- 1 root root 12975 Nov 24 11:43 node-exporter-prometheusRule.yaml
-rw-r--r-- 1 root root 270 Nov 24 11:43 node-exporter-serviceAccount.yaml
-rw-r--r-- 1 root root 850 Nov 24 11:43 node-exporter-serviceMonitor.yaml
-rw-r--r-- 1 root root 492 Nov 24 11:43 node-exporter-service.yaml
-rw-r--r-- 1 root root 482 Nov 24 11:43 prometheus-adapter-apiService.yaml
-rw-r--r-- 1 root root 576 Nov 24 11:43 prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
-rw-r--r-- 1 root root 494 Nov 24 11:43 prometheus-adapter-clusterRoleBindingDelegator.yaml
-rw-r--r-- 1 root root 471 Nov 24 11:43 prometheus-adapter-clusterRoleBinding.yaml
-rw-r--r-- 1 root root 378 Nov 24 11:43 prometheus-adapter-clusterRoleServerResources.yaml
-rw-r--r-- 1 root root 409 Nov 24 11:43 prometheus-adapter-clusterRole.yaml
-rw-r--r-- 1 root root 1836 Nov 24 11:43 prometheus-adapter-configMap.yaml
-rw-r--r-- 1 root root 1804 Nov 24 11:43 prometheus-adapter-deployment.yaml
-rw-r--r-- 1 root root 515 Nov 24 11:43 prometheus-adapter-roleBindingAuthReader.yaml
-rw-r--r-- 1 root root 287 Nov 24 11:43 prometheus-adapter-serviceAccount.yaml
-rw-r--r-- 1 root root 677 Nov 24 11:43 prometheus-adapter-serviceMonitor.yaml
-rw-r--r-- 1 root root 501 Nov 24 11:43 prometheus-adapter-service.yaml
-rw-r--r-- 1 root root 447 Nov 24 11:43 prometheus-clusterRoleBinding.yaml
-rw-r--r-- 1 root root 394 Nov 24 11:43 prometheus-clusterRole.yaml
-rw-r--r-- 1 root root 4930 Nov 24 11:43 prometheus-operator-prometheusRule.yaml
-rw-r--r-- 1 root root 715 Nov 24 11:43 prometheus-operator-serviceMonitor.yaml
-rw-r--r-- 1 root root 499 Nov 24 11:43 prometheus-podDisruptionBudget.yaml
-rw-r--r-- 1 root root 12732 Nov 24 11:43 prometheus-prometheusRule.yaml
-rw-r--r-- 1 root root 1418 Nov 25 14:12 prometheus-prometheus.yaml
-rw-r--r-- 1 root root 471 Nov 24 11:43 prometheus-roleBindingConfig.yaml
-rw-r--r-- 1 root root 1547 Nov 24 11:43 prometheus-roleBindingSpecificNamespaces.yaml
-rw-r--r-- 1 root root 366 Nov 24 11:43 prometheus-roleConfig.yaml
-rw-r--r-- 1 root root 2047 Nov 24 11:43 prometheus-roleSpecificNamespaces.yaml
-rw-r--r-- 1 root root 271 Nov 24 11:43 prometheus-serviceAccount.yaml
-rw-r--r-- 1 root root 531 Nov 24 11:43 prometheus-serviceMonitor.yaml
-rw-r--r-- 1 root root 558 Nov 24 11:43 prometheus-service.yaml
drwxr-xr-x 2 root root 4096 Nov 25 14:24 setup
[root@k8s-master manifests]#
[root@k8s-master manifests]# cd setup/
[root@k8s-master setup]# ll
total 1028
-rw-r--r-- 1 root root 60 Nov 24 11:43 0namespace-namespace.yaml
-rw-r--r-- 1 root root 122054 Nov 24 11:43 prometheus-operator-0alertmanagerConfigCustomResourceDefinition.yaml
-rw-r--r-- 1 root root 247084 Nov 24 11:43 prometheus-operator-0alertmanagerCustomResourceDefinition.yaml
-rw-r--r-- 1 root root 20907 Nov 24 11:43 prometheus-operator-0podmonitorCustomResourceDefinition.yaml
-rw-r--r-- 1 root root 20077 Nov 24 11:43 prometheus-operator-0probeCustomResourceDefinition.yaml
-rw-r--r-- 1 root root 327667 Nov 24 11:43 prometheus-operator-0prometheusCustomResourceDefinition.yaml
-rw-r--r-- 1 root root 3618 Nov 24 11:43 prometheus-operator-0prometheusruleCustomResourceDefinition.yaml
-rw-r--r-- 1 root root 21941 Nov 24 11:43 prometheus-operator-0servicemonitorCustomResourceDefinition.yaml
-rw-r--r-- 1 root root 253551 Nov 24 11:43 prometheus-operator-0thanosrulerCustomResourceDefinition.yaml
-rw-r--r-- 1 root root 471 Nov 24 11:43 prometheus-operator-clusterRoleBinding.yaml
-rw-r--r-- 1 root root 1377 Nov 24 11:43 prometheus-operator-clusterRole.yaml
-rw-r--r-- 1 root root 2328 Nov 25 14:24 prometheus-operator-deployment.yaml
-rw-r--r-- 1 root root 285 Nov 24 11:43 prometheus-operator-serviceAccount.yaml
-rw-r--r-- 1 root root 515 Nov 24 11:43 prometheus-operator-service.yaml
1.2. Changing the image registry
Some of the upstream images cannot be pulled here, so change the image sources of prometheus-operator, prometheus, alertmanager, kube-state-metrics, node-exporter, and prometheus-adapter to a domestic mirror. I use the USTC mirror.
sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' setup/prometheus-operator-deployment.yaml
sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' prometheus-prometheus.yaml
sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' alertmanager-alertmanager.yaml
sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' kube-state-metrics-deployment.yaml
sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' node-exporter-daemonset.yaml
sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' prometheus-adapter-deployment.yaml
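A quick sanity check (not from the original notes) is to grep the changed manifests for their image references; note that kube-state-metrics pulls from k8s.gcr.io, which this quay.io substitution does not cover, and that case is handled in the image pull section later.
grep -n 'image:' setup/prometheus-operator-deployment.yaml prometheus-prometheus.yaml alertmanager-alertmanager.yaml kube-state-metrics-deployment.yaml node-exporter-daemonset.yaml prometheus-adapter-deployment.yaml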
1.3. Services
1. prometheus service
[root@k8s-master manifests]# vim prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.26.0
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 30090  # added
  type: NodePort     # added
  selector:
    app: prometheus
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
2. alertmanager service
[root@k8s-master manifests]# vim alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.21.0
  name: alertmanager-main
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 30093  # added
  type: NodePort     # added
  selector:
    alertmanager: main
    app: alertmanager
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
  sessionAffinity: ClientIP
3. grafana service
[root@k8s-master manifests]# vim grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 7.5.4
  name: grafana
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 32000  # added
  type: NodePort     # added
  selector:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
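After applying the three modified Services, the UIs should be reachable on any node IP via the NodePorts added above (a quick check, not part of the upstream manifests):
kubectl get svc -n monitoring prometheus-k8s alertmanager-main grafana
# Prometheus:   http://<NodeIP>:30090
# Alertmanager: http://<NodeIP>:30093
# Grafana:      http://<NodeIP>:32000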
1.4. Adjusting replica counts
By default alertmanager runs 3 replicas, prometheus 2, and grafana 1; upstream chose these defaults for high availability. Adjust them in the files below if needed (see the sketch after the file list).
1.alertmanager
vi alertmanager-alertmanager.yaml
2.prometheus
vi prometheus-prometheus.yaml
3.grafana
vi grafana-deployment.yaml
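For reference, a sketch of where the replica count lives in each of these files (the values here are just an example for a small test cluster; Alertmanager and Prometheus are custom resources with spec.replicas, Grafana is an ordinary Deployment):
# alertmanager-alertmanager.yaml (kind: Alertmanager)
spec:
  replicas: 1
# prometheus-prometheus.yaml (kind: Prometheus)
spec:
  replicas: 1
# grafana-deployment.yaml (kind: Deployment)
spec:
  replicas: 1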
2. kube-prometheus Data Persistence
2.1. Prometheus data persistence
1. Edit manifests/prometheus-prometheus.yaml
#-----storage-----
  storage:  # persistence configuration
    volumeClaimTemplate:
      spec:
        storageClassName: rook-ceph-block
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
#-----------------
# Full file
# prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
labels:
app.kubernetes.io/component: prometheus
app.kubernetes.io/name: prometheus
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.26.0
prometheus: k8s
name: k8s
namespace: monitoring
spec:
alerting:
alertmanagers:
- apiVersion: v2
name: alertmanager-main
namespace: monitoring
port: web
externalLabels: {}
#-----storage-----
storage: # persistence configuration
volumeClaimTemplate:
spec:
storageClassName: rook-ceph-block
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
#-----------------
image: quay.io/prometheus/prometheus:v2.26.0
nodeSelector:
kubernetes.io/os: linux
podMetadata:
labels:
app.kubernetes.io/component: prometheus
app.kubernetes.io/name: prometheus
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.26.0
podMonitorNamespaceSelector: {}
podMonitorSelector: {}
probeNamespaceSelector: {}
probeSelector: {}
replicas: 2
resources:
requests:
memory: 400Mi
ruleSelector:
matchLabels:
prometheus: k8s
role: alert-rules
securityContext:
fsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
serviceAccountName: prometheus-k8s
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector: {}
version: 2.26.0
After editing the YAML, run kubectl apply -f manifests/prometheus-prometheus.yaml. Two PVs of the requested size are created automatically, because each Prometheus replica (prometheus-k8s-0, prometheus-k8s-1) gets its own PV (I am using ceph-rbd here, but any other storage backend works). Once a PV has been created it cannot be modified, so settle on suitable parameters such as the access mode and capacity beforehand. The next time you apply prometheus-prometheus.yaml, the data is written to the PVs that were already created.
2. Setting retention to 10 days (ineffective)
Edit manifests/setup/prometheus-operator-deployment.yaml
# prometheus-operator-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/name: prometheus-operator
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.47.0
name: prometheus-operator
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/component: controller
app.kubernetes.io/name: prometheus-operator
app.kubernetes.io/part-of: kube-prometheus
template:
metadata:
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/name: prometheus-operator
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.47.0
spec:
containers:
- args:
- --kubelet-service=kube-system/kubelet
- --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0
- storage.tsdb.retention.time=10d # add the retention time parameter here
image: quay.io/prometheus-operator/prometheus-operator:v0.47.0
name: prometheus-operator
ports:
- containerPort: 8080
name: http
resources:
limits:
cpu: 200m
memory: 200Mi
requests:
cpu: 100m
memory: 100Mi
securityContext:
allowPrivilegeEscalation: false
- args:
- --logtostderr
- --secure-listen-address=:8443
- --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
- --upstream=http://127.0.0.1:8080/
image: quay.io/brancz/kube-rbac-proxy:v0.8.0
name: kube-rbac-proxy
ports:
- containerPort: 8443
name: https
resources:
limits:
cpu: 20m
memory: 40Mi
requests:
cpu: 10m
memory: 20Mi
securityContext:
runAsGroup: 65532
runAsNonRoot: true
runAsUser: 65532
nodeSelector:
kubernetes.io/os: linux
securityContext:
runAsNonRoot: true
runAsUser: 65534
serviceAccountName: prometheus-operator
**Note:** the parameter name here is storage.tsdb.retention.time=10d; I had first used --storage.tsdb.retention.time=10d, and after applying it the operator reported that the flag was provided but not defined.
2.2. Changing the data retention period
According to the official documentation, the Prometheus deployed by the Operator keeps data for one day by default. No matter how you persist the storage, it will not help, because only one day of data is retained, so you cannot see any longer history.
The official documentation describes the configurable field.
The retention period is actually changed through the retention field which, as noted above, goes under prometheus.spec.
How to change it:
[root@k8s-master manifests]# pwd
/k8s/monitoring/prometheus/kube-prometheus-0.8.0/manifests
[root@k8s-master manifests]# vim prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  ......
spec:
  retention: 7d  # data retention period
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: alertmanager-main
      namespace: monitoring
      port: web
  externalLabels: {}
  storage:  # persistence configuration
    volumeClaimTemplate:
      spec:
        storageClassName: rook-ceph-block
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 40Gi
  image: quay.mirrors.ustc.edu.cn/prometheus/prometheus:v2.26.0
# If the stack is already installed, you can edit prometheus-prometheus.yaml directly and refresh it with kubectl apply -f
# After the change, check that the pods are running normally
# Then verify in the Grafana or Prometheus UI (I waited two days after this change to confirm the data was retained as expected)
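One way to confirm the Operator picked the new value up is to read it back from the Prometheus custom resource and from the args of the generated StatefulSet:
kubectl -n monitoring get prometheus k8s -o jsonpath='{.spec.retention}'
kubectl -n monitoring get sts prometheus-k8s -o yaml | grep retention.time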
2.3. Grafana persistence
1. Create grafana-pvc
# grafana-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: rook-cephfs
Then in grafana-deployment.yaml, change the storage from emptyDir to the PVC:
      #- emptyDir: {}
      #  name: grafana-storage
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana-pvc
      - name: grafana-datasources
        secret:
          secretName: grafana-datasources
      - configMap:
          name: grafana-dashboards
        name: grafana-dashboards
      - configMap:
          name: grafana-dashboard-apiserver
        name: grafana-dashboard-apiserver
      - configMap:
          name: grafana-dashboard-controller-manager
      ........
**File storage (CephFS):** suited to Deployments where multiple Pods share the same files.
3. Prometheus Operator Monitoring Extensions
3.1. ServiceMonitor
To automate the management of Prometheus configuration, the Prometheus Operator uses the custom resource type ServiceMonitor to describe the targets to be monitored.
A ServiceMonitor is a way of scraping data through a Service.
- prometheus-operator can automatically discover Services carrying certain labels via a ServiceMonitor and scrape data from them.
- ServiceMonitors themselves are also discovered automatically by prometheus-operator.
1. Monitoring services inside the K8s cluster
[root@k8s-m1 ~]# cat ServiceMonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: xxx-exporter
  name: xxx
  namespace: prometheus
spec:
  endpoints:
  - interval: 15s
    port: xxx-exporter
  jobLabel: xxx-exporter-monitor
  namespaceSelector:
    matchNames:
    - monitor               # namespace of the target service
  selector:
    matchLabels:
      k8s-app: xx-exporter  # labels of the target service
  # Alternatively, to select every namespace instead of matchNames:
  # namespaceSelector:
  #   any: true
2. Monitoring an exporter outside the cluster via Endpoints
# Create a Service (and matching Endpoints) for the external exporter
kind: Service
apiVersion: v1
metadata:
  namespace: monitor
  name: service-mysql-xx
  labels:
    app: service-mysql-xx
spec:
  ports:
  - name: metrics      # the ServiceMonitor below refers to this port name
    protocol: TCP
    port: 9xx
    targetPort: 9xx
  type: ClusterIP
  clusterIP: None
---
kind: Endpoints
apiVersion: v1
metadata:
  namespace: monitor
  name: service-mysql-xx
  labels:
    app: service-mysql-xx
subsets:
- addresses:
  - ip: x.x.x.x
  ports:
  - name: metrics      # must match the Service port name
    protocol: TCP
    port: 9xxx
The ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: service-mysql-xx
  labels:
    app: service-mysql-xx
spec:
  selector:
    matchLabels:
      app: service-mysql-xx
  namespaceSelector:
    matchNames:
    - monitor
  endpoints:
  - port: metrics
    interval: 10s
    honorLabels: true
3.2. Traditional configuration (Additional Scrape Configuration)
https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/additional-scrape-config.md
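Following the linked document, scrape jobs that do not fit the ServiceMonitor model can be written as ordinary Prometheus scrape_configs, stored in a Secret, and referenced from the Prometheus resource. A minimal sketch (file, secret, and job names are examples):
# prometheus-additional.yaml: plain Prometheus scrape_config entries
- job_name: external-host
  static_configs:
  - targets: ["<host-ip>:9100"]
# create the Secret in the monitoring namespace
kubectl -n monitoring create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml
# then reference it under spec in prometheus-prometheus.yaml
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: prometheus-additional.yaml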
4. Alerting
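Alert rules for the Operator are defined as PrometheusRule custom resources. A minimal sketch (the rule name and expression are examples); its labels match the ruleSelector (prometheus: k8s, role: alert-rules) in the prometheus-prometheus.yaml shown earlier, so the rule is loaded by the prometheus-k8s instances.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alert-rules
  namespace: monitoring
  labels:
    prometheus: k8s      # must match the Prometheus ruleSelector
    role: alert-rules
spec:
  groups:
  - name: example.rules
    rules:
    - alert: InstanceDown
      expr: up == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Instance {{ $labels.instance }} has been down for more than 5 minutes"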
III. Practice
1. Installing kube-prometheus
# Create the namespace and CRDs, and then wait for them to be available before creating the remaining resources
kubectl create -f manifests/setup
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
kubectl create -f manifests/
1. Installation
[root@k8s-master manifests]# cd kube-prometheus
# 1. Create prometheus-operator
[root@k8s-master kube-prometheus]# kubectl create -f manifests/setup
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
# Check
[root@k8s-master kube-prometheus]# kubectl get all -n monitoring
NAME READY STATUS RESTARTS AGE
pod/prometheus-operator-7775c66ccf-qdg56 2/2 Running 0 63s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/prometheus-operator ClusterIP None <none> 8443/TCP 63s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/prometheus-operator 1/1 1 1 63s
NAME DESIRED CURRENT READY AGE
replicaset.apps/prometheus-operator-7775c66ccf 1 1 1 63s
# Verify
[root@k8s-master kube-prometheus]# until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
No resources found
# 2. Create prometheus (the remaining manifests)
[root@k8s-master kube-prometheus]# kubectl create -f manifests/
alertmanager.monitoring.coreos.com/main created
poddisruptionbudget.policy/alertmanager-main created
prometheusrule.monitoring.coreos.com/alertmanager-main-rules created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
clusterrole.rbac.authorization.k8s.io/blackbox-exporter created
clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created
configmap/blackbox-exporter-configuration created
deployment.apps/blackbox-exporter created
service/blackbox-exporter created
serviceaccount/blackbox-exporter created
servicemonitor.monitoring.coreos.com/blackbox-exporter created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
prometheusrule.monitoring.coreos.com/kube-prometheus-rules created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kube-state-metrics-rules created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kubernetes-monitoring-rules created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
prometheusrule.monitoring.coreos.com/node-exporter-rules created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
servicemonitor.monitoring.coreos.com/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-operator-rules created
servicemonitor.monitoring.coreos.com/prometheus-operator created
poddisruptionbudget.policy/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-prometheus-rules created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-k8s created
Error from server (AlreadyExists): error when creating "manifests/prometheus-adapter-apiService.yaml": apiservices.apiregistration.k8s.io "v1beta1.metrics.k8s.io" already exists
Error from server (AlreadyExists): error when creating "manifests/prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml": clusterroles.rbac.authorization.k8s.io "system:aggregated-metrics-reader" already exists
# Check
[root@k8s-master prometheus]# kubectl get all -n monitoring
NAME READY STATUS RESTARTS AGE
pod/alertmanager-main-0 2/2 Running 0 119m
pod/alertmanager-main-1 2/2 Running 0 119m
pod/alertmanager-main-2 2/2 Running 0 119m
pod/blackbox-exporter-55c457d5fb-78bs9 3/3 Running 0 119m
pod/grafana-9df57cdc4-gnbkp 1/1 Running 0 119m
pod/kube-state-metrics-76f6cb7996-cfq79 2/3 ImagePullBackOff 0 62m
pod/node-exporter-g5vqs 2/2 Running 0 119m
pod/node-exporter-ks6h7 2/2 Running 0 119m
pod/node-exporter-s8c4m 2/2 Running 0 119m
pod/node-exporter-tlmp6 2/2 Running 0 119m
pod/prometheus-adapter-59df95d9f5-jdpzp 1/1 Running 0 119m
pod/prometheus-adapter-59df95d9f5-k6t66 1/1 Running 0 119m
pod/prometheus-k8s-0 2/2 Running 1 119m
pod/prometheus-k8s-1 2/2 Running 1 119m
pod/prometheus-operator-7775c66ccf-qdg56 2/2 Running 0 127m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-main ClusterIP 10.96.148.16 <none> 9093/TCP 119m
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 119m
service/blackbox-exporter ClusterIP 10.102.106.12 <none> 9115/TCP,19115/TCP 119m
service/grafana ClusterIP 10.96.45.54 <none> 3000/TCP 119m
service/kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 119m
service/node-exporter ClusterIP None <none> 9100/TCP 119m
service/prometheus-adapter ClusterIP 10.108.127.216 <none> 443/TCP 119m
service/prometheus-k8s ClusterIP 10.100.120.200 <none> 9090/TCP 119m
service/prometheus-operated ClusterIP None <none> 9090/TCP 119m
service/prometheus-operator ClusterIP None <none> 8443/TCP 127m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/node-exporter 4 4 4 4 4 kubernetes.io/os=linux 119m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/blackbox-exporter 1/1 1 1 119m
deployment.apps/grafana 1/1 1 1 119m
deployment.apps/kube-state-metrics 0/1 1 0 119m
deployment.apps/prometheus-adapter 2/2 2 2 119m
deployment.apps/prometheus-operator 1/1 1 1 127m
NAME DESIRED CURRENT READY AGE
replicaset.apps/blackbox-exporter-55c457d5fb 1 1 1 119m
replicaset.apps/grafana-9df57cdc4 1 1 1 119m
replicaset.apps/kube-state-metrics-76f6cb7996 1 1 0 119m
replicaset.apps/prometheus-adapter-59df95d9f5 2 2 2 119m
replicaset.apps/prometheus-operator-7775c66ccf 1 1 1 127m
NAME READY AGE
statefulset.apps/alertmanager-main 3/3 119m
statefulset.apps/prometheus-k8s 2/2 119m
[root@k8s-master prometheus]# until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
NAMESPACE NAME AGE
monitoring alertmanager 119m
monitoring blackbox-exporter 119m
monitoring coredns 119m
monitoring grafana 119m
monitoring kube-apiserver 119m
monitoring kube-controller-manager 119m
monitoring kube-scheduler 119m
monitoring kube-state-metrics 119m
monitoring kubelet 119m
monitoring node-exporter 119m
monitoring prometheus-adapter 119m
monitoring prometheus-k8s 119m
monitoring prometheus-operator 119m
[root@k8s-master prometheus]# kubectl get pod -n monitoring -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
alertmanager-main-0 2/2 Running 0 3h13m 10.244.36.95 k8s-node1 <none> <none>
alertmanager-main-1 2/2 Running 0 3h13m 10.244.36.121 k8s-node1 <none> <none>
alertmanager-main-2 2/2 Running 0 3h13m 10.244.36.92 k8s-node1 <none> <none>
blackbox-exporter-55c457d5fb-78bs9 3/3 Running 0 3h13m 10.244.36.104 k8s-node1 <none> <none>
grafana-9df57cdc4-gnbkp 1/1 Running 0 3h13m 10.244.36.123 k8s-node1 <none> <none>
kube-state-metrics-76f6cb7996-cfq79 3/3 Running 0 136m 10.244.107.193 k8s-node3 <none> <none>
node-exporter-g5vqs 2/2 Running 0 3h13m 172.51.216.84 k8s-node3 <none> <none>
node-exporter-ks6h7 2/2 Running 0 3h13m 172.51.216.81 k8s-master <none> <none>
node-exporter-s8c4m 2/2 Running 0 3h13m 172.51.216.82 k8s-node1 <none> <none>
node-exporter-tlmp6 2/2 Running 0 3h13m 172.51.216.83 k8s-node2 <none> <none>
prometheus-adapter-59df95d9f5-jdpzp 1/1 Running 0 3h13m 10.244.36.125 k8s-node1 <none> <none>
prometheus-adapter-59df95d9f5-k6t66 1/1 Running 0 3h13m 10.244.169.133 k8s-node2 <none> <none>
prometheus-k8s-0 2/2 Running 1 3h13m 10.244.169.143 k8s-node2 <none> <none>
prometheus-k8s-1 2/2 Running 1 3h13m 10.244.36.86 k8s-node1 <none> <none>
prometheus-operator-7775c66ccf-qdg56 2/2 Running 0 3h21m 10.244.36.89 k8s-node1 <none> <none>
2. Handling image pull failures
# Image that cannot be pulled
k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0
# Workaround
docker pull bitnami/kube-state-metrics:2.0.0
docker tag bitnami/kube-state-metrics:2.0.0 k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0
# References
https://hub.docker.com/r/bitnami/kube-state-metrics
https://github.com/kubernetes/kube-state-metrics
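Note that the pull-and-retag workaround only helps on the node where it was run; the retagged image has to exist on every node that might schedule the pod. An alternative (my suggestion, not from the original notes; the container name kube-state-metrics is assumed) is to point the Deployment at the mirror image directly:
kubectl -n monitoring set image deployment/kube-state-metrics kube-state-metrics=bitnami/kube-state-metrics:2.0.0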
At most, 5 kube-state-metrics and 5 kubernetes releases will be recorded below.
| kube-state-metrics | Kubernetes 1.18 | Kubernetes 1.19 | Kubernetes 1.20 | Kubernetes 1.21 | Kubernetes 1.22 |
| --- | --- | --- | --- | --- | --- |
| v1.9.8 | - | - | - | - | - |
| v2.0.0 | -/✓ | ✓ | ✓ | -/✓ | -/✓ |
| v2.1.1 | -/✓ | ✓ | ✓ | ✓ | -/✓ |
| v2.2.4 | -/✓ | ✓ | ✓ | ✓ | ✓ |
| master | -/✓ | ✓ | ✓ | ✓ | ✓ |
✓ Fully supported version range.
- The Kubernetes cluster has features the client-go library can't use (additional API objects, deprecated APIs, etc.).
3. Creating an Ingress
# prometheus-ing.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ing
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: prometheus.k8s.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
  - host: grafana.k8s.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000
  - host: alertmanager.k8s.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: alertmanager-main
            port:
              number: 9093
[root@k8s-master prometheus]# kubectl apply -f prometheus-ing.yaml
ingress.networking.k8s.io/prometheus-ing created
[root@k8s-master prometheus]# kubectl get ing -n monitoring
NAME CLASS HOSTS ADDRESS PORTS AGE
prometheus-ing <none> prometheus.k8s.com,grafana.k8s.com,alertmanager.k8s.com 80 14s
Add hostname resolution on your local machine
C:\Windows\System32\drivers\etc
Add the entries to the hosts file (pointing at the master node's IP address):
172.51.216.81 prometheus.k8s.com
172.51.216.81 grafana.k8s.com
172.51.216.81 alertmanager.k8s.com
[root@k8s-master prometheus]# kubectl get svc -n ingress-nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx NodePort 10.97.245.122 <none> 80:31208/TCP,443:32099/TCP 71d
ingress-nginx-controller NodePort 10.101.238.213 <none> 80:30743/TCP,443:31410/TCP 71d
ingress-nginx-controller-admission ClusterIP 10.100.142.101 <none> 443/TCP 71d
# Ingress address
# Open in a local browser:
http://prometheus.k8s.com:31208/
http://alertmanager.k8s.com:31208/
http://grafana.k8s.com:31208/
admin/admin
2. kube-prometheus Data Persistence
2.1. Prometheus data persistence
1. Inspect the current storage
# Check prometheus
[root@k8s-master manifests]# kubectl get all -n monitoring
NAME READY STATUS RESTARTS AGE
pod/prometheus-k8s-0 2/2 Running 1 26h
pod/prometheus-k8s-1 2/2 Running 1 26h
NAME READY AGE
statefulset.apps/prometheus-k8s 2/2 26h
# Inspect statefulset.apps/prometheus-k8s
[root@k8s-master manifests]# kubectl edit sts prometheus-k8s -n monitoring
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: apps/v1
kind: StatefulSet
metadata:
......
spec:
......
template:
......
spec:
containers:
- args:
- --web.console.templates=/etc/prometheus/consoles
- --web.console.libraries=/etc/prometheus/console_libraries
- --config.file=/etc/prometheus/config_out/prometheus.env.yaml
- --storage.tsdb.path=/prometheus
# retention period
- --storage.tsdb.retention.time=24h
- --web.enable-lifecycle
- --storage.tsdb.no-lockfile
- --web.route-prefix=/
image: quay.io/prometheus/prometheus:v2.26.0
imagePullPolicy: IfNotPresent
name: prometheus
ports:
- containerPort: 9090
name: web
protocol: TCP
......
volumeMounts:
- mountPath: /etc/prometheus/config_out
name: config-out
readOnly: true
- mountPath: /etc/prometheus/certs
name: tls-assets
readOnly: true
- mountPath: /prometheus
name: prometheus-k8s-db
- mountPath: /etc/prometheus/rules/prometheus-k8s-rulefiles-0
name: prometheus-k8s-rulefiles-0
......
volumes:
- name: config
secret:
defaultMode: 420
secretName: prometheus-k8s
- name: tls-assets
secret:
defaultMode: 420
secretName: prometheus-k8s-tls-assets
# storage (emptyDir)
- emptyDir: {}
name: config-out
- configMap:
defaultMode: 420
name: prometheus-k8s-rulefiles-0
name: prometheus-k8s-rulefiles-0
# storage: the data volume is an emptyDir, i.e. not persistent
- emptyDir: {}
name: prometheus-k8s-db
2. Edit manifests/prometheus-prometheus.yaml
#-----storage-----
  storage:  # persistence configuration
    volumeClaimTemplate:
      spec:
        storageClassName: rook-ceph-block
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
#-----------------
# Full file
# prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
labels:
app.kubernetes.io/component: prometheus
app.kubernetes.io/name: prometheus
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.26.0
prometheus: k8s
name: k8s
namespace: monitoring
spec:
alerting:
alertmanagers:
- apiVersion: v2
name: alertmanager-main
namespace: monitoring
port: web
externalLabels: {}
#-----storage-----
storage: # persistence configuration
volumeClaimTemplate:
spec:
storageClassName: rook-ceph-block
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
#-----------------
image: quay.io/prometheus/prometheus:v2.26.0
nodeSelector:
kubernetes.io/os: linux
podMetadata:
labels:
app.kubernetes.io/component: prometheus
app.kubernetes.io/name: prometheus
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.26.0
podMonitorNamespaceSelector: {}
podMonitorSelector: {}
probeNamespaceSelector: {}
probeSelector: {}
replicas: 2
resources:
requests:
memory: 400Mi
ruleSelector:
matchLabels:
prometheus: k8s
role: alert-rules
securityContext:
fsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
serviceAccountName: prometheus-k8s
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector: {}
version: 2.26.0
# Re-apply
[root@k8s-master manifests]# kubectl apply -f prometheus-prometheus.yaml
Warning: resource prometheuses/k8s is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
prometheus.monitoring.coreos.com/k8s configured
[root@k8s-master manifests]# kubectl get pvc -n monitoring
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-k8s-db-prometheus-k8s-0 Bound pvc-d14953b7-22b4-42e7-88a6-2a869f1f729c 10Gi RWO rook-ceph-block 91s
prometheus-k8s-db-prometheus-k8s-1 Bound pvc-05f94b8e-b22b-4418-9887-d0210df8e79d 10Gi RWO rook-ceph-block 91s
[root@k8s-master manifests]# kubectl get pv -n monitoring
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-05f94b8e-b22b-4418-9887-d0210df8e79d 10Gi RWO Delete Bound monitoring/prometheus-k8s-db-prometheus-k8s-1 rook-ceph-block 103s
pvc-d14953b7-22b4-42e7-88a6-2a869f1f729c 10Gi RWO Delete Bound monitoring/prometheus-k8s-db-prometheus-k8s-0 rook-ceph-block 102s
3. Setting retention to 10 days
Edit manifests/setup/prometheus-operator-deployment.yaml
# prometheus-operator-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/name: prometheus-operator
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.47.0
name: prometheus-operator
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/component: controller
app.kubernetes.io/name: prometheus-operator
app.kubernetes.io/part-of: kube-prometheus
template:
metadata:
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/name: prometheus-operator
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.47.0
spec:
containers:
- args:
- --kubelet-service=kube-system/kubelet
- --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0
- storage.tsdb.retention.time=10d # add the retention time parameter here
image: quay.io/prometheus-operator/prometheus-operator:v0.47.0
name: prometheus-operator
ports:
- containerPort: 8080
name: http
resources:
limits:
cpu: 200m
memory: 200Mi
requests:
cpu: 100m
memory: 100Mi
securityContext:
allowPrivilegeEscalation: false
- args:
- --logtostderr
- --secure-listen-address=:8443
- --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
- --upstream=http://127.0.0.1:8080/
image: quay.io/brancz/kube-rbac-proxy:v0.8.0
name: kube-rbac-proxy
ports:
- containerPort: 8443
name: https
resources:
limits:
cpu: 20m
memory: 40Mi
requests:
cpu: 10m
memory: 20Mi
securityContext:
runAsGroup: 65532
runAsNonRoot: true
runAsUser: 65532
nodeSelector:
kubernetes.io/os: linux
securityContext:
runAsNonRoot: true
runAsUser: 65534
serviceAccountName: prometheus-operator
**Note:** the parameter name here is storage.tsdb.retention.time=10d; I had first used --storage.tsdb.retention.time=10d, and after applying it the operator reported that the flag was provided but not defined.
[root@k8s-master setup]# kubectl apply -f prometheus-operator-deployment.yaml
Warning: resource deployments/prometheus-operator is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
deployment.apps/prometheus-operator configured
4. Verify the storage
# Inspect statefulset.apps/prometheus-k8s
[root@k8s-master manifests]# kubectl edit sts prometheus-k8s -n monitoring
[root@k8s-master setup]# kubectl edit sts prometheus-k8s -n monitoring
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: apps/v1
kind: StatefulSet
metadata:
......
spec:
......
spec:
containers:
- args:
- --web.console.templates=/etc/prometheus/consoles
- --web.console.libraries=/etc/prometheus/console_libraries
- --config.file=/etc/prometheus/config_out/prometheus.env.yaml
- --storage.tsdb.path=/prometheus
- --storage.tsdb.retention.time=24h
- --web.enable-lifecycle
- --storage.tsdb.no-lockfile
- --web.route-prefix=/
image: quay.io/prometheus/prometheus:v2.26.0
......
volumes:
- name: config
secret:
defaultMode: 420
secretName: prometheus-k8s
- name: tls-assets
secret:
defaultMode: 420
secretName: prometheus-k8s-tls-assets
- emptyDir: {}
name: config-out
- configMap:
defaultMode: 420
name: prometheus-k8s-rulefiles-0
name: prometheus-k8s-rulefiles-0
# storage after the change
volumeClaimTemplates:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
name: prometheus-k8s-db
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: rook-ceph-block
volumeMode: Filesystem
status:
phase: Pending
2.2. Grafana persistence
1. Create grafana-pvc
# grafana-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: rook-cephfs
  resources:
    requests:
      storage: 5Gi
[root@k8s-master prometheus]# kubectl apply -f grafana-pvc.yaml
persistentvolumeclaim/grafana-pvc created
[root@k8s-master prometheus]# kubectl get pvc -n monitoring
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
grafana-pvc Bound pvc-952480c5-6702-488b-b1ce-d4e39a17a137 5Gi RWX rook-cephfs 26s
[root@k8s-master prometheus]# kubectl get pv -n monitoring
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-952480c5-6702-488b-b1ce-d4e39a17a137 5Gi RWX Delete Bound monitoring/grafana-pvc rook-cephfs 114s
2. Re-create grafana
Change the storage in grafana-deployment.yaml from emptyDir to the PVC:
# grafana-deployment.yaml
      volumes:
      #- emptyDir: {}
      #  name: grafana-storage
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana-pvc
      - name: grafana-datasources
        secret:
          secretName: grafana-datasources
      - configMap:
          name: grafana-dashboards
        name: grafana-dashboards
      - configMap:
          name: grafana-dashboard-apiserver
        name: grafana-dashboard-apiserver
      - configMap:
          name: grafana-dashboard-controller-manager
      ........
[root@k8s-master manifests]# kubectl apply -f grafana-deployment.yaml
Warning: resource deployments/grafana is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
3. Test
After deleting the Grafana Pod, the original password still works and the existing configuration is preserved.
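For reference, the test can be run like this, assuming the Grafana pods carry the app.kubernetes.io/name=grafana label used by the Service selector above:
kubectl -n monitoring delete pod -l app.kubernetes.io/name=grafana
kubectl -n monitoring get pod -l app.kubernetes.io/name=grafana   # the replacement pod keeps the old password and dashboards via the PVC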
3. Spring Boot Monitoring
Microservice: msa-ext-prometheus
3.1. Create the service
1. Create msa-ext-prometheus
msa-ext-prometheus.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: dev
  name: msa-ext-prometheus
  labels:
    app: msa-ext-prometheus
spec:
  type: NodePort
  ports:
  - port: 8158
    name: msa-ext-prometheus
    targetPort: 8158
    nodePort: 30158  # expose port 30158 externally
  selector:
    app: msa-ext-prometheus
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: dev
  name: msa-ext-prometheus
spec:
  replicas: 3
  selector:
    matchLabels:
      app: msa-ext-prometheus
  template:
    metadata:
      labels:
        app: msa-ext-prometheus
    spec:
      imagePullSecrets:
      - name: harborsecret  # the Secret created for the private image registry
      containers:
      - name: msa-ext-prometheus
        image: 172.51.216.85:8888/springcloud/msa-ext-prometheus:1.0.0
        imagePullPolicy: Always  # if imagePullPolicy is omitted, the policy defaults to IfNotPresent
        ports:
        - containerPort: 8158
        env:
        - name: ACTIVE
          value: "-Dspring.profiles.active=k8s"
[root@k8s-master prometheus]# kubectl apply -f msa-ext-prometheus.yaml
service/msa-ext-prometheus created
deployment.apps/msa-ext-prometheus created
[root@k8s-master prometheus]# kubectl get all -n dev | grep msa-ext-prometheus
pod/msa-ext-prometheus-6599578495-dhblt 1/1 Running 0 44s
pod/msa-ext-prometheus-6599578495-srxs4 1/1 Running 0 44s
pod/msa-ext-prometheus-6599578495-z6l7d 1/1 Running 0 44s
service/msa-ext-prometheus NodePort 10.98.140.165 <none> 8158:30158/TCP 44s
deployment.apps/msa-ext-prometheus 3/3 3 3 44s
replicaset.apps/msa-ext-prometheus-6599578495 3 3 3 44s
# Access http://172.51.216.81:30158/actuator
{"_links":{"self":{"href":"http://172.51.216.81:30158/actuator","templated":false},"beans":{"href":"http://172.51.216.81:30158/actuator/beans","templated":false},"caches":{"href":"http://172.51.216.81:30158/actuator/caches","templated":false},"caches-cache":{"href":"http://172.51.216.81:30158/actuator/caches/{cache}","templated":true},"health":{"href":"http://172.51.216.81:30158/actuator/health","templated":false},"health-path":{"href":"http://172.51.216.81:30158/actuator/health/{*path}","templated":true},"info":{"href":"http://172.51.216.81:30158/actuator/info","templated":false},"conditions":{"href":"http://172.51.216.81:30158/actuator/conditions","templated":false},"configprops":{"href":"http://172.51.216.81:30158/actuator/configprops","templated":false},"env-toMatch":{"href":"http://172.51.216.81:30158/actuator/env/{toMatch}","templated":true},"env":{"href":"http://172.51.216.81:30158/actuator/env","templated":false},"loggers":{"href":"http://172.51.216.81:30158/actuator/loggers","templated":false},"loggers-name":{"href":"http://172.51.216.81:30158/actuator/loggers/{name}","templated":true},"heapdump":{"href":"http://172.51.216.81:30158/actuator/heapdump","templated":false},"threaddump":{"href":"http://172.51.216.81:30158/actuator/threaddump","templated":false},"prometheus":{"href":"http://172.51.216.81:30158/actuator/prometheus","templated":false},"metrics-requiredMetricName":{"href":"http://172.51.216.81:30158/actuator/metrics/{requiredMetricName}","templated":true},"metrics":{"href":"http://172.51.216.81:30158/actuator/metrics","templated":false},"scheduledtasks":{"href":"http://172.51.216.81:30158/actuator/scheduledtasks","templated":false},"mappings":{"href":"http://172.51.216.81:30158/actuator/mappings","templated":false}}}
# Access http://172.51.216.81:30158/actuator/prometheus
# HELP jvm_threads_daemon_threads The current number of live daemon threads
# TYPE jvm_threads_daemon_threads gauge
jvm_threads_daemon_threads{application="springboot-prometheus",region="iids",} 16.0
# HELP jvm_buffer_memory_used_bytes An estimate of the memory that the Java virtual machine is using for this buffer pool
# TYPE jvm_buffer_memory_used_bytes gauge
jvm_buffer_memory_used_bytes{application="springboot-prometheus",id="mapped",region="iids",} 0.0
jvm_buffer_memory_used_bytes{application="springboot-prometheus",id="direct",region="iids",} 81920.0
# HELP jvm_threads_peak_threads The peak live thread count since the Java virtual machine started or peak was reset
# TYPE jvm_threads_peak_threads gauge
jvm_threads_peak_threads{application="springboot-prometheus",region="iids",} 20.0
# HELP system_cpu_usage The "recent cpu usage" for the whole system
# TYPE system_cpu_usage gauge
system_cpu_usage{application="springboot-prometheus",region="iids",} 0.10248447204968944
# HELP jvm_gc_pause_seconds Time spent in GC pause
# TYPE jvm_gc_pause_seconds summary
jvm_gc_pause_seconds_count{action="end of major GC",application="springboot-prometheus",cause="Metadata GC Threshold",region="iids",} 1.0
jvm_gc_pause_seconds_sum{action="end of major GC",application="springboot-prometheus",cause="Metadata GC Threshold",region="iids",} 0.077
jvm_gc_pause_seconds_count{action="end of minor GC",application="springboot-prometheus",cause="Allocation Failure",region="iids",} 2.0
jvm_gc_pause_seconds_sum{action="end of minor GC",application="springboot-prometheus",cause="Allocation Failure",region="iids",} 0.048
# HELP jvm_gc_pause_seconds_max Time spent in GC pause
# TYPE jvm_gc_pause_seconds_max gauge
jvm_gc_pause_seconds_max{action="end of major GC",application="springboot-prometheus",cause="Metadata GC Threshold",region="iids",} 0.077
jvm_gc_pause_seconds_max{action="end of minor GC",application="springboot-prometheus",cause="Allocation Failure",region="iids",} 0.026
# HELP http_server_requests_seconds
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{application="springboot-prometheus",exception="None",method="GET",outcome="CLIENT_ERROR",region="iids",status="404",uri="/**",} 1.0
http_server_requests_seconds_sum{application="springboot-prometheus",exception="None",method="GET",outcome="CLIENT_ERROR",region="iids",status="404",uri="/**",} 0.010432659
http_server_requests_seconds_count{application="springboot-prometheus",exception="None",method="GET",outcome="SUCCESS",region="iids",status="200",uri="/actuator",} 1.0
http_server_requests_seconds_sum{application="springboot-prometheus",exception="None",method="GET",outcome="SUCCESS",region="iids",status="200",uri="/actuator",} 0.071231044
http_server_requests_seconds_count{application="springboot-prometheus",exception="None",method="GET",outcome="SUCCESS",region="iids",status="200",uri="/actuator/prometheus",} 28.0
http_server_requests_seconds_sum{application="springboot-prometheus",exception="None",method="GET",outcome="SUCCESS",region="iids",status="200",uri="/actuator/prometheus",} 0.21368709
# HELP http_server_requests_seconds_max
# TYPE http_server_requests_seconds_max gauge
http_server_requests_seconds_max{application="springboot-prometheus",exception="None",method="GET",outcome="CLIENT_ERROR",region="iids",status="404",uri="/**",} 0.010432659
http_server_requests_seconds_max{application="springboot-prometheus",exception="None",method="GET",outcome="SUCCESS",region="iids",status="200",uri="/actuator",} 0.071231044
http_server_requests_seconds_max{application="springboot-prometheus",exception="None",method="GET",outcome="SUCCESS",region="iids",status="200",uri="/actuator/prometheus",} 0.089629576
# HELP logback_events_total Number of error level events that made it to the logs
# TYPE logback_events_total counter
logback_events_total{application="springboot-prometheus",level="error",region="iids",} 0.0
logback_events_total{application="springboot-prometheus",level="debug",region="iids",} 0.0
logback_events_total{application="springboot-prometheus",level="info",region="iids",} 7.0
logback_events_total{application="springboot-prometheus",level="trace",region="iids",} 0.0
logback_events_total{application="springboot-prometheus",level="warn",region="iids",} 0.0
# HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
# TYPE jvm_memory_committed_bytes gauge
jvm_memory_committed_bytes{application="springboot-prometheus",area="heap",id="Eden Space",region="iids",} 7.1630848E7
jvm_memory_committed_bytes{application="springboot-prometheus",area="heap",id="Tenured Gen",region="iids",} 1.78978816E8
jvm_memory_committed_bytes{application="springboot-prometheus",area="nonheap",id="Metaspace",region="iids",} 3.9714816E7
jvm_memory_committed_bytes{application="springboot-prometheus",area="nonheap",id="Compressed Class Space",region="iids",} 5111808.0
jvm_memory_committed_bytes{application="springboot-prometheus",area="heap",id="Survivor Space",region="iids",} 8912896.0
jvm_memory_committed_bytes{application="springboot-prometheus",area="nonheap",id="Code Cache",region="iids",} 1.0747904E7
# HELP jvm_buffer_count_buffers An estimate of the number of buffers in the pool
# TYPE jvm_buffer_count_buffers gauge
jvm_buffer_count_buffers{application="springboot-prometheus",id="mapped",region="iids",} 0.0
jvm_buffer_count_buffers{application="springboot-prometheus",id="direct",region="iids",} 10.0
# HELP process_start_time_seconds Start time of the process since unix epoch.
# TYPE process_start_time_seconds gauge
process_start_time_seconds{application="springboot-prometheus",region="iids",} 1.637828102856E9
# HELP process_files_open_files The open file descriptor count
# TYPE process_files_open_files gauge
process_files_open_files{application="springboot-prometheus",region="iids",} 26.0
# HELP jvm_gc_max_data_size_bytes Max size of old generation memory pool
# TYPE jvm_gc_max_data_size_bytes gauge
jvm_gc_max_data_size_bytes{application="springboot-prometheus",region="iids",} 1.78978816E8
# HELP jvm_gc_memory_allocated_bytes_total Incremented for an increase in the size of the young generation memory pool after one GC to before the next
# TYPE jvm_gc_memory_allocated_bytes_total counter
jvm_gc_memory_allocated_bytes_total{application="springboot-prometheus",region="iids",} 1.63422192E8
# HELP jvm_classes_unloaded_classes_total The total number of classes unloaded since the Java virtual machine has started execution
# TYPE jvm_classes_unloaded_classes_total counter
jvm_classes_unloaded_classes_total{application="springboot-prometheus",region="iids",} 0.0
# HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
# TYPE jvm_memory_max_bytes gauge
jvm_memory_max_bytes{application="springboot-prometheus",area="heap",id="Eden Space",region="iids",} 7.1630848E7
jvm_memory_max_bytes{application="springboot-prometheus",area="heap",id="Tenured Gen",region="iids",} 1.78978816E8
jvm_memory_max_bytes{application="springboot-prometheus",area="nonheap",id="Metaspace",region="iids",} -1.0
jvm_memory_max_bytes{application="springboot-prometheus",area="nonheap",id="Compressed Class Space",region="iids",} 1.073741824E9
jvm_memory_max_bytes{application="springboot-prometheus",area="heap",id="Survivor Space",region="iids",} 8912896.0
jvm_memory_max_bytes{application="springboot-prometheus",area="nonheap",id="Code Cache",region="iids",} 2.5165824E8
# HELP jvm_threads_states_threads The current number of threads having NEW state
# TYPE jvm_threads_states_threads gauge
jvm_threads_states_threads{application="springboot-prometheus",region="iids",state="blocked",} 0.0
jvm_threads_states_threads{application="springboot-prometheus",region="iids",state="terminated",} 0.0
jvm_threads_states_threads{application="springboot-prometheus",region="iids",state="waiting",} 12.0
jvm_threads_states_threads{application="springboot-prometheus",region="iids",state="timed-waiting",} 2.0
jvm_threads_states_threads{application="springboot-prometheus",region="iids",state="new",} 0.0
jvm_threads_states_threads{application="springboot-prometheus",region="iids",state="runnable",} 6.0
# HELP system_cpu_count The number of processors available to the Java virtual machine
# TYPE system_cpu_count gauge
system_cpu_count{application="springboot-prometheus",region="iids",} 1.0
# HELP jvm_buffer_total_capacity_bytes An estimate of the total capacity of the buffers in this pool
# TYPE jvm_buffer_total_capacity_bytes gauge
jvm_buffer_total_capacity_bytes{application="springboot-prometheus",id="mapped",region="iids",} 0.0
jvm_buffer_total_capacity_bytes{application="springboot-prometheus",id="direct",region="iids",} 81920.0
# HELP tomcat_sessions_created_sessions_total
# TYPE tomcat_sessions_created_sessions_total counter
tomcat_sessions_created_sessions_total{application="springboot-prometheus",region="iids",} 0.0
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{application="springboot-prometheus",area="heap",id="Eden Space",region="iids",} 4.7401328E7
jvm_memory_used_bytes{application="springboot-prometheus",area="heap",id="Tenured Gen",region="iids",} 1.4121968E7
jvm_memory_used_bytes{application="springboot-prometheus",area="nonheap",id="Metaspace",region="iids",} 3.69704E7
jvm_memory_used_bytes{application="springboot-prometheus",area="nonheap",id="Compressed Class Space",region="iids",} 4569744.0
jvm_memory_used_bytes{application="springboot-prometheus",area="heap",id="Survivor Space",region="iids",} 0.0
jvm_memory_used_bytes{application="springboot-prometheus",area="nonheap",id="Code Cache",region="iids",} 9866496.0
# HELP process_cpu_usage The "recent cpu usage" for the Java Virtual Machine process
# TYPE process_cpu_usage gauge
process_cpu_usage{application="springboot-prometheus",region="iids",} 0.0012414649286157666
# HELP jvm_threads_live_threads The current number of live threads including both daemon and non-daemon threads
# TYPE jvm_threads_live_threads gauge
jvm_threads_live_threads{application="springboot-prometheus",region="iids",} 20.0
# HELP tomcat_sessions_expired_sessions_total
# TYPE tomcat_sessions_expired_sessions_total counter
tomcat_sessions_expired_sessions_total{application="springboot-prometheus",region="iids",} 0.0
# HELP tomcat_sessions_alive_max_seconds
# TYPE tomcat_sessions_alive_max_seconds gauge
tomcat_sessions_alive_max_seconds{application="springboot-prometheus",region="iids",} 0.0
# HELP process_files_max_files The maximum file descriptor count
# TYPE process_files_max_files gauge
process_files_max_files{application="springboot-prometheus",region="iids",} 1048576.0
# HELP jvm_gc_live_data_size_bytes Size of old generation memory pool after a full GC
# TYPE jvm_gc_live_data_size_bytes gauge
jvm_gc_live_data_size_bytes{application="springboot-prometheus",region="iids",} 1.4121968E7
# HELP system_load_average_1m The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
# TYPE system_load_average_1m gauge
system_load_average_1m{application="springboot-prometheus",region="iids",} 0.22705078125
# HELP tomcat_sessions_active_current_sessions
# TYPE tomcat_sessions_active_current_sessions gauge
tomcat_sessions_active_current_sessions{application="springboot-prometheus",region="iids",} 0.0
# HELP tomcat_sessions_active_max_sessions
# TYPE tomcat_sessions_active_max_sessions gauge
tomcat_sessions_active_max_sessions{application="springboot-prometheus",region="iids",} 0.0
# HELP process_uptime_seconds The uptime of the Java virtual machine
# TYPE process_uptime_seconds gauge
process_uptime_seconds{application="springboot-prometheus",region="iids",} 147.396
# HELP tomcat_sessions_rejected_sessions_total
# TYPE tomcat_sessions_rejected_sessions_total counter
tomcat_sessions_rejected_sessions_total{application="springboot-prometheus",region="iids",} 0.0
# HELP jvm_classes_loaded_classes The number of classes that are currently loaded in the Java virtual machine
# TYPE jvm_classes_loaded_classes gauge
jvm_classes_loaded_classes{application="springboot-prometheus",region="iids",} 6944.0
# HELP jvm_gc_memory_promoted_bytes_total Count of positive increases in the size of the old generation memory pool before GC to after GC
# TYPE jvm_gc_memory_promoted_bytes_total counter
jvm_gc_memory_promoted_bytes_total{application="springboot-prometheus",region="iids",} 6409976.0
### Viewing the metrics endpoints
http://localhost:8080/actuator
http://localhost:8080/actuator/prometheus
http://localhost:8080/actuator/metrics
http://localhost:8080/actuator/metrics/jvm.memory.max
http://localhost:8080/actuator/loggers
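A quick sanity check that the endpoint really serves the Prometheus text format (assuming the application is still running locally on port 8080 as above):
curl -s http://localhost:8080/actuator/prometheus | grep -m3 '^jvm_memory_used_bytes'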
3.2. Create a ServiceMonitor
msa-servicemonitor.yaml
[root@k8s-m1 ~]# cat msa-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
app: msa-ext-prometheus
name: msa-servicemonitor
namespace: monitoring
spec:
endpoints:
- interval: 15s
port: msa
jobLabel: msa-ext-prometheus-monitor
namespaceSelector:
matchNames:
- dev # namespace of the target Service
selector:
matchLabels:
app: msa-ext-prometheus # label of the target Service
[root@k8s-master prometheus]# kubectl apply -f msa-servicemonitor.yaml
servicemonitor.monitoring.coreos.com/msa-servicemonitor created
[root@k8s-master prometheus]# kubectl get servicemonitor -n monitoring
NAME AGE
alertmanager 28h
blackbox-exporter 28h
coredns 28h
grafana 28h
kube-apiserver 28h
kube-controller-manager 28h
kube-scheduler 28h
kube-state-metrics 28h
kubelet 28h
msa-servicemonitor 39s
node-exporter 28h
prometheus-adapter 28h
prometheus-k8s 28h
prometheus-operator 28h
# This did not work: the ServiceMonitor was created, but no new target appeared in Prometheus.
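Two things usually have to hold for this route to work against a stock kube-prometheus install, and both are plausible culprits here (stated as assumptions, they were not verified in this walkthrough): the target Service in dev must expose a port named msa so the ServiceMonitor's port: msa has something to match, and the prometheus-k8s ServiceAccount needs permission to list Endpoints in dev, which the stock manifests only grant for the default, kube-system and monitoring namespaces. A sketch of the extra RBAC (illustrative file name):
# prometheus-role-dev.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: dev
rules:
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
  namespace: dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring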
3.3. Traditional configuration (additionalScrapeConfigs)
https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/additional-scrape-config.md
1. Create the file
prometheus-additional.yaml
- job_name: spring-boot-actuator
metrics_path: '/actuator/prometheus'
scrape_interval: 5s
static_configs:
- targets: ['msa-ext-prometheus.dev.svc.cluster.local:8158']
labels:
instance: spring-actuator
# For reference, the default self-scrape job from a stock prometheus.yml looks like this:
- job_name: "prometheus"
static_configs:
- targets: ["localhost:9090"]
2. Create the Secret
[root@k8s-master prometheus]# kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml --dry-run -oyaml > additional-scrape-configs.yaml
W1125 17:01:27.653622 50590 helpers.go:553] --dry-run is deprecated and can be replaced with --dry-run=client.
[root@k8s-master prometheus]# ll
total 344
-rw-r--r-- 1 root root 435 Nov 25 17:01 additional-scrape-configs.yaml
-rw-r--r-- 1 root root 225 Nov 25 17:00 prometheus-additional.yaml
[root@k8s-master prometheus]# vim additional-scrape-configs.yaml
apiVersion: v1
data:
prometheus-additional.yaml: LSBqb2JfbmFtZTogc3ByaW5nLWJvb3QtYWN0dWF0b3IKICBtZXRyaWNzX3BhdGg6ICcvYWN0dWF0b3IvcHJvbWV0aGV1cycKICBzY3JhcGVfaW50ZXJ2YWw6IDVzCiAgc3RhdGljX2NvbmZpZ3M6CiAgICAtIHRhcmdldHM6IFsnbXNhLWV4dC1wcm9tZXRoZXVzLmRldi5zdmMuY2x1c3Rlci5sb2NhbDo4MTU4J10KICAgICAgbGFiZWxzOgogICAgICAgIGluc3RhbmNlOiBzcHJpbmctYWN0dWF0b3IK
kind: Secret
metadata:
creationTimestamp: null
name: additional-scrape-configs
[root@k8s-master prometheus]# kubectl apply -f additional-scrape-configs.yaml -n monitoring
secret/additional-scrape-configs created
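Optionally verify that the Secret round-trips to the original file content (a hypothetical check, not part of the original steps):
kubectl -n monitoring get secret additional-scrape-configs -o go-template='{{index .data "prometheus-additional.yaml"}}' | base64 -d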
3. Reference the Secret from the Prometheus custom resource. The snippet below is the upstream example from the Prometheus Operator documentation; in the kube-prometheus manifests the resource to edit is named k8s in the monitoring namespace (which is why the apply below reports prometheus.monitoring.coreos.com/k8s), and only the additionalScrapeConfigs block needs to be added.
# prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
name: prometheus
labels:
prometheus: prometheus
spec:
replicas: 2
serviceAccountName: prometheus
serviceMonitorSelector:
matchLabels:
team: frontend
# added: reference the Secret holding the extra scrape config
additionalScrapeConfigs:
name: additional-scrape-configs
key: prometheus-additional.yaml
[root@k8s-master manifests]# kubectl apply -f prometheus-prometheus.yaml
prometheus.monitoring.coreos.com/k8s configured
[root@k8s-master manifests]# kubectl delete pod prometheus-k8s-0 -n monitoring
pod "prometheus-k8s-0" deleted
[root@k8s-master manifests]# kubectl delete pod prometheus-k8s-1 -n monitoring
pod "prometheus-k8s-1" deleted
4. Test
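With the Secret referenced from the Prometheus resource, the config reloader normally picks the change up even without recreating the pods; either way the new job should show up on the targets page. A minimal check (service name and port as used by kube-prometheus):
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
# then open http://localhost:9090/targets and look for the spring-boot-actuator job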
5. Add a new scrape job
Modify prometheus-additional.yaml:
- job_name: spring-boot-actuator
metrics_path: '/actuator/prometheus'
scrape_interval: 5s
static_configs:
- targets: ['msa-ext-prometheus.dev.svc.cluster.local:8158']
labels:
instance: spring-actuator
- job_name: spring-boot
metrics_path: '/actuator/prometheus'
scrape_interval: 15s
static_configs:
- targets: ['msa-ext-prometheus.dev.svc.cluster.local:8158']
labels:
instance: spring-actuator-1
[root@k8s-master prometheus]# kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml --dry-run -oyaml > additional-scrape-configs.yaml
W1126 09:53:24.608990 62215 helpers.go:553] --dry-run is deprecated and can be replaced with --dry-run=client.
[root@k8s-master prometheus]# kubectl apply -f additional-scrape-configs.yaml -n monitoring
secret/additional-scrape-configs configured
3.4. Configure Grafana
### Pick dashboard templates
# 4701
https://grafana.com/grafana/dashboards/4701
# 11955
https://grafana.com/grafana/dashboards/11955
4. Monitoring a service outside the cluster
The external service here is MySQL, which is deployed outside the Kubernetes cluster.
4.1. Install MySQL
# Set up the database
# Create directories
mkdir -p /ntms/mysql/conf
mkdir -p /ntms/mysql/data
chmod -R 777 /ntms/mysql/conf
chmod -R 777 /ntms/mysql/data
# Create the configuration file
vim /ntms/mysql/conf/my.cnf
# Enter the following content
[mysqld]
log-bin=mysql-bin # enable the binary log
server-id=119 # server id; here the last octet of the host IP
sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'
# Start the container
docker run -d --restart=always \
-p 3306:3306 \
-v /ntms/mysql/data:/var/lib/mysql \
-v /ntms/mysql/conf:/etc/mysql/ \
-e MYSQL_ROOT_PASSWORD=root \
--name mysql \
percona:5.7.23
# Enter the mysql container (the container was named mysql above)
docker exec -it mysql bash
# Log in to mysql
mysql -u root -p
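The mysqld_exporter README recommends a dedicated least-privilege account instead of root; a sketch (user name and password are placeholders), run against the container started above:
docker exec mysql mysql -uroot -proot -e "CREATE USER 'exporter'@'%' IDENTIFIED BY 'exporter_pwd' WITH MAX_USER_CONNECTIONS 3;
  GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';"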
4.2.mysqld_exporter
# github
https://github.com/prometheus/mysqld_exporter
# dockerhub
mysqld_exporter
https://registry.hub.docker.com/r/prom/mysqld-exporter
Install the exporter
docker run -d --network host --name mysqld-exporter --restart=always \
-e DATA_SOURCE_NAME="root:root@(172.51.216.98:3306)/" \
prom/mysqld-exporter
### Access the metrics endpoint
http://172.51.216.98:9104/metrics
curl '172.51.216.98:9104/metrics'
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000229989
go_gc_duration_seconds{quantile="0.25"} 0.000229989
go_gc_duration_seconds{quantile="0.5"} 0.000229989
go_gc_duration_seconds{quantile="0.75"} 0.000229989
go_gc_duration_seconds{quantile="1"} 0.000229989
go_gc_duration_seconds_sum 0.000229989
go_gc_duration_seconds_count 1
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 7
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.16.4"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.355664e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 3.581928e+06
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.446768e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 31105
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 4.591226769073576e-06
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 4.817184e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 1.355664e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.3463424e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 3.121152e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 13039
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.2251008e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6584576e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
......
4.3. Create a Service and Endpoints
# service-mysql-exporter.yaml
kind: Service
apiVersion: v1
metadata:
namespace: dev
name: service-mysql-exporter
labels:
app: service-mysql-exporter
spec:
ports:
- protocol: TCP
port: 9104
targetPort: 9104
type: ClusterIP
---
kind: Endpoints
apiVersion: v1
metadata:
namespace: dev
name: service-mysql-exporter
labels:
app: service-mysql-exporter
subsets:
- addresses:
- ip: 172.51.216.98
ports:
- protocol: TCP
port: 9104
[root@k8s-master prometheus]# kubectl apply -f service-mysql-exporter.yaml
service/service-mysql-exporter created
endpoints/service-mysql-exporter created
[root@k8s-master prometheus]# kubectl get svc -n dev
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service-mysql-exporter ClusterIP 10.101.125.113 <none> 9104/TCP 100s
[root@k8s-master prometheus]# kubectl get ep -n dev
NAME ENDPOINTS AGE
service-mysql-exporter 172.51.216.98:9104 33s
[root@k8s-master prometheus]# curl 10.101.125.113:9104/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000229989
go_gc_duration_seconds{quantile="0.25"} 0.000229989
go_gc_duration_seconds{quantile="0.5"} 0.000229989
go_gc_duration_seconds{quantile="0.75"} 0.000229989
go_gc_duration_seconds{quantile="1"} 0.000229989
go_gc_duration_seconds_sum 0.000229989
go_gc_duration_seconds_count 1
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 7
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
......
# Containers inside the cluster reach it through the Service DNS name:
svcname.namespace.svc.cluster.local:port
service-mysql-exporter.dev.svc.cluster.local:9104
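An optional in-cluster check of that DNS name (a throwaway pod; the image and flags are just one convenient choice):
kubectl -n dev run curl-metrics --rm -it --restart=Never --image=busybox -- wget -qO- http://service-mysql-exporter.dev.svc.cluster.local:9104/metrics | head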
4.4. Create a ServiceMonitor
# mysql-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: mysql-servicemonitor
namespace: monitoring
labels:
app: service-mysql-exporter
spec:
selector:
matchLabels:
app: service-mysql-exporter
namespaceSelector:
matchNames:
- dev
endpoints:
- port: metrics
interval: 10s
honorLabels: true
[root@k8s-master prometheus]# kubectl apply -f mysql-servicemonitor.yaml
servicemonitor.monitoring.coreos.com/mysql-servicemonitor created
[root@k8s-master prometheus]# kubectl get servicemonitor -n monitoring
NAME AGE
mysql-servicemonitor 30s
This did not work either: no mysql target appeared. The most likely mismatch is that the ServiceMonitor selects endpoints by port: metrics, while the Service and Endpoints defined in 4.3 leave their port unnamed, so nothing matches (the dev-namespace RBAC caveat from 3.2 applies here as well).
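A sketch of the fix implied by that diagnosis (an assumption, not verified here): name the port metrics in both the Service and the Endpoints so the ServiceMonitor's port: metrics has something to match.
# service-mysql-exporter.yaml, Service ports (fragment)
ports:
- name: metrics
  protocol: TCP
  port: 9104
  targetPort: 9104
# service-mysql-exporter.yaml, Endpoints ports (fragment)
ports:
- name: metrics
  protocol: TCP
  port: 9104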
4.5. Traditional configuration
https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/additional-scrape-config.md
1. Modify the file
prometheus-additional.yaml
- job_name: spring-boot
metrics_path: '/actuator/prometheus'
scrape_interval: 15s
static_configs:
- targets: ['msa-ext-prometheus.dev.svc.cluster.local:8158']
labels:
instance: spring-actuator
- job_name: mysql
scrape_interval: 10s
static_configs:
- targets: ['service-mysql-exporter.dev.svc.cluster.local:9104']
labels:
instance: mysqld
2. Regenerate and apply the Secret
[root@k8s-master prometheus]# kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml --dry-run -oyaml > additional-scrape-configs.yaml
W1126 11:57:08.632627 78468 helpers.go:553] --dry-run is deprecated and can be replaced with --dry-run=client.
[root@k8s-master prometheus]# kubectl apply -f additional-scrape-configs.yaml -n monitoring
secret/additional-scrape-configs configured
4.6. Configure Grafana
### Pick dashboard templates
# 11323
https://grafana.com/grafana/dashboards/11323
# 7362
https://grafana.com/grafana/dashboards/7362
5. Microservice configuration
5.1. Service discovery
Configuring each microservice by hand quickly becomes tedious. Prometheus provides many service discovery mechanisms for configuring targets in a unified way; see the official documentation: https://prometheus.io/docs/prometheus/latest/configuration/configuration/
Prometheus ships Consul service discovery out of the box; Eureka discovery only exists in newer releases (eureka_sd_configs). To keep using Eureka here, the Eureka server is extended with a Consul-compatible adapter so that Prometheus can discover its services through consul_sd_config.
Create a Eureka Server and add the eureka-consul-adapter dependency to its pom.xml; the relevant dependency is shown below.
<dependency>
<groupId>at.twinformatics</groupId>
<artifactId>eureka-consul-adapter</artifactId>
<version>1.1.0</version>
</dependency>
From the eureka-consul-adapter README, the Maven coordinates and requirements:
<dependency>
<groupId>at.twinformatics</groupId>
<artifactId>eureka-consul-adapter</artifactId>
<version>${eureka-consul-adapter.version}</version>
</dependency>
Requirements
- Java 1.8+
Versions 1.1.x and later
- Spring Boot 2.1.x
- Spring Cloud Greenwich
Versions 1.0.x and later
- Spring Boot 2.0.x
- Spring Cloud Finchley
Versions 0.x
- Spring Boot 1.5.x
- Spring Cloud Edgware
5.2. Create the microservices
1. Modify the Eureka server configuration
pom.xml
<dependency>
<groupId>at.twinformatics</groupId>
<artifactId>eureka-consul-adapter</artifactId>
<version>1.1.0</version>
</dependency>
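Besides the adapter, each service also has to expose /actuator/prometheus for the scrape jobs configured later. That needs spring-boot-starter-actuator plus micrometer-registry-prometheus on the classpath and an exposure setting roughly like the application.yml fragment below (a sketch; the projects' real configuration is not shown in this document):
management:
  endpoints:
    web:
      exposure:
        include: "health,info,prometheus"
  metrics:
    tags:
      application: ${spring.application.name}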
2. Build images
Build the images and push them to the private registry.
3. Service manifests
Registry (eureka-server), producer (msa-deploy-producer) and consumer (msa-deploy-consumer):
msa-eureka.yaml
apiVersion: v1
kind: Service
metadata:
namespace: dev
name: msa-eureka
labels:
app: msa-eureka
spec:
type: NodePort
ports:
- port: 10001
name: msa-eureka
targetPort: 10001
nodePort: 30001 # expose port 30001 on every node
selector:
app: msa-eureka
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
namespace: dev
name: msa-eureka
spec:
serviceName: msa-eureka
replicas: 3
selector:
matchLabels:
app: msa-eureka
template:
metadata:
labels:
app: msa-eureka
spec:
imagePullSecrets:
- name: harborsecret # the imagePullSecret created for the private registry
containers:
- name: msa-eureka
image: 172.51.216.85:8888/springcloud/msa-eureka:2.1.0
ports:
- containerPort: 10001
env:
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: EUREKA_SERVER
value: "http://msa-eureka-0.msa-eureka.dev:10001/eureka/,http://msa-eureka-1.msa-eureka.dev:10001/eureka/,http://msa-eureka-2.msa-eureka.dev:10001/eureka/"
- name: EUREKA_INSTANCE_HOSTNAME
value: ${MY_POD_NAME}.msa-eureka.dev
- name: ACTIVE
value: "-Dspring.profiles.active=k8s"
podManagementPolicy: "Parallel"
msa-deploy-producer.yaml
apiVersion: v1
kind: Service
metadata:
namespace: dev
name: msa-deploy-producer
labels:
app: msa-deploy-producer
spec:
type: ClusterIP
ports:
- port: 8911
targetPort: 8911
selector:
app: msa-deploy-producer
---
apiVersion: apps/v1
kind: Deployment
metadata:
namespace: dev
name: msa-deploy-producer
spec:
replicas: 3
selector:
matchLabels:
app: msa-deploy-producer
template:
metadata:
labels:
app: msa-deploy-producer
spec:
imagePullSecrets:
- name: harborsecret # the imagePullSecret created for the private registry
containers:
- name: msa-deploy-producer
image: 172.51.216.85:8888/springcloud/msa-deploy-producer:2.0.0
imagePullPolicy: Always # if imagePullPolicy is omitted, the policy defaults to IfNotPresent
ports:
- containerPort: 8911
env:
- name: EUREKA_SERVER
value: "http://msa-eureka-0.msa-eureka.dev:10001/eureka/,http://msa-eureka-1.msa-eureka.dev:10001/eureka/,http://msa-eureka-2.msa-eureka.dev:10001/eureka/"
- name: ACTIVE
value: "-Dspring.profiles.active=k8s"
- name: DASHBOARD
value: "sentinel-server.dev.svc.cluster.local:8858"
msa-deploy-consumer.yaml
apiVersion: v1
kind: Service
metadata:
namespace: dev
name: msa-deploy-consumer
labels:
app: msa-deploy-consumer
spec:
type: ClusterIP
ports:
- port: 8912
targetPort: 8912
selector:
app: msa-deploy-consumer
---
apiVersion: apps/v1
kind: Deployment
metadata:
namespace: dev
name: msa-deploy-consumer
spec:
replicas: 3
selector:
matchLabels:
app: msa-deploy-consumer
template:
metadata:
labels:
app: msa-deploy-consumer
spec:
imagePullSecrets:
- name: harborsecret # the imagePullSecret created for the private registry
containers:
- name: msa-deploy-consumer
image: 172.51.216.85:8888/springcloud/msa-deploy-consumer:2.0.0
imagePullPolicy: Always # if imagePullPolicy is omitted, the policy defaults to IfNotPresent
ports:
- containerPort: 8912
env:
- name: EUREKA_SERVER
value: "http://msa-eureka-0.msa-eureka.dev:10001/eureka/,http://msa-eureka-1.msa-eureka.dev:10001/eureka/,http://msa-eureka-2.msa-eureka.dev:10001/eureka/"
- name: ACTIVE
value: "-Dspring.profiles.active=k8s"
- name: DASHBOARD
value: "sentinel-server.dev.svc.cluster.local:8858"
4. Create the services
[root@k8s-master prometheus]# kubectl get all -n dev
NAME READY STATUS RESTARTS AGE
pod/msa-deploy-consumer-6b75cf55d-tgd82 1/1 Running 0 33s
pod/msa-deploy-consumer-6b75cf55d-x25lm 1/1 Running 0 33s
pod/msa-deploy-consumer-6b75cf55d-z22pv 1/1 Running 0 33s
pod/msa-deploy-producer-7965c98bbf-6cxf2 1/1 Running 0 80s
pod/msa-deploy-producer-7965c98bbf-vfp2v 1/1 Running 0 80s
pod/msa-deploy-producer-7965c98bbf-xzst2 1/1 Running 0 80s
pod/msa-eureka-0 1/1 Running 0 2m12s
pod/msa-eureka-1 1/1 Running 0 2m12s
pod/msa-eureka-2 1/1 Running 0 2m12s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/msa-deploy-consumer ClusterIP 10.102.251.125 <none> 8912/TCP 33s
service/msa-deploy-producer ClusterIP 10.110.134.51 <none> 8911/TCP 80s
service/msa-eureka NodePort 10.102.205.141 <none> 10001:30001/TCP 2m13s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/msa-deploy-consumer 3/3 3 3 33s
deployment.apps/msa-deploy-producer 3/3 3 3 80s
NAME DESIRED CURRENT READY AGE
replicaset.apps/msa-deploy-consumer-6b75cf55d 3 3 3 33s
replicaset.apps/msa-deploy-producer-7965c98bbf 3 3 3 80s
NAME READY AGE
statefulset.apps/msa-eureka 3/3 2m13s
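Before wiring up consul_sd_configs, it helps to confirm that Eureka answers on the NodePort and that the adapter serves the Consul-style catalog endpoint it is documented to provide (hypothetical checks; replace <node-ip> with any cluster node address):
curl -s http://<node-ip>:30001/eureka/apps           # standard Eureka REST listing of registered apps
curl -s http://<node-ip>:30001/v1/catalog/services   # Consul catalog endpoint exposed by eureka-consul-adapter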
5.3. Prometheus configuration
1. Modify the file
prometheus-additional.yaml
- job_name: spring-boot
metrics_path: '/actuator/prometheus'
scrape_interval: 15s
static_configs:
- targets: ['msa-ext-prometheus.dev.svc.cluster.local:8158']
labels:
instance: spring-actuator
- job_name: mysql
scrape_interval: 10s
static_configs:
- targets: ['service-mysql-exporter.dev.svc.cluster.local:9104']
labels:
instance: mysqld
- job_name: consul-prometheus
scheme: http
metrics_path: '/actuator/prometheus'
consul_sd_configs:
- server: 'msa-eureka-0.msa-eureka.dev:10001'
scheme: http
services: [msa-deploy-producer]
The consul_sd_configs job above did not pass testing.
Configure each service separately with static_configs instead. Note that scraping a ClusterIP Service name hits an arbitrary replica on each scrape, so the three replicas' metrics end up mixed under one target; per-pod scraping would need a ServiceMonitor or kubernetes_sd_configs.
- job_name: spring-boot
metrics_path: '/actuator/prometheus'
scrape_interval: 15s
static_configs:
- targets: ['msa-ext-prometheus.dev.svc.cluster.local:8158']
labels:
instance: spring-actuator
- job_name: mysql
scrape_interval: 10s
static_configs:
- targets: ['service-mysql-exporter.dev.svc.cluster.local:9104']
labels:
instance: mysqld
- job_name: consul-prometheus
scheme: http
metrics_path: '/actuator/prometheus'
consul_sd_configs:
- server: 'msa-eureka-0.msa-eureka.dev:10001'
scheme: http
services: [msa-deploy-producer]
- job_name: msa-eureka
metrics_path: '/actuator/prometheus'
scrape_interval: 15s
static_configs:
- targets: ['msa-eureka-0.msa-eureka.dev:10001']
labels:
instance: msa-eureka-0
- targets: ['msa-eureka-1.msa-eureka.dev:10001']
labels:
instance: msa-eureka-1
- targets: ['msa-eureka-2.msa-eureka.dev:10001']
labels:
instance: msa-eureka-2
- job_name: msa-deploy-producer
metrics_path: '/actuator/prometheus'
scrape_interval: 15s
static_configs:
- targets: ['msa-deploy-producer.dev.svc.cluster.local:8911']
labels:
instance: spring-actuator
- job_name: msa-deploy-consumer
metrics_path: '/actuator/prometheus'
scrape_interval: 15s
static_configs:
- targets: ['msa-deploy-consumer.dev.svc.cluster.local:8912']
labels:
instance: spring-actuator
2. Regenerate and apply the Secret
[root@k8s-master prometheus]# kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml --dry-run -oyaml > additional-scrape-configs.yaml
W1126 11:57:08.632627 78468 helpers.go:553] --dry-run is deprecated and can be replaced with --dry-run=client.
[root@k8s-master prometheus]# kubectl apply -f additional-scrape-configs.yaml -n monitoring
secret/additional-scrape-configs configured
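After the Secret is updated, the config reloader should pick up the new jobs; one way to confirm they appear as active scrape pools (a hypothetical check, reusing the port-forward from 3.3):
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090 &
curl -s 'http://localhost:9090/api/v1/targets?state=active' | grep -o '"job":"[^"]*"' | sort -u
# stop the background port-forward afterwards, e.g. kill %1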
5.4. Configure Grafana
### Pick dashboard templates
# 4701
https://grafana.com/grafana/dashboards/4701
# 11955
https://grafana.com/grafana/dashboards/11955