
Monitoring Java Applications with Prometheus

There are two main ways to monitor a Java application with Prometheus.

1、jmx_exporter

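Download: https://github.com/prometheus/jmx_exporter
jmx_exporter is a tool that collects JVM data over JMX and exposes it for Prometheus to scrape. It can be used in two ways: as a Java agent inside the application's JVM, or as a standalone HTTP server.
The current version at the time of writing is 0.17.1.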
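
To give a feel for the kind of data jmx_exporter reads, the sketch below queries the JVM's own platform MBean server through the standard javax.management API. It only illustrates JMX itself and makes no claim about jmx_exporter's internals; the class name JmxPeek and its methods are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical helper, not part of jmx_exporter.
final class JmxPeek {
    // Names of all MBeans registered in the "java.lang" domain of this JVM.
    static Set<String> javaLangMBeans() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            Set<String> names = new TreeSet<>();
            for (ObjectName name : server.queryNames(new ObjectName("java.lang:*"), null)) {
                names.add(name.getCanonicalName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the "used" field of the HeapMemoryUsage attribute of the Memory MBean.
    static long heapUsedBytes() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            CompositeData usage = (CompositeData) server.getAttribute(
                    new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        javaLangMBeans().forEach(System.out::println);
        System.out.println("heap used (bytes): " + heapUsedBytes());
    }
}
```

Attributes like these are what a catch-all rule such as pattern: ".*" would turn into metrics.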

1.1、Running as a Java agent

1.1.1、Download and configuration

Download the file: jmx_prometheus_javaagent-0.17.1.jar
Create config.yaml:

lowercaseOutputLabelNames: false
lowercaseOutputName: false
rules:
- pattern: ".*"

1.1.2、Running

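Add the -javaagent:./jmx_prometheus_javaagent-0.17.1.jar=8955:config.yaml option to the command that starts the Java application. Here 8955 is the port the agent listens on and config.yaml is the configuration file created above: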

java -javaagent:./jmx_prometheus_javaagent-0.17.1.jar=8955:config.yaml -jar app-1.0.0.jar

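Other containers work in much the same way. For Tomcat, for example, add the following to the catalina.sh startup script:

JAVA_OPTS="$JAVA_OPTS -javaagent:./jmx_prometheus_javaagent-0.17.1.jar=8955:./config.yaml"

On Windows, the equivalent line in catalina.bat uses the batch syntax: set "JAVA_OPTS=-javaagent:./jmx_prometheus_javaagent-0.17.1.jar=8955:./config.yaml"

Note that, because these paths are relative, jmx_prometheus_javaagent-0.17.1.jar and config.yaml must both be placed in the $TOMCAT_HOME/bin directory.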

1.2、Running as a standalone HTTP server

1.2.1、Download and configuration

Download the file: jmx_prometheus_httpserver-0.17.1.jar

Create config_http.yaml:

---
hostPort: localhost:5555
username: 
password: 

rules:
- pattern: ".*"

1.2.2、Running

Start the Java application first, with remote JMX enabled:

java -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=5555 -Dcom.sun.management.jmxremote.authenticate=false -jar app-1.0.0.jar

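Then start jmx_prometheus_httpserver. It connects to the JMX port exposed by the application (5555), collects the application's JVM data, and exposes that data on port 8955 for Prometheus to scrape: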

java -jar jmx_prometheus_httpserver-0.17.1.jar 8955 config_http.yaml
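
The hostPort: localhost:5555 entry in config_http.yaml names the JMX endpoint of the monitored application. A standard JMX client reaches such an endpoint through an RMI service URL. The sketch below shows that conventional URL form and how a plain Java client could connect with javax.management.remote. It is an illustration, not the exporter's code: JmxRemote, serviceUrl and uptimeMillis are hypothetical names, and the connection only succeeds while the application from the previous step is running.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper, not part of jmx_exporter.
final class JmxRemote {
    // Conventional RMI-based JMX URL for a "host:port" pair such as "localhost:5555".
    static String serviceUrl(String hostPort) {
        return "service:jmx:rmi:///jndi/rmi://" + hostPort + "/jmxrmi";
    }

    // Connects to the remote JVM and reads its uptime in milliseconds.
    static long uptimeMillis(String hostPort) {
        try (JMXConnector connector =
                     JMXConnectorFactory.connect(new JMXServiceURL(serviceUrl(hostPort)))) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            return (Long) conn.getAttribute(new ObjectName("java.lang:type=Runtime"), "Uptime");
        } catch (Exception e) {
            throw new IllegalStateException("cannot read JMX data from " + hostPort, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serviceUrl("localhost:5555"));
        // Only try to connect when a host:port is passed on the command line.
        if (args.length > 0) {
            System.out.println("uptime ms: " + uptimeMillis(args[0]));
        }
    }
}
```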

1.3、Viewing the metrics exposed by jmx_exporter

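Open the /metrics endpoint on the port the exporter was started with. The commands above use 8955; the Tomcat instance listed in jmx.json in section 1.4 uses 9101.

http://127.0.0.1:8955/metrics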
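
The endpoint returns plain text in the Prometheus exposition format: comment lines starting with #, and sample lines of the form name{labels} value. For a quick check without a browser, a rough Java sketch of fetching and reading that text could look like this. The class MetricsPeek and the sample lines in main are invented for illustration, and the parser assumes that samples carry no trailing timestamp.

```java
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for inspecting a /metrics endpoint.
final class MetricsPeek {
    // Downloads the raw text of a /metrics endpoint.
    static String fetch(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException("cannot fetch " + url, e);
        }
    }

    // Maps each series ("name{labels}") to its value, skipping # HELP / # TYPE lines.
    static Map<String, Double> parse(String body) {
        Map<String, Double> samples = new LinkedHashMap<>();
        for (String raw : body.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            int split = line.lastIndexOf(' ');
            if (split < 0) continue; // not a "series value" line
            samples.put(line.substring(0, split).trim(), toDouble(line.substring(split + 1)));
        }
        return samples;
    }

    // The text format writes infinities as +Inf / -Inf, which Double.parseDouble rejects.
    private static double toDouble(String text) {
        switch (text) {
            case "+Inf": return Double.POSITIVE_INFINITY;
            case "-Inf": return Double.NEGATIVE_INFINITY;
            default: return Double.parseDouble(text);
        }
    }

    public static void main(String[] args) {
        // Pass a URL such as http://127.0.0.1:8955/metrics, or run without arguments for a made-up sample.
        String body = args.length > 0 ? fetch(args[0])
                : "# TYPE demo_requests_total counter\ndemo_requests_total{path=\"/\"} 3.0\n";
        parse(body).forEach((series, value) -> System.out.println(series + " = " + value));
    }
}
```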

1.4、Integrating with Prometheus

vi /usr/local/prometheus-2.37.0.linux-amd64/prometheus.yml

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
#    basic_auth:
#      username: admin
#      password: 123456
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: 'jmx'
    file_sd_configs:
      - files:
        - nodes/jmx.json
        refresh_interval: 5m
#  - job_name: 'spring_boot'
#    metrics_path: "/actuator/prometheus"
#    static_configs:
#      - targets: ['192.168.28.132:8955']

vi /usr/local/prometheus-2.37.0.linux-amd64/nodes/jmx.json

[
  {
    "targets": ["192.168.28.138:8955"],
    "labels": {
      "instance": "gateway-service[192.168.28.136_s1]"
    }
  },
  {
    "targets": ["192.168.31.204:9101"],
    "labels": {
      "instance": "tomcat[192.168.31.204:9101]"
    }
  }
]

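2、Client library integration

Download: https://github.com/prometheus/client_java
The client_java library lets you monitor a business system by instrumenting it with custom metrics.

2.1、Monitoring microservices

The Spring Boot support in client_java only covers Spring Boot 1.x. For 2.x, use the Prometheus integration package provided by Micrometer.
Add the following dependencies to pom.xml: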

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-actuator</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<dependency>
    <groupId>io.github.mweirauch</groupId>
    <artifactId>micrometer-jvm-extras</artifactId>
    <version>0.2.2</version>
</dependency>

Spring configuration file application.yml:

server:
  port: 9090
  servlet:
    encoding:
      force: true
      charset: UTF-8
      enabled: true
  tomcat:
    uri-encoding: UTF-8
logging:
  file:
    name: logs/app.log
  logback:
    rollingpolicy:
      max-file-size: 10MB
  pattern:
    console: '%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] %-5level %logger{36} %L - %msg%xEx%n'
    file: '%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] %-5level %logger{36} %L - %msg%xEx%n'

# Port exposed to Prometheus
management:
  server:
    port: 8955
  endpoints:
    web:
      exposure:
        include: 'prometheus' # expose /actuator/prometheus
  metrics:
    tags:
      application: 'gateway-service' # add an application label to the exposed metrics

prometheus.yml:

scrape_configs:
  - job_name: 'spring_boot'
    metrics_path: "/actuator/prometheus"
    static_configs:
      - targets: ['192.168.28.132:8955']

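Start the Spring application, then check the scraped metrics in Prometheus and the dashboard in Grafana.
[Screenshots: Spring startup, Prometheus result, Grafana result]
Grafana template used: https://grafana.com/grafana/dashboards/16144-jvm/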

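2.2、Custom instrumentation

2.2.1、Built-in Collector modules

The simpleclient_hotspot dependency ships Collector implementations for the JVM's runtime state (GC, memory pools, JMX, class loading, threads, and so on). A single line of code registers them all:

// io.prometheus.client.hotspot.DefaultExports registers all of the built-in JVM Collectors: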
DefaultExports.initialize();

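Besides the JVM Collectors, client_java also supports collecting metrics from Hibernate, Guava Cache, Jetty, Log4j, Logback, and others. Adding the corresponding dependency is enough.

2.2.2、Instrumentation

client_java provides Counter, Gauge, Summary, and Histogram, which correspond to the four Prometheus metric types and make it easy to instrument the business flows of an application.
Counter: a cumulative counter that starts at 0 and only ever increases.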

import io.prometheus.client.Counter;
class YourClass {
  static final Counter requests = Counter.build()
     .name("requests_total").help("Total requests.").register();

  void processRequest() {
    requests.inc();
    // Your code here.
  }
}

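Gauge: a metric whose value can change arbitrarily, going up or down. Examples are CPU usage, memory usage, disk space, network speed, and temperature.

import io.prometheus.client.Gauge;
class YourClass {
  static final Gauge inprogressRequests = Gauge.build()
     .name("inprogress_requests").labelNames("cmd").help("Inprogress requests.").register();

  void processRequest() {
    // A Gauge declared with labelNames must be accessed through labels(...);
    // calling inc()/dec() directly on it fails because there is no label-less child.
    inprogressRequests.labels("java").set(100); // set an absolute value
    inprogressRequests.labels("java").inc();    // or adjust the current value
    // Your code here.
    inprogressRequests.labels("java").dec();
  }
}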

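Histogram: samples observations over a period of time, such as request durations or response sizes. It exposes three kinds of series:
1、Cumulative bucket counters (<name>_bucket), each counting the observations that are less than or equal to the bucket's upper bound.
2、The running sum of all observed values (<name>_sum).
3、The running count of observations (<name>_count).

import io.prometheus.client.Histogram;
class YourClass {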
    static final Histogram requestLatency = Histogram.build()
            .name("requests_latency_seconds").help("Request latency in seconds.").register();

    void processRequest(Request req) {
        Histogram.Timer requestTimer = requestLatency.startTimer();
        try {
            // Your code here.
        } finally {
            requestTimer.observeDuration();
        }
    }
}
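
To make the three kinds of series more concrete, here is a plain-Java sketch (no client_java dependency) of the bookkeeping a histogram performs: cumulative bucket counts, a running sum, and a running count. The class BucketSketch and its bucket bounds are made up for illustration; the real Histogram class also deals with labels and concurrency, which this sketch leaves out.

```java
import java.util.Arrays;

// Hypothetical illustration of histogram bookkeeping, not the client_java implementation.
final class BucketSketch {
    private final double[] upperBounds; // ascending, with +Inf appended as the last bucket
    private final long[] cumulative;    // cumulative[i] = observations <= upperBounds[i]
    private double sum;
    private long count;

    BucketSketch(double... upperBounds) {
        this.upperBounds = Arrays.copyOf(upperBounds, upperBounds.length + 1);
        this.upperBounds[upperBounds.length] = Double.POSITIVE_INFINITY;
        this.cumulative = new long[this.upperBounds.length];
    }

    void observe(double value) {
        // Every bucket whose upper bound is >= value counts the observation.
        for (int i = 0; i < upperBounds.length; i++) {
            if (value <= upperBounds[i]) cumulative[i]++;
        }
        sum += value;
        count++;
    }

    long[] bucketCounts() { return cumulative.clone(); }
    double sum() { return sum; }
    long count() { return count; }

    public static void main(String[] args) {
        BucketSketch latency = new BucketSketch(0.1, 0.5, 1.0);
        for (double seconds : new double[] {0.05, 0.3, 0.7, 2.0}) latency.observe(seconds);
        System.out.println(Arrays.toString(latency.bucketCounts())); // prints [1, 2, 3, 4]
        System.out.println("sum=" + latency.sum() + " count=" + latency.count());
    }
}
```

Because the buckets are cumulative, the last (+Inf) bucket always equals the total count.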

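Summary: similar to Histogram, it represents observations sampled over a period of time, such as request durations or response sizes.
Difference between Histogram and Summary: a Summary computes quantile values on the client side and exposes them directly, whereas with a Histogram the quantiles have to be calculated from the bucket counts.

import io.prometheus.client.Summary;
class YourClass {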

  private static final Summary requestLatency = Summary.build()
      .name("requests_latency_seconds")
      .help("request latency in seconds")
      .register();

  private static final Summary receivedBytes = Summary.build()
      .name("requests_size_bytes")
      .help("request size in bytes")
      .register();

  public void processRequest(Request req) {
    Summary.Timer requestTimer = requestLatency.startTimer();
    try {
      // Your code here.
    } finally {
      requestTimer.observeDuration();
      receivedBytes.observe(req.size());
    }
  }
}
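
The Histogram side of that difference can be sketched in a few lines: the server only sees cumulative bucket counts and has to estimate a quantile from them. The code below is a simplified take on the linear-interpolation idea behind PromQL's histogram_quantile(), not a faithful port of it. QuantileSketch and the sample numbers are invented, and the sketch assumes 0 < q <= 1, at least one observation, non-negative bounds, and a final +Inf bucket.

```java
// Hypothetical illustration: estimating a quantile from cumulative histogram buckets.
final class QuantileSketch {
    // upperBounds: ascending, last entry +Inf. cumulative[i]: observations <= upperBounds[i].
    static double estimate(double q, double[] upperBounds, long[] cumulative) {
        long total = cumulative[cumulative.length - 1];
        double rank = q * total;
        // Find the first bucket whose cumulative count reaches the rank.
        int i = 0;
        while (i < cumulative.length - 1 && cumulative[i] < rank) i++;
        if (i == cumulative.length - 1) {
            // The rank falls into the +Inf bucket: report the highest finite bound.
            return upperBounds[upperBounds.length - 2];
        }
        double lower = (i == 0) ? 0.0 : upperBounds[i - 1];
        long below = (i == 0) ? 0 : cumulative[i - 1];
        long inBucket = cumulative[i] - below;
        // Assume observations are spread evenly inside the bucket.
        return lower + (upperBounds[i] - lower) * (rank - below) / inBucket;
    }

    public static void main(String[] args) {
        double[] bounds = {0.1, 0.5, 1.0, Double.POSITIVE_INFINITY};
        long[] counts = {10, 60, 90, 100};
        System.out.println("p50 ~ " + estimate(0.5, bounds, counts));
        System.out.println("p95 ~ " + estimate(0.95, bounds, counts));
    }
}
```

The estimate is only as precise as the bucket layout allows, which is the trade-off against a Summary's client-side quantiles.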

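Besides the built-in and basic Collectors above, you can also write a custom Collector:

import io.prometheus.client.Collector;
import io.prometheus.client.GaugeMetricFamily;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CustomCollector extends Collector {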
  public List<MetricFamilySamples> collect() {
    List<MetricFamilySamples> mfs = new ArrayList<MetricFamilySamples>();
    // With no labels.
    mfs.add(new GaugeMetricFamily("my_gauge", "help", 42));
    // With labels
    GaugeMetricFamily labeledGauge = new GaugeMetricFamily("my_other_gauge", "help", Arrays.asList("labelname"));
    labeledGauge.addMetric(Arrays.asList("foo"), 4);
    labeledGauge.addMetric(Arrays.asList("bar"), 5);
    mfs.add(labeledGauge);
    return mfs;
  }
}

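2.2.3、Exposing samples through an HTTP server

The simpleclient_httpserver dependency contains a simple HTTP server. When Prometheus scrapes this server, it automatically calls the collect() method of every registered Collector.

import io.prometheus.client.exporter.HTTPServer;
import java.io.IOException;

public class CustomExporter {
    public static void main(String[] args) throws IOException {
        // Unlike the built-in Collectors, basic Collectors and CustomCollector must be registered with register() first
        new CustomCollector().register();
        // Serve all registered Collectors over HTTP on port 1234
        HTTPServer server = new HTTPServer(1234);
    }
}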
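
To see the mechanism without any Prometheus dependency, the sketch below uses only the JDK's built-in com.sun.net.httpserver package to serve one hand-written gauge in the text exposition format at /metrics. It is a teaching aid under stated assumptions, not a replacement for simpleclient_httpserver: TinyExporter, render, start and the metric demo_temperature_celsius are all invented names.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.DoubleSupplier;

// Hypothetical minimal exporter built on the JDK HTTP server.
final class TinyExporter {
    // Renders one gauge in the text exposition format.
    static String render(String name, String help, double value) {
        return "# HELP " + name + " " + help + "\n"
             + "# TYPE " + name + " gauge\n"
             + name + " " + value + "\n";
    }

    // Starts a server whose /metrics handler re-reads the value on every scrape.
    static HttpServer start(int port, String name, String help, DoubleSupplier source) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            server.createContext("/metrics", exchange -> {
                byte[] body = render(name, help, source.getAsDouble())
                        .getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders()
                        .add("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        start(1234, "demo_temperature_celsius", "Hypothetical gauge.", () -> 21.5);
        System.out.println("serving http://localhost:1234/metrics");
    }
}
```

Reading the value inside the handler mirrors the pull model described above: the data is produced at scrape time, not pushed on a schedule.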