Deploying Alertmanager to Send Email Alerts
- Install and configure Alertmanager
cd /opt
tar xf alertmanager-0.24.0.linux-amd64.tar.gz
mv alertmanager-0.24.0.linux-amd64 /usr/local/alertmanager
- Edit the configuration file alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'        # SSL port
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'zxnlltckqkrxxxcc'   # QQ Mail authorization code
  smtp_require_tls: false                  # port 465 is implicit TLS; true would demand STARTTLS
route:
  group_by: ['alertname']
  group_wait: 20s
  group_interval: 5m
  repeat_interval: 20m
  receiver: 'my-email'
receivers:
  - name: 'my-email'
    email_configs:
      - to: '[email protected]'
        send_resolved: true
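Before starting the service it can help to validate the file. The sketch below assumes `amtool` (shipped in the same release tarball) sits next to the `alertmanager` binary in /usr/local/alertmanager; the function name `check_am_config` is made up for illustration.

```shell
# Validate alertmanager.yml before (re)starting the service.
# Assumption: amtool was unpacked together with alertmanager into AM_DIR.
AM_DIR="${AM_DIR:-/usr/local/alertmanager}"

check_am_config() {
    local config="${1:-$AM_DIR/alertmanager.yml}"
    if [ ! -x "$AM_DIR/amtool" ]; then
        echo "amtool not found in $AM_DIR" >&2
        return 2
    fi
    # check-config parses the file and fails on syntax or schema errors.
    "$AM_DIR/amtool" check-config "$config"
}
```

A typical use would be `check_am_config && systemctl restart alertmanager`, so that a broken file never reaches the running service.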
- Create the systemd service and start it
cat > /usr/lib/systemd/system/alertmanager.service <<EOF
[Unit]
Description=Alertmanager
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/alertmanager/alertmanager \
  --config.file=/usr/local/alertmanager/alertmanager.yml \
  --log.level=debug
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl start alertmanager
systemctl enable alertmanager
- Verify that Alertmanager is running
netstat -tulnp | grep 9093
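A listening port only shows that the process is bound. Alertmanager also exposes the HTTP endpoints `/-/healthy` and `/-/ready`, which can be probed as sketched here. The helper name `am_probe` and the default URL are assumptions for illustration.

```shell
# Probe Alertmanager's management endpoints over HTTP.
# Assumption: Alertmanager listens on localhost:9093 unless AM_URL is set.
AM_URL="${AM_URL:-http://localhost:9093}"

am_probe() {
    # $1 is "healthy" or "ready".
    # -f turns HTTP errors into a non-zero exit code; -sS stays quiet but shows errors.
    curl -fsS --max-time 5 "$AM_URL/-/$1"
}
```

For example, `am_probe healthy && am_probe ready` exits non-zero if either check fails, which makes it usable in a deployment script.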
Configuring Prometheus Alerting Rules
- Create the alerting rule file
mkdir -p /usr/local/prometheus/alert_rules
vim /usr/local/prometheus/alert_rules/instance_down.yaml
groups:
  - name: AllInstances
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        annotations:
          title: 'Instance down'
          description: 'Instance has been down for more than 1 minute.'
        labels:
          severity: 'critical'
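Rule files can be syntax-checked before Prometheus reloads them. This is a minimal sketch, assuming Prometheus was unpacked into /usr/local/prometheus so that `promtool` lives beside the rule directory; `check_rules` is a hypothetical helper name.

```shell
# Run "promtool check rules" on every rule file in the alert_rules directory.
# Assumption: promtool is located in PROM_DIR (it ships in the Prometheus tarball).
PROM_DIR="${PROM_DIR:-/usr/local/prometheus}"

check_rules() {
    local f rc=0
    for f in "$PROM_DIR"/alert_rules/*.yaml; do
        # An unmatched glob stays literal, so catch the "no files" case explicitly.
        [ -e "$f" ] || { echo "no rule files in $PROM_DIR/alert_rules" >&2; return 2; }
        "$PROM_DIR/promtool" check rules "$f" || rc=1
    done
    return "$rc"
}
```

The function returns 0 only when every file passes, 1 when at least one file fails, and 2 when no rule files exist.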
- Update the Prometheus configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['192.168.80.30:9093']
rule_files:
  - "/usr/local/prometheus/alert_rules/*.yaml"
systemctl reload prometheus
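After the reload, one way to confirm that the rules were picked up is to ask Prometheus's HTTP API at `/api/v1/rules`. The sketch below assumes Prometheus listens on localhost:9090 and uses grep and cut rather than a JSON parser to keep dependencies minimal; both helper names are illustrative.

```shell
# List the rule group and rule names that Prometheus has loaded.
# Assumption: Prometheus is reachable at PROM_URL.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Pull every "name" value out of the compact JSON read from stdin.
extract_names() {
    grep -o '"name":"[^"]*"' | cut -d'"' -f4
}

loaded_rule_names() {
    curl -fsS --max-time 5 "$PROM_URL/api/v1/rules" | extract_names
}
```

If `loaded_rule_names` prints `AllInstances` and `InstanceDown`, the rule file from the previous step is active.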
Configuring DingTalk Alerts
- Deploy the DingTalk webhook plugin
cd /opt
tar xf prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz
mv prometheus-webhook-dingtalk-2.1.0.linux-amd64 /usr/local/dingtalk
cd /usr/local/dingtalk
- Configure the DingTalk robot
  - Add a custom robot to the DingTalk group and note its webhook URL and signing secret.
- Edit the plugin configuration file config.yml
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=your_token
    secret: your_secret
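The plugin signs requests itself once `secret` is set, so nothing below is required for the setup. It is only a sketch of what the signing secret is used for, following DingTalk's documented scheme: HMAC-SHA256 over "timestamp, newline, secret", keyed with the secret, then Base64 and URL encoding. It can be handy when testing the robot directly with curl. The function names are made up for illustration.

```shell
# Compute the "sign" parameter for a DingTalk custom robot.
# $1: timestamp in milliseconds, $2: the robot's signing secret.
dingtalk_sign() {
    local ts="$1" secret="$2"
    # The string to sign is "<timestamp>\n<secret>"; the HMAC key is the secret itself.
    printf '%s\n%s' "$ts" "$secret" \
        | openssl dgst -sha256 -hmac "$secret" -binary \
        | openssl base64
}

# Percent-encode the three Base64 characters that are not URL-safe.
b64_urlencode() {
    sed -e 's/+/%2B/g' -e 's|/|%2F|g' -e 's/=/%3D/g'
}
```

With `ts=$(( $(date +%s) * 1000 ))`, the signed URL would be the webhook URL followed by `&timestamp=$ts&sign=$(dingtalk_sign "$ts" your_secret | b64_urlencode)`.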
- Start the DingTalk service
nohup ./prometheus-webhook-dingtalk --config.file=config.yml > prometheus-webhook-dingtalk.log 2>&1 &
- Update the Alertmanager configuration
route:
  receiver: 'dingding.webhook1'
receivers:
  - name: 'dingding.webhook1'
    webhook_configs:
      - url: 'http://192.168.80.30:8060/dingtalk/webhook1/send'
        send_resolved: true
systemctl restart alertmanager
Testing Alerts
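- Trigger an instance-down alert
systemctl stop node_exporter
  - Wait a little over a minute (the rule's for: 1m plus Alertmanager's group_wait: 20s), then check the mailbox or the DingTalk group, depending on which receiver the route points to.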
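Stopping an exporter exercises the whole pipeline. To test only the notification path, a synthetic alert can be posted straight to Alertmanager's v2 API. This is a rough sketch, assuming Alertmanager is reachable at localhost:9093; the alert name, labels, and helper names are invented for the test.

```shell
# Push a hand-made alert into Alertmanager, bypassing Prometheus.
# Assumption: Alertmanager is reachable at AM_URL.
AM_URL="${AM_URL:-http://localhost:9093}"

# Build the JSON body: an array with one alert whose alertname is $1.
test_alert_json() {
    printf '[{"labels":{"alertname":"%s","severity":"critical","instance":"manual-test"},"annotations":{"description":"Manual test alert sent with curl"}}]\n' "$1"
}

send_test_alert() {
    test_alert_json "${1:-ManualTest}" \
        | curl -fsS --max-time 5 -X POST \
               -H 'Content-Type: application/json' \
               --data-binary @- "$AM_URL/api/v2/alerts"
}
```

After `send_test_alert`, the notification should go out once `group_wait` has elapsed, which separates receiver problems from rule or scrape problems.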
- Verify the alert status
  - Open the Prometheus UI:
http://<Prometheus-IP>:9090/alerts
  - Open the Alertmanager UI:
http://<Alertmanager-IP>:9093
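The same information is available without a browser through Prometheus's `/api/v1/alerts` endpoint. The sketch below counts the alerts currently in the firing state. It assumes Prometheus answers on localhost:9090, and the helper names are illustrative.

```shell
# Count firing alerts as reported by Prometheus.
# Assumption: Prometheus is reachable at PROM_URL.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Count occurrences of "state":"firing" in the JSON read from stdin.
count_firing() {
    grep -o '"state":"firing"' | wc -l | tr -d ' '
}

firing_alerts() {
    curl -fsS --max-time 5 "$PROM_URL/api/v1/alerts" | count_firing
}
```

While the rule is still inside its `for: 1m` window the alert is reported as pending, so `firing_alerts` stays at 0 until the minute has passed.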
Troubleshooting
- Email not received
  - Check the SMTP settings (port, TLS, authorization code).
  - Check the Alertmanager logs:
journalctl -u alertmanager -f
- DingTalk alerts fail
  - Confirm that the webhook URL and secret are correct.
  - Check the DingTalk plugin log:
tail -f /usr/local/dingtalk/prometheus-webhook-dingtalk.log
- Alerting rules not loaded
  - Confirm that the rule_files path in the Prometheus configuration matches the rule directory.
  - Restart Prometheus:
systemctl restart prometheus