Prometheus Alertmanager WeChat Alert Configuration

Published: 2020-07-12 14:27:24  Source: Internet  Views: 1688  Author: laihuadongcto  Category: System Operations

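Preparation:

Obtain the external API endpoint of WeChat Work

The WeChat Work application secret (api_secret)

The enterprise ID (corp_id)

wechat_api_url: the WeChat external API endpoint, https://qyapi.weixin.qq.com/cgi-bin/
wechat_api_secret: the WeChat Work application secret ("Enterprise Apps" --> "Custom Apps" [Prometheus] --> "Secret"). Prometheus is the name of the application I created myself.
wechat_api_corp_id: the enterprise ID ("My Enterprise" --> "CorpID" [at the bottom of the page])
to_party: the value (1 here) is the ID of a group. You can use this link to customize which people or groups receive alerts (https://work.weixin.qq.com/ap...)
agent_id: the WeChat Work application ID ("Enterprise Apps" --> "Custom Apps" [Prometheus] --> "AgentId"). Prometheus is the name of the application I created myself.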
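
Before wiring these values into Alertmanager, it can help to confirm that the corp ID and secret belong together. The sketch below only builds the URL of the WeChat Work `gettoken` endpoint under the `wechat_api_url` listed above. The function name and the placeholder credentials are assumptions made for illustration and are not part of the original setup.

```python
from urllib.parse import urlencode, urljoin

# Base URL taken from the wechat_api_url value above.
WECHAT_API_URL = "https://qyapi.weixin.qq.com/cgi-bin/"


def build_gettoken_url(corp_id: str, api_secret: str,
                       base_url: str = WECHAT_API_URL) -> str:
    """Return the gettoken URL for the given corp ID and application secret.

    Hypothetical helper: it performs no network request itself.
    """
    query = urlencode({"corpid": corp_id, "corpsecret": api_secret})
    return urljoin(base_url, "gettoken") + "?" + query


if __name__ == "__main__":
    # Placeholder values; substitute the real corp_id and secret.
    print(build_gettoken_url("my-corp-id", "my-secret"))
```

Opening the printed URL with a real corp ID and secret (for example with curl) should return JSON with `errcode` 0 and an `access_token` when the pair is valid.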

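If the Prometheus and Alertmanager configuration files are separate (that is, not installed with Helm)

Alertmanager settings in the Prometheus configuration:

alerting sits at the same level as global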

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093

Add the rules file to the Prometheus configuration file

rule_files:
  - "/usr/local/prometheus/rules.yml"

Prometheus rules configuration
Create the rules.yml file
Add alert rules as needed
groups:
- name: prometheus_go_goroutines
  rules:
  - alert: go_goroutines_numbers
    expr: go_goroutines > 45
    for: 15s
    annotations:
      summary: "Prometheus goroutine count exceeds 45!"

Alertmanager configuration
In the Alertmanager configuration file, add the WeChat settings
global:
  resolve_timeout: 2m
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
  wechat_api_secret: 'xxx'
  wechat_api_corp_id: 'xxx'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'wechat'
receivers:
- name: 'wechat'
  wechat_configs:
  - send_resolved: true
    to_party: '1'
    agent_id: '1000002'
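
To check that the WeChat receiver actually delivers messages, one option is to push a hand-made alert into Alertmanager through its v2 HTTP API (`POST /api/v2/alerts`). The following is a minimal sketch under stated assumptions: Alertmanager is reachable at `localhost:9093` as in the Prometheus config above, and the alert name, labels and helper names are made up for the test.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumed address, matching the static_configs target used earlier.
ALERTMANAGER_URL = "http://localhost:9093/api/v2/alerts"


def build_test_alert(alertname="wechat_test", summary="test alert",
                     minutes=5, now=None):
    """Build the JSON-serialisable payload for a single test alert."""
    now = now or datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": alertname, "severity": "warning"},
        "annotations": {"summary": summary},
        "startsAt": now.isoformat(),
        # The alert resolves by itself once endsAt has passed.
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def post_alert(alerts, url=ALERTMANAGER_URL):
    """Send the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Only print the payload here; call post_alert(...) against a live instance.
    print(json.dumps(build_test_alert(), indent=2))
```

With the route above, a posted alert should reach the 'wechat' receiver after roughly `group_wait`.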

===================================================================

If you installed with Helm, the Prometheus and Alertmanager configuration lives in a single file

vim prometheus-operator-custom.yaml   # edit the configuration

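alertmanager:  # the settings below go under this key
  config:
    global:
      # An alert is treated as resolved if it has not been updated for this long
      resolve_timeout: 3m
    templates:
    - '/etc/alertmanager/config/template_wechat.tmpl'
    route:
      # Incoming alerts that share these labels are placed in one group.
      group_by: ['wechat_alert']
      # How long to wait after a group is created before sending its batched alerts,
      # i.e. the delay before the first notification.
      # This ensures more alerts are batched into the first notification.
      group_wait: 15s
      # After the first notification is sent, how long to wait before sending the next batch for the group
      group_interval: 15s
      # After a notification has been sent successfully, how long to wait before sending it again
      repeat_interval: 3m
      receiver: 'wechat'
      routes:
      - receiver: 'wechat'
        continue: true
    receivers:
    - name: 'wechat'
      wechat_configs:
      # Whether to send a notification when the alert is resolved
      - send_resolved: true
        # WeChat Work corp ID
        corp_id: 'XXX'
        # WeChat Work application secret
        api_secret: 'XXX'
        # Users to notify; more than one can be listed
        #to_user: '@all'
        # Department ID: shown in a pop-up at the lower right when you click the department; it is easy to miss
        to_party: '92'
        agent_id: '1000010'
  # Template format:
  templateFiles:
    template_wechat.tmpl: |-
      {{ define "wechat.default.message" }}
      {{ range .Alerts }}
      =====start======
      Alert source: k8s_prometheus_alert
      Severity: {{ .Labels.severity }}
      Alert type: {{ .Labels.alertname }}
      Affected host: {{ .Labels.name }}
      Alert value: {{ .Annotations.value }}
      Summary: {{ .Annotations.summary }}
      {{/* Times default to UTC, so 28800e9 (nanoseconds) is added to shift them forward by 8 hours */}}
      Triggered at: {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
      ======end======
      {{ end }}
      {{ end }}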
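
The constant 28800e9 in the template is a duration in nanoseconds: 28800 seconds, or 8 hours, which moves the UTC timestamp to UTC+8. The short sketch below repeats that arithmetic in Python to make the unit explicit. The helper names are illustrative only, and the strftime pattern is the Python counterpart of the Go layout "2006-01-02 15:04:05".

```python
from datetime import datetime, timedelta, timezone

# Same offset as in the template, expressed in nanoseconds.
OFFSET_NS = int(28800e9)


def shift_from_utc(starts_at_utc: datetime, offset_ns: int = OFFSET_NS) -> datetime:
    """Add the nanosecond offset to a UTC timestamp, like .StartsAt.Add does."""
    return starts_at_utc + timedelta(microseconds=offset_ns // 1000)


def format_like_template(ts: datetime) -> str:
    """Format the timestamp as year-month-day hour:minute:second."""
    return ts.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    starts_at = datetime(2020, 7, 12, 14, 27, 24, tzinfo=timezone.utc)
    print(format_like_template(shift_from_utc(starts_at)))
```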

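Alert rules file:
With a Helm install the configuration files are merged, so the alert rules are kept in a separate file; you only need to load one extra file when deploying.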

vim rules-custom.yaml   # edit the rules file

additionalPrometheusRules:
- name: cpu1
  groups:
  - name: cpu load
    rules:
    - alert: pod CPU above 30%
      expr: (sum by(name)(rate(container_cpu_usage_seconds_total{image!=""}[5m]))*100) > 30
      for: 1m
      labels:
        severity: critical
      annotations:
        value: "{{ $value }}"
        description: The configuration of the instances of the Alertmanager cluster `{{$labels.service}}` are out of sync.
        summary: "This is the first test in the first group: OK"
#    - alert: pod memcache above 5%
#      expr: (sum by(name)(rate(container_cpu_usage_seconds_total{image!=""}[5m]))*100) > 5
#      for: 5m
#      labels:
#        severity: critical
#      annotations:
#        description: An unexpected number of Alertmanagers are scraped or Alertmanagers disappeared from discovery.
#        summary: "This is the second test in the first group"
- name: cpu2
  groups:
  - name: node load
    rules:
    - alert: another group, pod above 30%
      expr: (sum by(name)(rate(container_cpu_usage_seconds_total{image!=""}[5m]))*100) > 30
      for: 1m
      labels:
        severity: critical
      annotations:
        value: "{{ $value }}"
        summary: "This is the test for the second group: OK"
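
The rule expression takes the per-second rate of each container's CPU counter over 5 minutes, sums the rates by the name label, multiplies by 100 to get a percentage of one core, and fires when the result is above 30. The sketch below mirrors that arithmetic in simplified form. It assumes each series is described only by its first and last counter value in the window and ignores the counter resets and extrapolation that the real PromQL rate() handles. All names in it are hypothetical.

```python
from collections import defaultdict


def cpu_percent_by_name(samples, window_seconds=300.0):
    """Sum the per-series CPU usage, as a percentage of one core, by name.

    samples: iterable of (name, first_value, last_value) tuples, one per
    container series, holding counter values in CPU seconds.
    """
    totals = defaultdict(float)
    for name, first, last in samples:
        # Increase over the window divided by its length, times 100.
        totals[name] += (last - first) * 100 / window_seconds
    return dict(totals)


def firing(samples, threshold=30.0, window_seconds=300.0):
    """Return the names whose summed CPU percentage exceeds the threshold."""
    usage = cpu_percent_by_name(samples, window_seconds)
    return {name: pct for name, pct in usage.items() if pct > threshold}


if __name__ == "__main__":
    demo = [("web", 0.0, 60.0), ("web", 0.0, 60.0), ("db", 0.0, 30.0)]
    print(firing(demo))
```

In the real rule the condition also has to hold for the `for: 1m` duration before the alert is sent on to Alertmanager.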

Finally, when deploying, you only need to load one additional configuration file.

Both files can be loaded at the same time:
helm upgrade monitoring stable/prometheus-operator --version=5.0.3 --namespace=monitoring -f prometheus-operator-custom.yaml -f rules-custom.yaml
