mirror of
https://github.com/EZ-Api/ez-api.git
synced 2026-01-13 17:47:51 +00:00
feat(cron): add automatic alert detector for anomaly monitoring
Implement AlertDetector background task that runs every minute to detect and create alerts for various anomalies:

- Rate limit detection: monitors masters hitting rate limits
- Error spike detection: flags keys with >= 10% error rate
- Quota exceeded: warns when key quota usage >= 90%
- Provider down: alerts when API keys have >= 50% failure rate

Includes fingerprint-based deduplication with 5-minute cooldown to prevent duplicate alerts for the same issue.
27
docs/api.md
@@ -524,6 +524,33 @@ curl -X POST http://localhost:8080/internal/alerts/report \
}
```

### 6.6 Automatic Alert Detection (AlertDetector)

The CP side runs a background task that checks for anomalies every minute and creates alerts automatically.

**Detection rules**:

| Rule | Type | Severity | Description |
| :--- | :--- | :--- | :--- |
| Rate limit | `rate_limit` | warning | Detects Masters that are currently rate-limited in Redis |
| Error spike | `error_spike` | info/warning/critical | Error rate over the last 5 minutes >= 10% (>= 50% is critical) |
| Quota exceeded | `quota_exceeded` | warning/critical | Key quota usage >= 90% (critical at 100%) |
| Provider down | `provider_down` | critical | API Key failure rate >= 50% and failure count >= 10 |
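
The thresholds in the table can be read as small classification functions. The following is a minimal sketch, not the project's actual code: the function and constant names are hypothetical, and because the table does not say where `info` ends and `warning` begins for `error_spike`, this sketch assumes every error rate between 10% and 50% maps to `warning`.

```go
package main

import "fmt"

// Severity labels as they appear in the rules table.
const (
	SeverityNone     = "" // no alert
	SeverityWarning  = "warning"
	SeverityCritical = "critical"
)

// quotaSeverity maps quota usage (used / limit) to a severity:
// >= 100% is critical, >= 90% is warning, anything lower raises no alert.
func quotaSeverity(usage float64) string {
	switch {
	case usage >= 1.0:
		return SeverityCritical
	case usage >= 0.9:
		return SeverityWarning
	default:
		return SeverityNone
	}
}

// errorSpikeSeverity maps a 5-minute error rate to a severity.
// Assumption: rates in [10%, 50%) are reported as warning; the table
// also lists info but does not state its boundary.
func errorSpikeSeverity(errorRate float64) string {
	switch {
	case errorRate >= 0.5:
		return SeverityCritical
	case errorRate >= 0.1:
		return SeverityWarning
	default:
		return SeverityNone
	}
}

// providerDown reports whether an API key counts as a failing upstream:
// failure rate >= 50% and at least 10 failures.
func providerDown(failures, total int64) bool {
	if total == 0 {
		return false
	}
	rate := float64(failures) / float64(total)
	return failures >= 10 && rate >= 0.5
}

func main() {
	fmt.Println(quotaSeverity(0.95))     // warning
	fmt.Println(errorSpikeSeverity(0.6)) // critical
	fmt.Println(providerDown(12, 20))    // true
}
```

Requiring a minimum failure count in addition to the failure rate keeps a key with only a handful of requests from triggering `provider_down`.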

**Deduplication**:
- Alerts are deduplicated by `fingerprint` (`type:related_type:related_id`)
- No new alert is created while an active alert with the same fingerprint exists within the last 5 minutes

**Configuration defaults**:
```go
Interval:              1 * time.Minute // detection interval
ErrorSpikeThreshold:   0.1             // error rate threshold (10%)
ErrorSpikeWindow:      5 * time.Minute // window for error statistics
QuotaWarningThreshold: 0.9             // quota warning threshold (90%)
ProviderFailThreshold: 10              // upstream failure count threshold
DeduplicationCooldown: 5 * time.Minute // deduplication cooldown
```

---

## 7. Notes
