Commit Graph

3 Commits

Author SHA1 Message Date
zenfun
ba54abd424 feat(alerts): add traffic spike detection with configurable thresholds
Introduce traffic_spike alert type for monitoring system and per-master
traffic levels with configurable thresholds stored in database.

- Add AlertThresholdConfig model for persistent threshold configuration
- Implement GET/PUT /admin/alerts/thresholds endpoints for threshold management
- Add traffic spike detection in alert detector cron job:
  - Global QPS monitoring across all masters
  - Per-master RPM/TPM checks with minimum sample thresholds
  - Per-master RPD/TPD checks for daily limits
- Use warning severity at threshold, critical at 2x threshold
- Include metric metadata (value, threshold, window) in alert details
- Update API documentation with new endpoints and alert type
2025-12-31 15:56:17 +08:00
zenfun
bfba16bbd4 feat(api): add internal alerts reporting endpoint with deduplication
Add ReportAlerts endpoint for Data Plane to report alerts to Control Plane
with fingerprint-based deduplication using a 5-minute cooldown period.

Changes:
- Add POST /internal/alerts/report endpoint with validation
- Add Fingerprint field to Alert model for deduplication
- Extend GetAPIKeyStatsSummary with optional time range filtering
  using since/until query parameters to query from log records
2025-12-31 14:18:09 +08:00
zenfun
2b5e657b3d feat(api): add alert system with CRUD endpoints and statistics
Introduce a comprehensive alert management system for monitoring
system events and notifications.

Changes include:
- Add Alert model with type, severity, status, and metadata fields
- Implement AlertHandler with full CRUD operations (create, list,
  get, acknowledge, resolve, dismiss)
- Add alert statistics endpoint for counts by status and severity
- Register Alert model in database auto-migration
- Add minute-level aggregation to log stats (limited to 6-hour range)
2025-12-31 13:43:48 +08:00