Commit Graph

10 Commits

Author SHA1 Message Date
zenfun
5b2b176a55 feat(stats): add daily statistics aggregation job and model
- Create `DailyStat` model for immutable daily metrics including
  request counts, tokens, latency, and top models.
- Implement `DailyStatsJob` to aggregate `log_records` from the previous
  day, running daily at 00:05 UTC.
- Register database migrations and schedule the job in the server.
- Add `last7d` and `last30d` period support to stats handler.
2026-01-02 22:20:37 +08:00
zenfun
05caed37c2 refactor(cron): migrate cron jobs to foundation scheduler
Replace custom goroutine-based scheduling in cron jobs with centralized
foundation scheduler. Each cron job now exposes a RunOnce method called
by the scheduler instead of managing its own ticker loop.

Changes:
- Remove interval/enabled config from cron job structs
- Convert Start() methods to RunOnce() for all cron jobs
- Add scheduler setup in main.go with configurable intervals
- Update foundation dependency to v0.6.0 for scheduler support
- Update tests to validate RunOnce nil-safety
2025-12-31 20:42:25 +08:00
zenfun
4cda273f7b feat(alerts): add MasterID to log records and improve traffic spike detection
- Add MasterID field with index to LogRecord model for efficient queries
- Fix threshold config loading to use fixed ID=1 with FirstOrCreate
- Allow traffic spike detection to work without Redis for log-based checks
- Add traffic_spike to API documentation for alert type filter
- Add comprehensive tests for RPM/RPD/TPM spike detection scenarios
2025-12-31 18:01:09 +08:00
zenfun
f714a314a9 test(alerts): add comprehensive tests for alert handler and detector
Add unit tests for alert-related functionality:

- alert_handler_test.go: tests for threshold CRUD operations,
  alert creation with traffic_spike type, filtering, and stats
- alert_detector_test.go: tests for threshold config loading,
  traffic spike severity calculation, deduplication logic,
  error rate severity, and nil-safety checks

Also fix format string issues:
- Use %d instead of %.2f for integer QPS in alert messages
- Wrap error description with format directive to avoid linter warning
2025-12-31 16:09:02 +08:00
zenfun
ba54abd424 feat(alerts): add traffic spike detection with configurable thresholds
Introduce traffic_spike alert type for monitoring system and per-master
traffic levels with configurable thresholds stored in database.

- Add AlertThresholdConfig model for persistent threshold configuration
- Implement GET/PUT /admin/alerts/thresholds endpoints for threshold management
- Add traffic spike detection in alert detector cron job:
  - Global QPS monitoring across all masters
  - Per-master RPM/TPM checks with minimum sample thresholds
  - Per-master RPD/TPD checks for daily limits
- Use warning severity at threshold, critical at 2x threshold
- Include metric metadata (value, threshold, window) in alert details
- Update API documentation with new endpoints and alert type
2025-12-31 15:56:17 +08:00
zenfun
85d91cdd2e feat(cron): add automatic alert detector for anomaly monitoring
Implement AlertDetector background task that runs every minute to detect
and create alerts for various anomalies:

- Rate limit detection: monitors masters hitting rate limits
- Error spike detection: flags keys with >= 10% error rate
- Quota exceeded: warns when key quota usage >= 90%
- Provider down: alerts when API keys have >= 50% failure rate

Includes fingerprint-based deduplication with 5-minute cooldown to
prevent duplicate alerts for the same issue.
2025-12-31 14:49:51 +08:00
zenfun
6170931454 feat(cron): add OAuth token refresh background job
Implement automatic token refresh mechanism for CPA providers (Codex,
GeminiCLI, Antigravity, ClaudeCode) with the following features:

- Periodic refresh of expiring tokens based on configurable interval
- Redis event queue processing for on-demand token refresh
- Retry logic with exponential backoff for transient failures
- Automatic key deactivation on non-retryable errors
- Provider-specific OAuth token refresh implementations
- Sync service integration to update providers after refresh
2025-12-28 03:03:19 +08:00
zenfun
816ea93339 feat(arch): add log partitioning and provider delete sync 2025-12-21 20:45:16 +08:00
zenfun
25795a79d6 feat(cron): add automatic log cleanup with retention policy
Implement LogCleaner cron job to automatically clean up old log records
based on configurable retention period and maximum record count.

- Add LogCleaner with retention_days and max_records configuration
- Add EZ_LOG_RETENTION_DAYS and EZ_LOG_MAX_RECORDS environment variables
- Default to 30 days retention and 1,000,000 max records
- Include unit tests for log cleaner functionality
2025-12-21 12:01:52 +08:00
zenfun
ac9f0cd0a7 feat(stats): add usage stats and quota reset 2025-12-19 21:50:28 +08:00