Add OpenAPI documentation for the new alert threshold management
endpoints used in traffic spike detection:
- GET /admin/alerts/thresholds - retrieve current threshold config
- PUT /admin/alerts/thresholds - update threshold configuration
Include AlertThresholdView and UpdateAlertThresholdsRequest schema
definitions with properties for QPS, RPM, TPM, RPD, TPD limits.
Introduce traffic_spike alert type for monitoring system and per-master
traffic levels with configurable thresholds stored in database.
- Add AlertThresholdConfig model for persistent threshold configuration
- Implement GET/PUT /admin/alerts/thresholds endpoints for threshold management
- Add traffic spike detection in alert detector cron job:
- Global QPS monitoring across all masters
- Per-master RPM/TPM checks with minimum sample thresholds
- Per-master RPD/TPD checks for daily limits
- Use warning severity at threshold, critical at 2x threshold
- Include metric metadata (value, threshold, window) in alert details
- Update API documentation with new endpoints and alert type
Implement AlertDetector background task that runs every minute to detect
and create alerts for various anomalies:
- Rate limit detection: monitors masters hitting rate limits
- Error spike detection: flags keys with >= 10% error rate
- Quota exceeded: warns when key quota usage >= 90%
- Provider down: alerts when API keys have >= 50% failure rate
Includes fingerprint-based deduplication with 5-minute cooldown to
prevent duplicate alerts for the same issue.
- Add dashboard alerts, realtime, and apikey-stats endpoints
- Document time range parameters for logs and apikey-stats
- Update daily operations workflow with new monitoring endpoints
- Clarify period parameter behavior (default returns all data)
Document two new API endpoints:
- GET /admin/apikey-stats/summary with optional since/until params
for querying statistics within a specific time range
- POST /internal/alerts/report for Data Plane to report alerts
with fingerprint-based deduplication mechanism
Add comprehensive API documentation for new admin endpoints:
- Dashboard summary endpoint with period/time range parameters
- System-level realtime statistics (QPS, RPM, rate limits)
- Log stats aggregation by hour/minute with time constraints
- API key status filtering (active/suspended/disabled)
- Complete alert management system documentation:
- Alert types, severity levels, and status definitions
- CRUD endpoints for alert lifecycle management
- Alert statistics endpoint
Add OpenAPI documentation for the new alerts management system:
- CRUD endpoints for system alerts (/admin/alerts)
- Alert acknowledgment and resolution endpoints
- Alert statistics endpoint
- Alert filtering by status, severity, and type
Also document minute-level aggregation support for log stats
with 6-hour time range limitation.
Add new admin API endpoints for dashboard metrics and system-wide
realtime statistics:
- Add /admin/dashboard/summary endpoint with aggregated metrics
including requests, tokens, latency, masters, keys, and provider
keys statistics with time period filtering
- Add /admin/realtime endpoint for system-level realtime stats
aggregated across all masters
- Add status filter parameter to ListAPIKeys endpoint
- Add hour grouping option to log stats aggregation
- Update OpenAPI documentation with new endpoints and schemas
Clarify that the dp_claude_cross_upstream feature flag controls routing
of Claude protocol requests to both OpenAI-compatible and Google-family
upstream providers.
Replace health_handler with status_handler providing public /status and
/about endpoints. Add build-time version injection via ldflags in
Makefile, and support --version/-v CLI flag.
- Add /status endpoint returning runtime status, uptime, and version
- Add /about endpoint with system metadata (name, description, repo)
- Configure VERSION variable with git describe fallback
- Update swagger docs and api.md for new public endpoints
- Remove deprecated /api/status/test endpoint
Add new response types and parameters for logs API:
- Add GroupedStatsItem and GroupedStatsResponse definitions
- Add ListLogsResponse and LogView definitions for detailed log records
- Add group_by enum parameter (model/day/month) to stats endpoint
- Update endpoint descriptions to clarify response types and request_body inclusion
- Update response schema references to use correct types
Regenerate OpenAPI documentation to reflect the architectural split
of Provider into ProviderGroup and APIKey models:
- Replace /admin/providers endpoints with /admin/provider-groups
- Add new /admin/api-keys CRUD and batch endpoints
- Update BindingDTO to use group_id instead of route_group
- Remove provider-specific DTOs (ProviderCustomCreateDTO, ProviderPresetCreateDTO, ProviderGoogleCreateDTO)
- Add APIKeyDTO and ProviderGroupDTO definitions
- Update model definitions for APIKey, Binding, and ProviderGroup
Rewrite docs/api.md to provide a more structured overview of business
logic, core models, and authentication mechanisms. Include detailed
cURL examples for typical operations and add a new management
relationship diagram asset.
- Significant rewrite of docs/api.md with better formatting and content
- Add mermaid diagram for resource relationships
- Update README.md to reference the expanded documentation
- Add docs/管理关系图.png asset
Reflect recent API changes in Swagger documentation:
- Add endpoints for feature flag management (/admin/features)
- Add endpoint for issuing child keys to masters (/admin/masters/{id}/keys)
- Add endpoint for updating providers (/admin/providers/{id})
- Update provider model definitions with new fields (auto_ban, google_* attributes, status)
- Added Swagger documentation for the following admin endpoints:
- Create a new master tenant
- Create a new provider
- Register a new model
- List all models
- Update a model
- Force sync snapshot
- Ingest logs
- Added Swagger documentation for the master endpoint:
- Issue a child key
- Updated go.mod and go.sum to include necessary dependencies for Swagger.