Commit Graph

120 Commits

Author SHA1 Message Date
zenfun
b1fe1ecb97 docs(admin): add dashboard integration documentation
Add detailed specification for the EZ-API Control Plane Dashboard
frontend-backend integration. This document defines the data mapping,
API endpoints, and UI logic for global navigation, real-time metrics,
traffic analysis, and alert summaries.
2026-01-02 21:40:55 +08:00
zenfun
bfb19ca23e test(api): add validation tests for traffic chart endpoint
Add TestTrafficChart_MinuteGranularityValidation to verify input
parameters including granularity, time range limits, and top_n
constraints. Include skipped placeholders for PostgreSQL-specific
aggregation tests.
2026-01-02 21:35:42 +08:00
zenfun
9d082ff375 feat(api): add admin traffic chart statistics endpoint
Add new endpoint GET /admin/logs/stats/traffic-chart to provide
aggregated traffic metrics grouped by time and model. Features include:
- Time granularity selection (hour/minute)
- Top-N model breakdown with "other" aggregation
- Metrics for request counts and token usage

Includes generated Swagger documentation.
2026-01-02 21:24:56 +08:00
zenfun
bae3d9bd5b refactor(scheduler): add base context for graceful shutdown
- Create application-level context with cancel function
- Pass base context to scheduler via WithBaseContext option
- Move scheduler.Stop() to explicit shutdown sequence after context cancellation
- Upgrade foundation dependency to v0.7.0 for new scheduler options
2026-01-01 01:44:49 +08:00
zenfun
31914b9ab5 refactor(scheduler): migrate outbox and model registry to scheduler-based execution
Replace internal goroutine-based timing loops with scheduler integration
for SyncOutboxService and ModelRegistryService. Both services now expose
RunOnce() methods called by the central scheduler instead of managing
their own background loops.

- Add Interval() and RunOnce() methods to SyncOutboxService
- Add RefreshEvery() and RunOnce() methods to ModelRegistryService
- Remove started flag from SyncOutboxService struct
- Move scheduler.Start() after all services are initialized
- Ensure initial model registry refresh before scheduler starts
2026-01-01 00:55:51 +08:00
zenfun
05caed37c2 refactor(cron): migrate cron jobs to foundation scheduler
Replace custom goroutine-based scheduling in cron jobs with centralized
foundation scheduler. Each cron job now exposes a RunOnce method called
by the scheduler instead of managing its own ticker loop.

Changes:
- Remove interval/enabled config from cron job structs
- Convert Start() methods to RunOnce() for all cron jobs
- Add scheduler setup in main.go with configurable intervals
- Update foundation dependency to v0.6.0 for scheduler support
- Update tests to validate RunOnce nil-safety
2025-12-31 20:42:25 +08:00
zenfun
4bcd2b4167 docs(swagger): add generated swagger API documentation
Add auto-generated Swagger/OpenAPI documentation files for the
EZ-API Control Plane. These files are generated by swaggo/swag
and provide comprehensive API documentation including:

- All admin endpoints (alerts, API keys, bindings, masters, etc.)
- Master/tenant endpoints for token management and stats
- Internal endpoints for DP-CP communication
- Authentication schemes (AdminAuth, MasterAuth)
- Request/response schema definitions
2025-12-31 18:01:56 +08:00
zenfun
4cda273f7b feat(alerts): add MasterID to log records and improve traffic spike detection
- Add MasterID field with index to LogRecord model for efficient queries
- Fix threshold config loading to use fixed ID=1 with FirstOrCreate
- Allow traffic spike detection to work without Redis for log-based checks
- Add traffic_spike to API documentation for alert type filter
- Add comprehensive tests for RPM/RPD/TPM spike detection scenarios
2025-12-31 18:01:09 +08:00
zenfun
f714a314a9 test(alerts): add comprehensive tests for alert handler and detector
Add unit tests for alert-related functionality:

- alert_handler_test.go: tests for threshold CRUD operations,
  alert creation with traffic_spike type, filtering, and stats
- alert_detector_test.go: tests for threshold config loading,
  traffic spike severity calculation, deduplication logic,
  error rate severity, and nil-safety checks

Also fix format string issues:
- Use %d instead of %.2f for integer QPS in alert messages
- Wrap error description with format directive to avoid linter warning
2025-12-31 16:09:02 +08:00
zenfun
0b9556ee7e docs(api): add alert thresholds endpoints to swagger documentation
Add OpenAPI documentation for the new alert threshold management
endpoints used in traffic spike detection:

- GET /admin/alerts/thresholds - retrieve current threshold config
- PUT /admin/alerts/thresholds - update threshold configuration

Include AlertThresholdView and UpdateAlertThresholdsRequest schema
definitions with properties for QPS, RPM, TPM, RPD, TPD limits.
2025-12-31 15:57:27 +08:00
zenfun
ba54abd424 feat(alerts): add traffic spike detection with configurable thresholds
Introduce traffic_spike alert type for monitoring system and per-master
traffic levels with configurable thresholds stored in database.

- Add AlertThresholdConfig model for persistent threshold configuration
- Implement GET/PUT /admin/alerts/thresholds endpoints for threshold management
- Add traffic spike detection in alert detector cron job:
  - Global QPS monitoring across all masters
  - Per-master RPM/TPM checks with minimum sample thresholds
  - Per-master RPD/TPD checks for daily limits
- Use warning severity at threshold, critical at 2x threshold
- Include metric metadata (value, threshold, window) in alert details
- Update API documentation with new endpoints and alert type
2025-12-31 15:56:17 +08:00
zenfun
85d91cdd2e feat(cron): add automatic alert detector for anomaly monitoring
Implement AlertDetector background task that runs every minute to detect
and create alerts for various anomalies:

- Rate limit detection: monitors masters hitting rate limits
- Error spike detection: flags keys with >= 10% error rate
- Quota exceeded: warns when key quota usage >= 90%
- Provider down: alerts when API keys have >= 50% failure rate

Includes fingerprint-based deduplication with 5-minute cooldown to
prevent duplicate alerts for the same issue.
2025-12-31 14:49:51 +08:00
zenfun
6cab7e257a docs(admin): update dashboard and operations API references
- Add dashboard alerts, realtime, and apikey-stats endpoints
- Document time range parameters for logs and apikey-stats
- Update daily operations workflow with new monitoring endpoints
- Clarify period parameter behavior (default returns all data)
2025-12-31 14:38:07 +08:00
zenfun
dab07caca2 docs(api): add apikey-stats time range and internal alerts report endpoints
Document two new API endpoints:
- GET /admin/apikey-stats/summary with optional since/until params
  for querying statistics within a specific time range
- POST /internal/alerts/report for Data Plane to report alerts
  with fingerprint-based deduplication mechanism
2025-12-31 14:25:55 +08:00
zenfun
57797f38cb docs(api): add alerts report endpoint and apikey-stats query params
- Add internal /internal/alerts/report POST endpoint documentation
- Add reportAlertEntry, reportAlertsRequest, reportAlertsResponse schemas
- Add since/until query parameters to /admin/apikey-stats/summary endpoint
2025-12-31 14:18:42 +08:00
zenfun
bfba16bbd4 feat(api): add internal alerts reporting endpoint with deduplication
Add ReportAlerts endpoint for Data Plane to report alerts to Control Plane
with fingerprint-based deduplication using a 5-minute cooldown period.

Changes:
- Add POST /internal/alerts/report endpoint with validation
- Add Fingerprint field to Alert model for deduplication
- Extend GetAPIKeyStatsSummary with optional time range filtering
  using since/until query parameters to query from log records
2025-12-31 14:18:09 +08:00
zenfun
71f7578c7b docs(api): add dashboard statistics and alert management API documentation
Add comprehensive API documentation for new admin endpoints:

- Dashboard summary endpoint with period/time range parameters
- System-level realtime statistics (QPS, RPM, rate limits)
- Log stats aggregation by hour/minute with time constraints
- API key status filtering (active/suspended/disabled)
- Complete alert management system documentation:
  - Alert types, severity levels, and status definitions
  - CRUD endpoints for alert lifecycle management
  - Alert statistics endpoint
2025-12-31 13:48:30 +08:00
zenfun
170d16894f docs: update swagger docs for alerts API and minute-level stats
Add OpenAPI documentation for the new alerts management system:
- CRUD endpoints for system alerts (/admin/alerts)
- Alert acknowledgment and resolution endpoints
- Alert statistics endpoint
- Alert filtering by status, severity, and type

Also document minute-level aggregation support for log stats
with 6-hour time range limitation.
2025-12-31 13:45:18 +08:00
zenfun
2b5e657b3d feat(api): add alert system with CRUD endpoints and statistics
Introduce a comprehensive alert management system for monitoring
system events and notifications.

Changes include:
- Add Alert model with type, severity, status, and metadata fields
- Implement AlertHandler with full CRUD operations (create, list,
  get, acknowledge, resolve, dismiss)
- Add alert statistics endpoint for counts by status and severity
- Register Alert model in database auto-migration
- Add minute-level aggregation to log stats (limited to 6-hour range)
2025-12-31 13:43:48 +08:00
zenfun
53c18c3867 feat(api): add dashboard summary and system realtime endpoints
Add new admin API endpoints for dashboard metrics and system-wide
realtime statistics:

- Add /admin/dashboard/summary endpoint with aggregated metrics
  including requests, tokens, latency, masters, keys, and provider
  keys statistics with time period filtering
- Add /admin/realtime endpoint for system-level realtime stats
  aggregated across all masters
- Add status filter parameter to ListAPIKeys endpoint
- Add hour grouping option to log stats aggregation
- Update OpenAPI documentation with new endpoints and schemas
2025-12-31 13:17:23 +08:00
zenfun
1a2cc5b798 feat(api): add API key stats flush and summary endpoints
Introduce internal endpoint for flushing accumulated APIKey statistics
from data plane to control plane database, updating both individual
API keys and their parent provider groups with request counts and
success/failure rates.

Add admin endpoint to retrieve aggregated API key statistics summary
across all provider groups, including total requests, success/failure
counts, and calculated rates.
2025-12-30 00:11:52 +08:00
zenfun
5156ca9cec feat(model): add request statistics fields to ProviderGroup and APIKey 2025-12-29 23:40:01 +08:00
76304335f7 chore: update foundation version 2025-12-29 08:46:44 +08:00
zenfun
6170931454 feat(cron): add OAuth token refresh background job
Implement automatic token refresh mechanism for CPA providers (Codex,
GeminiCLI, Antigravity, ClaudeCode) with the following features:

- Periodic refresh of expiring tokens based on configurable interval
- Redis event queue processing for on-demand token refresh
- Retry logic with exponential backoff for transient failures
- Automatic key deactivation on non-retryable errors
- Provider-specific OAuth token refresh implementations
- Sync service integration to update providers after refresh
2025-12-28 03:03:19 +08:00
zenfun
f0fe9f0dad feat(api): add OAuth token fields and new provider types support
Add support for OAuth-based authentication with access/refresh tokens
and expiration tracking for API keys. Extend provider groups with
static headers configuration and headers profile options.

Changes include:
- Add AccessToken, RefreshToken, ExpiresAt, AccountID, ProjectID to APIKey model
- Add StaticHeaders and HeadersProfile to ProviderGroup model
- Add TokenRefresh configuration for background token management
- Support new provider types: ClaudeCode, Codex, GeminiCLI, Antigravity
- Update sync service to include new fields in provider snapshots
2025-12-28 02:49:54 +08:00
zenfun
cca0802620 docs(swagger): update dp_claude_cross_upstream description to include Google-family providers
Clarify that the dp_claude_cross_upstream feature flag controls routing
of Claude protocol requests to both OpenAI-compatible and Google-family
upstream providers.
2025-12-27 20:06:50 +08:00
zenfun
6b07cd58ca chore: add openspec, AGENTS.md and .kilocode to gitignore 2025-12-27 19:10:50 +08:00
zenfun
f51cd63c82 test(api): remove health handler test file 2025-12-27 13:51:25 +08:00
zenfun
637bfa8210 feat(api): add public status endpoints with version injection
Replace health_handler with status_handler providing public /status and
/about endpoints. Add build-time version injection via ldflags in
Makefile, and support --version/-v CLI flag.

- Add /status endpoint returning runtime status, uptime, and version
- Add /about endpoint with system metadata (name, description, repo)
- Configure VERSION variable with git describe fallback
- Update swagger docs and api.md for new public endpoints
- Remove deprecated /api/status/test endpoint
2025-12-27 13:24:13 +08:00
zenfun
3d39c591fd fix(sync): use nanosecond precision for bindings version to ensure uniqueness
Use UnixNano for version field while keeping Unix seconds for updated_at
timestamp. This ensures version changes are detected even when multiple
syncs occur within the same second.
2025-12-26 11:20:53 +08:00
zenfun
c83fe03892 docs(swagger): update API documentation for log endpoints
Add new response types and parameters for logs API:
- Add GroupedStatsItem and GroupedStatsResponse definitions
- Add ListLogsResponse and LogView definitions for detailed log records
- Add group_by enum parameter (model/day/month) to stats endpoint
- Update endpoint descriptions to clarify response types and request_body inclusion
- Update response schema references to use correct types
2025-12-26 11:12:47 +08:00
zenfun
3f1d006006 feat(api): add grouped statistics and request_body to log endpoints
Add group_by parameter to /admin/logs/stats endpoint supporting:
- group_by=model: aggregate stats per model with avg latency
- group_by=day: daily aggregation with token counts
- group_by=month: monthly aggregation with token counts

Also include request_body field in admin ListLogs response for
full visibility into logged requests.
2025-12-26 11:06:39 +08:00
30f15a84b5 feat(api): add /auth/whoami endpoint and build automation 2025-12-25 14:54:52 +08:00
8e6d86edd7 feat(api): add /auth/whoami endpoint for identity detection 2025-12-25 14:41:38 +08:00
b566eb8058 fix(swagger): restore apikey security definition with Bearer usage description 2025-12-25 11:32:55 +08:00
c8fced4cf1 fix(swagger): fix swagger authorization header 2025-12-25 11:26:56 +08:00
ecdff89364 feat(k8s): add missing env configs for outbox and timeout 2025-12-25 11:14:02 +08:00
6473ac1c81 feat: update kubernetes deployment 2025-12-25 11:09:32 +08:00
9a07e0f468 feat: update kubernetes test environment configurations 2025-12-25 11:02:37 +08:00
41998a3584 feat(swagger): support dynamic host via EZ_SWAGGER_HOST env var
- Add ServerConfig.SwaggerHost field
- Set docs.SwaggerInfo.Host dynamically at startup
- Update docker-compose, k8s manifests, and .env.example
2025-12-25 10:59:57 +08:00
zenfun
6a16712b9d feat(core): implement sync outbox mechanism and refactor provider validation
- Introduce `SyncOutboxService` and model to retry failed CP-to-Redis sync operations
- Update `SyncService` to handle sync failures by enqueuing tasks to the outbox
- Centralize provider group and API key validation logic into `ProviderGroupManager`
- Refactor API handlers to utilize the new manager and robust sync methods
- Add configuration options for sync outbox (interval, batch size, retries)
2025-12-25 01:24:19 +08:00
zenfun
44a82fa252 delete: remove png 2025-12-24 17:01:56 +08:00
zenfun
72d7920534 doc: add admin panel map 2025-12-24 15:12:06 +08:00
zenfun
38d2329991 doc: 增加 .env 的注释 2025-12-24 14:51:57 +08:00
7f160a8d2a feat: 添加kubernetes部署 2025-12-24 11:49:34 +08:00
b97705532e chore:update foundation version 2025-12-24 11:31:49 +08:00
zenfun
dff2fc7562 chore(test): clean up unused code and fix test helper usage
- Update TestBatchBindings_Status to use newTestHandlerWithRedis helper
- Remove unused jsonID helper function from sync_bindings_spec_test.go
- Add time import to models.go
2025-12-24 02:27:11 +08:00
zenfun
d731e42ae5 docs: update swagger specs for provider-groups and api-keys refactor
Regenerate OpenAPI documentation to reflect the architectural split
of Provider into ProviderGroup and APIKey models:

- Replace /admin/providers endpoints with /admin/provider-groups
- Add new /admin/api-keys CRUD and batch endpoints
- Update BindingDTO to use group_id instead of route_group
- Remove provider-specific DTOs (ProviderCustomCreateDTO, ProviderPresetCreateDTO, ProviderGoogleCreateDTO)
- Add APIKeyDTO and ProviderGroupDTO definitions
- Update model definitions for APIKey, Binding, and ProviderGroup
2025-12-24 02:20:32 +08:00
zenfun
dea8363e41 refactor(api): split Provider into ProviderGroup and APIKey models
Restructure the provider management system by separating the monolithic
Provider model into two distinct entities:

- ProviderGroup: defines shared upstream configuration (type, base_url,
  google settings, models, status)
- APIKey: represents individual credentials within a group (api_key,
  weight, status, auto_ban, ban settings)

This change also updates:
- Binding model to reference GroupID instead of RouteGroup string
- All CRUD handlers for the new provider-group and api-key endpoints
- Sync service to rebuild provider snapshots from joined tables
- Model registry to aggregate capabilities across group/key pairs
- Access handler to validate namespace existence and subset constraints
- Migration importer to handle the new schema structure
- All related tests to use the new model relationships

BREAKING CHANGE: Provider API endpoints replaced with /provider-groups
and /api-keys endpoints; Binding.RouteGroup replaced with Binding.GroupID
2025-12-24 02:15:52 +08:00
zenfun
cd5616dc26 feat(migrate): add import CLI command and importer for migration data
Introduce a new `import` subcommand to the server binary that reads
exported JSON files and imports masters, providers, keys, bindings,
and namespaces into the database.

Key features:
- Support for dry-run mode to validate without writing
- Conflict policies: skip existing or overwrite
- Optional binding import via --include-bindings flag
- Auto-generation of master keys with secure hashing
- Namespace auto-creation for referenced namespaces
- Detailed import summary with warnings and created credentials
2025-12-23 20:13:45 +08:00