65 Commits

Author SHA1 Message Date
7929b2a872 refactor(api): rename auth/whoami to users/info and simplify response
Rename endpoint from /auth/whoami to /users/info to align with
ez-contract schema. Simplify WhoamiResponse by removing:
- Realtime stats (requests, tokens, qps, rate_limited)
- Key-specific fields (allow_ips, deny_ips, expires_at, model_limits,
  quota fields, usage stats)

The endpoint now returns only essential identity information as
defined in ez-contract/schemas/auth/auth.yaml.

BREAKING CHANGE: /auth/whoami endpoint moved to /users/info with
reduced response fields
2026-01-13 15:57:52 +08:00
zenfun
e7db9f319f fix: delete keys and seed only active ones
Ensure admin key deletion removes the DB record and returns a
"deleted" status. Update seeder idempotency to count only active keys
when deciding whether to skip or create new keys.
2026-01-10 01:18:04 +08:00
zenfun
5349c9c833 feat(api): add admin master key listing/revoke
Add admin endpoints to list and revoke child keys under a master.
Standardize OpenAPI responses to use ResponseEnvelope with MapData
for error payloads, and regenerate swagger specs accordingly.
2026-01-10 01:10:36 +08:00
zenfun
6af938448e fix(seeder): improve key idempotency and log names
Trim whitespace in provider model lists, format provider names as `group#keyID`
to match DP logs, and skip existing API keys during seeding (deleting on reset)
to keep runs idempotent and summaries accurate.
2026-01-10 00:58:02 +08:00
zenfun
5431e24923 fix(seeder): correct log generation fields
- Parse provider group models from API response string and expose as slice
- Send `model` field (not `model_name`) when creating logs
- Use API key ID as `provider_id` instead of provider group ID
- Restrict reset behavior to resources matching seeder tag/prefix
- Refactor usage sample generation to accept a context struct
2026-01-10 00:46:03 +08:00
zenfun
18b9846f83 feat(seeder): add control plane data seeder
Introduce a `cmd/seeder` CLI to generate deterministic demo datasets and
seed them into the Control Plane via admin endpoints, supporting reset,
dry-run, profiles, and usage sample generation.

Add Cobra/Viper dependencies to support the new command.
2026-01-10 00:26:48 +08:00
zenfun
33838b1e2c feat(api): wrap JSON responses in envelope
Add response envelope middleware to standardize JSON responses as
`{code,data,message}` with consistent business codes across endpoints.
Update Swagger annotations and tests to reflect the new response shape.

BREAKING CHANGE: API responses are now wrapped in a response envelope; clients must read payloads from `data` and handle `code`/`message` fields.
2026-01-10 00:15:08 +08:00
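For clients affected by the envelope change above, a minimal sketch of reading a payload from `data` might look like the following. Only the `{code,data,message}` shape comes from the commit message; the `Envelope` type, the `Unwrap` helper, and the convention that code `0` means success are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope mirrors the {code,data,message} shape named in the commit.
// The Go field types are assumptions.
type Envelope struct {
	Code    int             `json:"code"`
	Data    json.RawMessage `json:"data"`
	Message string          `json:"message"`
}

// Unwrap decodes an enveloped body and unmarshals `data` into out.
// Treating code 0 as success is an assumption, not taken from the source.
func Unwrap(body []byte, out any) error {
	var env Envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return fmt.Errorf("decode envelope: %w", err)
	}
	if env.Code != 0 {
		return fmt.Errorf("business error %d: %s", env.Code, env.Message)
	}
	return json.Unmarshal(env.Data, out)
}

func main() {
	body := []byte(`{"code":0,"data":{"id":7},"message":"ok"}`)
	var payload struct {
		ID int `json:"id"`
	}
	if err := Unwrap(body, &payload); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("id:", payload.ID)
}
```

Keeping `data` as `json.RawMessage` lets the caller check `code` first and decode the payload only on success.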
1ee6bea413 feat(api): enhance whoami endpoint with realtime stats and extended key info
Add realtime statistics (requests, tokens, QPS, rate limiting) to whoami
response for both master and key authentication types. Extend key response
with additional fields including master name, model limits, quota tracking,
and usage statistics.

- Inject StatsService into AuthHandler for realtime stats retrieval
- Add WhoamiRealtimeView struct for realtime statistics
- Include admin permissions field in admin response
- Add comprehensive key metadata (quotas, model limits, usage stats)
- Add test for expired key returning 401 Unauthorized
2026-01-06 09:15:49 +08:00
zenfun
a7571dd4ad feat(server): integrate ip ban cron and refine updates
- Initialize and schedule IP ban maintenance tasks in server entry point
- Perform initial IP ban sync to Redis on startup
- Implement optional JSON unmarshalling to handle null `expires_at` in API
- Add CIDR overlap validation when updating rule status to active
2026-01-04 01:44:45 +08:00
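The CIDR overlap validation mentioned above could be sketched with the standard library's `net/netip` package. The `CIDRsOverlap` helper is hypothetical and is not the repository's implementation; it only illustrates the kind of check that would reject an active rule colliding with an existing one.

```go
package main

import (
	"fmt"
	"net/netip"
)

// CIDRsOverlap reports whether two CIDR strings share any address.
// Hypothetical helper; the real validation logic is not shown in the log.
func CIDRsOverlap(a, b string) (bool, error) {
	pa, err := netip.ParsePrefix(a)
	if err != nil {
		return false, fmt.Errorf("parse %q: %w", a, err)
	}
	pb, err := netip.ParsePrefix(b)
	if err != nil {
		return false, fmt.Errorf("parse %q: %w", b, err)
	}
	return pa.Overlaps(pb), nil
}

func main() {
	for _, pair := range [][2]string{
		{"10.0.0.0/8", "10.1.2.0/24"},
		{"10.0.0.0/24", "10.0.1.0/24"},
	} {
		overlap, err := CIDRsOverlap(pair[0], pair[1])
		fmt.Println(pair[0], pair[1], overlap, err)
	}
}
```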
zenfun
63d43db39d feat(server): register IP ban routes in admin group
Initialize the IP ban service and handler, and wire up the CRUD
endpoints to the admin router group.
2026-01-04 01:26:55 +08:00
zenfun
ae2f4d7819 feat(model): add IPBan entity for global IP blocking
Introduces the IPBan model to support global IP/CIDR ban rules enforced by the data plane. Includes fields for CIDR, status, expiration, and hit counts, and registers the model for auto-migration in the server startup.
2026-01-04 00:55:00 +08:00
zenfun
4cd9b66a84 feat(auth): enhance token validation and internal access control
Refactor the `Whoami` handler to validate token metadata (status, expiration,
revocation) against Redis before database lookup, ensuring consistency with
balancer logic. Add `allow_ips`, `deny_ips`, and `expires_at` fields to
authentication responses.

Update internal middleware to support explicit anonymous access configuration
and harden security for unconfigured tokens.

Remove legacy fallback logic for master keys without digests.

BREAKING CHANGE: Internal endpoints now reject requests by default if no stats token is configured. To allow unauthenticated access, set `internal.allow_anonymous` to true.
BREAKING CHANGE: Support for legacy master keys without stored digests has been removed.
2026-01-03 16:04:04 +08:00
zenfun
481f616704 refactor(stats): remove daily stats aggregation
Remove the DailyStatsJob, DailyStat model, and associated database
migrations. This eliminates the pre-aggregation layer and updates the
dashboard handler to remove dependencies on the daily_stats table.
2026-01-02 23:08:50 +08:00
zenfun
5b2b176a55 feat(stats): add daily statistics aggregation job and model
- Create `DailyStat` model for immutable daily metrics including
  request counts, tokens, latency, and top models.
- Implement `DailyStatsJob` to aggregate `log_records` from the previous
  day, running daily at 00:05 UTC.
- Register database migrations and schedule the job in the server.
- Add `last7d` and `last30d` period support to stats handler.
2026-01-02 22:20:37 +08:00
zenfun
9d082ff375 feat(api): add admin traffic chart statistics endpoint
Add new endpoint GET /admin/logs/stats/traffic-chart to provide
aggregated traffic metrics grouped by time and model. Features include:
- Time granularity selection (hour/minute)
- Top-N model breakdown with "other" aggregation
- Metrics for request counts and token usage

Includes generated Swagger documentation.
2026-01-02 21:24:56 +08:00
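The Top-N breakdown with an "other" bucket described above can be illustrated as follows. This is a sketch under assumptions: the `ModelCount` type, the `TopNWithOther` function, and the tie-breaking by model name are invented for the example and do not come from the endpoint's code.

```go
package main

import (
	"fmt"
	"sort"
)

// ModelCount is a hypothetical row type: one model and its request count.
type ModelCount struct {
	Model    string
	Requests int64
}

// TopNWithOther keeps the n largest models and sums the rest into "other".
// Ties are broken by model name so the output is deterministic (assumption).
func TopNWithOther(counts map[string]int64, n int) []ModelCount {
	rows := make([]ModelCount, 0, len(counts))
	for model, c := range counts {
		rows = append(rows, ModelCount{Model: model, Requests: c})
	}
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].Requests != rows[j].Requests {
			return rows[i].Requests > rows[j].Requests
		}
		return rows[i].Model < rows[j].Model
	})
	if n < 0 {
		n = 0
	}
	if len(rows) <= n {
		return rows
	}
	var other int64
	for _, r := range rows[n:] {
		other += r.Requests
	}
	// The full slice expression caps capacity so append copies the kept rows.
	return append(rows[:n:n], ModelCount{Model: "other", Requests: other})
}

func main() {
	counts := map[string]int64{"a": 5, "b": 3, "c": 2, "d": 1}
	for _, row := range TopNWithOther(counts, 2) {
		fmt.Println(row.Model, row.Requests)
	}
}
```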
zenfun
bae3d9bd5b refactor(scheduler): add base context for graceful shutdown
- Create application-level context with cancel function
- Pass base context to scheduler via WithBaseContext option
- Move scheduler.Stop() to explicit shutdown sequence after context cancellation
- Upgrade foundation dependency to v0.7.0 for new scheduler options
2026-01-01 01:44:49 +08:00
zenfun
31914b9ab5 refactor(scheduler): migrate outbox and model registry to scheduler-based execution
Replace internal goroutine-based timing loops with scheduler integration
for SyncOutboxService and ModelRegistryService. Both services now expose
RunOnce() methods called by the central scheduler instead of managing
their own background loops.

- Add Interval() and RunOnce() methods to SyncOutboxService
- Add RefreshEvery() and RunOnce() methods to ModelRegistryService
- Remove started flag from SyncOutboxService struct
- Move scheduler.Start() after all services are initialized
- Ensure initial model registry refresh before scheduler starts
2026-01-01 00:55:51 +08:00
zenfun
05caed37c2 refactor(cron): migrate cron jobs to foundation scheduler
Replace custom goroutine-based scheduling in cron jobs with centralized
foundation scheduler. Each cron job now exposes a RunOnce method called
by the scheduler instead of managing its own ticker loop.

Changes:
- Remove interval/enabled config from cron job structs
- Convert Start() methods to RunOnce() for all cron jobs
- Add scheduler setup in main.go with configurable intervals
- Update foundation dependency to v0.6.0 for scheduler support
- Update tests to validate RunOnce nil-safety
2025-12-31 20:42:25 +08:00
zenfun
ba54abd424 feat(alerts): add traffic spike detection with configurable thresholds
Introduce traffic_spike alert type for monitoring system and per-master
traffic levels with configurable thresholds stored in database.

- Add AlertThresholdConfig model for persistent threshold configuration
- Implement GET/PUT /admin/alerts/thresholds endpoints for threshold management
- Add traffic spike detection in alert detector cron job:
  - Global QPS monitoring across all masters
  - Per-master RPM/TPM checks with minimum sample thresholds
  - Per-master RPD/TPD checks for daily limits
- Use warning severity at threshold, critical at 2x threshold
- Include metric metadata (value, threshold, window) in alert details
- Update API documentation with new endpoints and alert type
2025-12-31 15:56:17 +08:00
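The severity rule stated above (warning at threshold, critical at 2x threshold) amounts to a small comparison. In the sketch below, the `Severity` function name, the inclusive boundaries, and treating a non-positive threshold as "disabled" are assumptions rather than details confirmed by the commit.

```go
package main

import "fmt"

// Severity maps a metric value to an alert severity.
// It returns "" when no alert should fire.
func Severity(value, threshold float64) string {
	switch {
	case threshold <= 0:
		return "" // assumption: a non-positive threshold disables the check
	case value >= 2*threshold:
		return "critical"
	case value >= threshold:
		return "warning"
	default:
		return ""
	}
}

func main() {
	for _, v := range []float64{50, 100, 199, 200} {
		fmt.Printf("value=%v threshold=100 severity=%q\n", v, Severity(v, 100))
	}
}
```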
zenfun
85d91cdd2e feat(cron): add automatic alert detector for anomaly monitoring
Implement AlertDetector background task that runs every minute to detect
and create alerts for various anomalies:

- Rate limit detection: monitors masters hitting rate limits
- Error spike detection: flags keys with >= 10% error rate
- Quota exceeded: warns when key quota usage >= 90%
- Provider down: alerts when API keys have >= 50% failure rate

Includes fingerprint-based deduplication with 5-minute cooldown to
prevent duplicate alerts for the same issue.
2025-12-31 14:49:51 +08:00
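Fingerprint-based deduplication with a cooldown, as described above, can be sketched with an in-memory map. The commit does not say which fields feed the fingerprint or where the last-seen state is stored, so the `Fingerprint` inputs, the `Deduper` type, and the in-memory storage are all assumptions; only the 5-minute cooldown comes from the message.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
	"time"
)

// DefaultCooldown matches the 5-minute window named in the commit.
const DefaultCooldown = 5 * time.Minute

// Fingerprint hashes the fields that identify "the same issue".
// The choice of fields is an assumption.
func Fingerprint(alertType, resource string) string {
	sum := sha256.Sum256([]byte(alertType + "\x00" + resource))
	return hex.EncodeToString(sum[:])
}

// Deduper remembers when each fingerprint last produced an alert.
type Deduper struct {
	mu       sync.Mutex
	cooldown time.Duration
	lastSeen map[string]time.Time
}

func NewDeduper(cooldown time.Duration) *Deduper {
	return &Deduper{cooldown: cooldown, lastSeen: make(map[string]time.Time)}
}

// Allow reports whether an alert with this fingerprint may be created at
// `now`, and records the time when it is allowed.
func (d *Deduper) Allow(fp string, now time.Time) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if last, ok := d.lastSeen[fp]; ok && now.Sub(last) < d.cooldown {
		return false // still inside the cooldown window
	}
	d.lastSeen[fp] = now
	return true
}

func main() {
	d := NewDeduper(DefaultCooldown)
	fp := Fingerprint("error_spike", "key:42")
	start := time.Date(2025, 12, 31, 14, 0, 0, 0, time.UTC)
	for _, offset := range []time.Duration{0, 4 * time.Minute, 5 * time.Minute} {
		fmt.Println(offset, d.Allow(fp, start.Add(offset)))
	}
}
```

Passing `now` explicitly keeps the cooldown logic testable without sleeping.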
zenfun
bfba16bbd4 feat(api): add internal alerts reporting endpoint with deduplication
Add ReportAlerts endpoint for Data Plane to report alerts to Control Plane
with fingerprint-based deduplication using a 5-minute cooldown period.

Changes:
- Add POST /internal/alerts/report endpoint with validation
- Add Fingerprint field to Alert model for deduplication
- Extend GetAPIKeyStatsSummary with optional time range filtering
  using since/until query parameters to query from log records
2025-12-31 14:18:09 +08:00
zenfun
2b5e657b3d feat(api): add alert system with CRUD endpoints and statistics
Introduce a comprehensive alert management system for monitoring
system events and notifications.

Changes include:
- Add Alert model with type, severity, status, and metadata fields
- Implement AlertHandler with full CRUD operations (create, list,
  get, acknowledge, resolve, dismiss)
- Add alert statistics endpoint for counts by status and severity
- Register Alert model in database auto-migration
- Add minute-level aggregation to log stats (limited to 6-hour range)
2025-12-31 13:43:48 +08:00
zenfun
53c18c3867 feat(api): add dashboard summary and system realtime endpoints
Add new admin API endpoints for dashboard metrics and system-wide
realtime statistics:

- Add /admin/dashboard/summary endpoint with aggregated metrics
  including requests, tokens, latency, masters, keys, and provider
  keys statistics with time period filtering
- Add /admin/realtime endpoint for system-level realtime stats
  aggregated across all masters
- Add status filter parameter to ListAPIKeys endpoint
- Add hour grouping option to log stats aggregation
- Update OpenAPI documentation with new endpoints and schemas
2025-12-31 13:17:23 +08:00
zenfun
1a2cc5b798 feat(api): add API key stats flush and summary endpoints
Introduce internal endpoint for flushing accumulated APIKey statistics
from data plane to control plane database, updating both individual
API keys and their parent provider groups with request counts and
success/failure rates.

Add admin endpoint to retrieve aggregated API key statistics summary
across all provider groups, including total requests, success/failure
counts, and calculated rates.
2025-12-30 00:11:52 +08:00
zenfun
6170931454 feat(cron): add OAuth token refresh background job
Implement automatic token refresh mechanism for CPA providers (Codex,
GeminiCLI, Antigravity, ClaudeCode) with the following features:

- Periodic refresh of expiring tokens based on configurable interval
- Redis event queue processing for on-demand token refresh
- Retry logic with exponential backoff for transient failures
- Automatic key deactivation on non-retryable errors
- Provider-specific OAuth token refresh implementations
- Sync service integration to update providers after refresh
2025-12-28 03:03:19 +08:00
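The retry behaviour listed above (exponential backoff for transient failures, no retry for non-retryable errors) might be sketched as below. The `Backoff` and `Retry` helpers, the `ErrNonRetryable` sentinel, and the doubling schedule with a cap are illustrative assumptions; the job's actual delays and error classification are not given in the commit.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrNonRetryable is a hypothetical sentinel for errors that must not be retried.
var ErrNonRetryable = errors.New("non-retryable")

// Backoff returns base doubled once per prior attempt, capped at max.
func Backoff(attempt int, base, max time.Duration) time.Duration {
	if base >= max {
		return max
	}
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	return d
}

// Retry calls op up to `attempts` times, waiting between failures.
// It stops early on success or on an error wrapping ErrNonRetryable.
func Retry(attempts int, base, max time.Duration, sleep func(time.Duration), op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil || errors.Is(err, ErrNonRetryable) {
			return err
		}
		if i < attempts-1 {
			sleep(Backoff(i, base, max))
		}
	}
	return err
}

func main() {
	calls := 0
	err := Retry(4, time.Second, 30*time.Second,
		func(d time.Duration) { fmt.Println("would wait", d) }, // stand-in for time.Sleep
		func() error {
			calls++
			if calls < 3 {
				return errors.New("transient failure")
			}
			return nil
		})
	fmt.Println("calls:", calls, "err:", err)
}
```

A caller could treat a returned `ErrNonRetryable` as the signal to deactivate the key, which is where the commit's "automatic key deactivation" step would hook in.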
zenfun
637bfa8210 feat(api): add public status endpoints with version injection
Replace health_handler with status_handler providing public /status and
/about endpoints. Add build-time version injection via ldflags in
Makefile, and support --version/-v CLI flag.

- Add /status endpoint returning runtime status, uptime, and version
- Add /about endpoint with system metadata (name, description, repo)
- Configure VERSION variable with git describe fallback
- Update swagger docs and api.md for new public endpoints
- Remove deprecated /api/status/test endpoint
2025-12-27 13:24:13 +08:00
8e6d86edd7 feat(api): add /auth/whoami endpoint for identity detection 2025-12-25 14:41:38 +08:00
b566eb8058 fix(swagger): restore apikey security definition with Bearer usage description 2025-12-25 11:32:55 +08:00
c8fced4cf1 fix(swagger): fix swagger authorization header 2025-12-25 11:26:56 +08:00
41998a3584 feat(swagger): support dynamic host via EZ_SWAGGER_HOST env var
- Add ServerConfig.SwaggerHost field
- Set docs.SwaggerInfo.Host dynamically at startup
- Update docker-compose, k8s manifests, and .env.example
2025-12-25 10:59:57 +08:00
zenfun
6a16712b9d feat(core): implement sync outbox mechanism and refactor provider validation
- Introduce `SyncOutboxService` and model to retry failed CP-to-Redis sync operations
- Update `SyncService` to handle sync failures by enqueuing tasks to the outbox
- Centralize provider group and API key validation logic into `ProviderGroupManager`
- Refactor API handlers to utilize the new manager and robust sync methods
- Add configuration options for sync outbox (interval, batch size, retries)
2025-12-25 01:24:19 +08:00
zenfun
dea8363e41 refactor(api): split Provider into ProviderGroup and APIKey models
Restructure the provider management system by separating the monolithic
Provider model into two distinct entities:

- ProviderGroup: defines shared upstream configuration (type, base_url,
  google settings, models, status)
- APIKey: represents individual credentials within a group (api_key,
  weight, status, auto_ban, ban settings)

This change also updates:
- Binding model to reference GroupID instead of RouteGroup string
- All CRUD handlers for the new provider-group and api-key endpoints
- Sync service to rebuild provider snapshots from joined tables
- Model registry to aggregate capabilities across group/key pairs
- Access handler to validate namespace existence and subset constraints
- Migration importer to handle the new schema structure
- All related tests to use the new model relationships

BREAKING CHANGE: Provider API endpoints replaced with /provider-groups
and /api-keys endpoints; Binding.RouteGroup replaced with Binding.GroupID
2025-12-24 02:15:52 +08:00
zenfun
cd5616dc26 feat(migrate): add import CLI command and importer for migration data
Introduce a new `import` subcommand to the server binary that reads
exported JSON files and imports masters, providers, keys, bindings,
and namespaces into the database.

Key features:
- Support for dry-run mode to validate without writing
- Conflict policies: skip existing or overwrite
- Optional binding import via --include-bindings flag
- Auto-generation of master keys with secure hashing
- Namespace auto-creation for referenced namespaces
- Detailed import summary with warnings and created credentials
2025-12-23 20:13:45 +08:00
zenfun
2c5ccd56ee feat(api): add realtime stats endpoints for masters
Introduce StatsService integration to admin and master handlers,
exposing realtime metrics (requests, tokens, QPS, rate limit status)
via new endpoints:
- GET /admin/masters/:id/realtime
- GET /v1/realtime

Also embed realtime stats in the existing GET /admin/masters/:id
response and change GlobalQPS default to 0 with validation to
reject negative values.
2025-12-22 12:02:27 +08:00
zenfun
c2ed2f3f9e feat(api): add namespaces, batch ops, and admin logs 2025-12-21 23:16:27 +08:00
zenfun
73147fc55a feat(api): add model delete, pagination, and cors config 2025-12-21 23:03:12 +08:00
zenfun
816ea93339 feat(arch): add log partitioning and provider delete sync 2025-12-21 20:45:16 +08:00
zenfun
f7baa0f08f feat(ops): add status test and model fetch 2025-12-21 20:22:56 +08:00
zenfun
c2c65e774b feat(log): wire log db, metrics, and body toggle 2025-12-21 16:18:22 +08:00
zenfun
4c1e03f83d feat(api): add log webhook notification service
Implement webhook notifications for log error threshold alerts with
configurable thresholds, time windows, and cooldown periods.

- Add LogWebhookService with Redis-backed configuration storage
- Add admin endpoints for webhook config management (GET/PUT)
- Trigger webhook notifications when error count exceeds threshold
- Support status code threshold and error message detection
- Include sample log record data in webhook payload
2025-12-21 14:13:35 +08:00
zenfun
00192f937e feat(api): add log_request_body_enabled feature flag support
Add runtime feature flag to control whether request bodies are stored
in logs. The Handler now accepts a Redis client to check the
log_request_body_enabled feature flag before persisting log records.

- Add logRequestBodyFeatureKey constant for feature flag
- Inject Redis client into Handler for feature flag lookups
- Strip request body from log records when feature is disabled
- Update tests to pass Redis client to NewHandler
2025-12-21 13:26:16 +08:00
zenfun
25795a79d6 feat(cron): add automatic log cleanup with retention policy
Implement LogCleaner cron job to automatically clean up old log records
based on configurable retention period and maximum record count.

- Add LogCleaner with retention_days and max_records configuration
- Add EZ_LOG_RETENTION_DAYS and EZ_LOG_MAX_RECORDS environment variables
- Default to 30 days retention and 1,000,000 max records
- Include unit tests for log cleaner functionality
2025-12-21 12:01:52 +08:00
zenfun
369c204c16 feat: add admin log deletion 2025-12-21 00:53:52 +08:00
zenfun
88289015fc feat: add internal stats flush API 2025-12-19 23:26:33 +08:00
zenfun
ac9f0cd0a7 feat(stats): add usage stats and quota reset 2025-12-19 21:50:28 +08:00
zenfun
7dd3fac24e model-registry: add upstream check endpoint 2025-12-18 16:43:12 +08:00
zenfun
a61eff27e7 feat(admin/master): provider+master CRUD, token mgmt, logs APIs 2025-12-18 16:21:46 +08:00
zenfun
b2d2df18c5 feat(model-registry): models.dev updater + admin endpoints 2025-12-17 23:59:34 +08:00
zenfun
2b0ed3d3d5 feat(api): align provider creation with presets/custom/google sdk 2025-12-17 22:19:17 +08:00
zenfun
174735f152 feat(api): add provider creation endpoints and weight support
- Add `POST /admin/providers/preset` for streamlined creation of official providers (OpenAI, Anthropic, Gemini)
- Add `POST /admin/providers/custom` for generic OpenAI-compatible providers
- Add `weight` field to provider model and DTOs to enable weighted routing
- Update sync service to propagate provider weights
- Add unit tests for new creation handlers
2025-12-17 21:40:54 +08:00