Add integration tests for `IPBanService.Update` to verify:
- Reactivating an expired ban correctly detects overlaps with existing active bans.
- Explicitly clearing the `expires_at` field (setting it to null) removes the expiration.
- Initialize and schedule IP ban maintenance tasks in server entry point
- Perform initial IP ban sync to Redis on startup
- Implement optional JSON unmarshalling so the API can distinguish an explicit null `expires_at` from an omitted field
- Add CIDR overlap validation when updating rule status to active
Add IPBanManager to handle periodic background jobs including:
- Expiring outdated bans
- Syncing hit counts from Redis to DB
- Performing full Redis state synchronization
Additionally, update the service expiration logic to use system time
and add unit tests for CIDR normalization and overlap checking.
Implement the HTTP handler for managing global IP/CIDR bans. This
includes endpoints for creating, listing, retrieving, updating, and
deleting IP ban rules, complete with Swagger documentation and error
handling.
Add IPBanService to manage global IP bans with Redis synchronization
for high-performance filtering. Includes logic for CIDR normalization,
overlap detection, hit count tracking, and rule expiration.
Introduce the IPBan model to support global IP/CIDR ban rules enforced by the data plane. It includes fields for CIDR, status, expiration, and hit counts, and the model is registered for auto-migration at server startup.
Safeguard integer parsing in the `Whoami` handler by trimming whitespace and handling errors explicitly for `master_id`, `issued_at_epoch`, and `expires_at`. This prevents potential validation bypasses or incorrect behavior due to malformed metadata.
Add unit tests to verify invalid epoch handling and response correctness.
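The parsing hardening can be illustrated with a small helper. `parseMetaInt64` is a hypothetical name and the error wording is illustrative. The source only states that values are trimmed and that errors are handled explicitly rather than ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMetaInt64 trims surrounding whitespace, rejects empty values, and surfaces
// parse errors instead of silently falling back to 0.
func parseMetaInt64(field, raw string) (int64, error) {
	s := strings.TrimSpace(raw)
	if s == "" {
		return 0, fmt.Errorf("%s: empty value", field)
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("%s: invalid integer %q: %w", field, raw, err)
	}
	return n, nil
}

func main() {
	// Token metadata as it might arrive from Redis (hypothetical values).
	meta := map[string]string{"master_id": " 42\n", "issued_at_epoch": "17x", "expires_at": "0"}
	for _, field := range []string{"master_id", "issued_at_epoch", "expires_at"} {
		n, err := parseMetaInt64(field, meta[field])
		fmt.Println(field, n, err)
	}
}
```

Returning an error for `"17x"` matters because a value silently parsed as 0 could make an issued-at or expiry comparison pass when it should fail.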
Add comprehensive test coverage for InternalAuthMiddleware including scenarios
for allowed anonymous access, missing tokens, invalid tokens, and empty token
configuration to ensure access control logic correctness.
Refactor the `Whoami` handler to validate token metadata (status, expiration,
revocation) against Redis before database lookup, ensuring consistency with
balancer logic. Add `allow_ips`, `deny_ips`, and `expires_at` fields to
authentication responses.
Update internal middleware to support explicit anonymous access configuration
and harden security for unconfigured tokens.
Remove legacy fallback logic for master keys without digests.
BREAKING CHANGE: Internal endpoints now reject requests by default if no stats token is configured. To allow unauthenticated access, set `internal.allow_anonymous` to true.
BREAKING CHANGE: Support for legacy master keys without stored digests has been removed.
Introduce `CalculateTrendFloatWithBaseline` to correctly handle scenarios where previous-period metrics (Error Rate, Latency) are zero or missing. This prevents division-by-zero errors and distinguishes between "new" data and actual increases ("up") when starting from zero.
Also update the admin panel dashboard documentation to reflect current project status.
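Below is one way to express the baseline-aware comparison. It is loosely modeled on `CalculateTrendFloatWithBaseline`, whose real signature the source does not show. `trendWithBaseline`, the `Trend` struct, the `hasBaseline` flag, and the convention of returning a nil `DeltaPercent` are all assumptions made for this sketch.

```go
package main

import "fmt"

// Trend is a hypothetical result shape: DeltaPercent is nil when a percentage
// cannot be computed, instead of dividing by a zero baseline.
type Trend struct {
	DeltaPercent *float64
	Direction    string // "up", "down", "flat", or "new"
}

// trendWithBaseline compares current against previous. hasBaseline reports
// whether the previous period had any data at all.
func trendWithBaseline(current, previous float64, hasBaseline bool) Trend {
	if !hasBaseline {
		// No previous data: a non-zero value is "new", not an increase.
		if current != 0 {
			return Trend{Direction: "new"}
		}
		return Trend{Direction: "flat"}
	}
	if previous == 0 {
		// A real zero baseline: rising from it is a genuine "up", but the
		// percentage is undefined, so it stays nil.
		if current > 0 {
			return Trend{Direction: "up"}
		}
		return Trend{Direction: "flat"}
	}
	delta := (current - previous) / previous * 100
	dir := "flat"
	if delta > 0 {
		dir = "up"
	} else if delta < 0 {
		dir = "down"
	}
	return Trend{DeltaPercent: &delta, Direction: dir}
}

func main() {
	cases := []struct {
		cur, prev float64
		has       bool
	}{{10, 8, true}, {0.2, 0, true}, {0.2, 0, false}}
	for _, c := range cases {
		tr := trendWithBaseline(c.cur, c.prev, c.has)
		if tr.DeltaPercent != nil {
			fmt.Printf("%s %.1f%%\n", tr.Direction, *tr.DeltaPercent)
		} else {
			fmt.Println(tr.Direction)
		}
	}
}
```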
Remove the DailyStatsJob, DailyStat model, and associated database
migrations. This eliminates the pre-aggregation layer and updates the
dashboard handler to remove dependencies on the daily_stats table.
- Add `include_trends` query parameter to enable trend calculation
- Implement trend comparison logic (delta % and direction) against previous periods
- Add support for `last7d`, `last30d`, and `all` time period options
- Update `DashboardSummaryResponse` to include optional `trends` field
- Add helper functions for custom time window aggregation
- Add unit tests for trend calculation and period window logic
- Update feature documentation with new parameters and response schemas
- Create `DailyStat` model for immutable daily metrics including
request counts, tokens, latency, and top models.
- Implement `DailyStatsJob` to aggregate `log_records` from the previous
day, running daily at 00:05 UTC.
- Register database migrations and schedule the job in the server.
- Add `last7d` and `last30d` period support to stats handler.
Add TestTrafficChart_MinuteGranularityValidation to verify input
parameters including granularity, time range limits, and top_n
constraints. Include skipped placeholders for PostgreSQL-specific
aggregation tests.
Add new endpoint GET /admin/logs/stats/traffic-chart to provide
aggregated traffic metrics grouped by time and model. Features include:
- Time granularity selection (hour/minute)
- Top-N model breakdown with "other" aggregation
- Metrics for request counts and token usage
Includes generated Swagger documentation.
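Within each time bucket, the top-N breakdown with an "other" series comes down to sorting per-model totals and folding the tail into a single entry. The minimal sketch below works on one bucket. `ModelCount` and `topNWithOther` are hypothetical names, and ranking by request count rather than by tokens is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// ModelCount is a hypothetical per-model aggregate for one time bucket.
type ModelCount struct {
	Model    string
	Requests int64
}

// topNWithOther keeps the n busiest models and folds the rest into "other",
// so the number of chart series stays bounded.
func topNWithOther(counts []ModelCount, n int) []ModelCount {
	sorted := append([]ModelCount(nil), counts...) // do not reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Requests != sorted[j].Requests {
			return sorted[i].Requests > sorted[j].Requests
		}
		return sorted[i].Model < sorted[j].Model // deterministic tie-break
	})
	if n < 0 {
		n = 0
	}
	if len(sorted) <= n {
		return sorted
	}
	var other int64
	for _, c := range sorted[n:] {
		other += c.Requests
	}
	return append(sorted[:n], ModelCount{Model: "other", Requests: other})
}

func main() {
	bucket := []ModelCount{{"model-a", 5}, {"model-b", 50}, {"model-c", 20}, {"model-d", 1}}
	fmt.Println(topNWithOther(bucket, 2))
}
```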
Replace internal goroutine-based timing loops with scheduler integration
for SyncOutboxService and ModelRegistryService. Both services now expose
RunOnce() methods called by the central scheduler instead of managing
their own background loops.
- Add Interval() and RunOnce() methods to SyncOutboxService
- Add RefreshEvery() and RunOnce() methods to ModelRegistryService
- Remove started flag from SyncOutboxService struct
- Move scheduler.Start() after all services are initialized
- Ensure initial model registry refresh before scheduler starts
Replace custom goroutine-based scheduling in cron jobs with the centralized
foundation scheduler. Each cron job now exposes a RunOnce method called
by the scheduler instead of managing its own ticker loop.
Changes:
- Remove interval/enabled config from cron job structs
- Convert Start() methods to RunOnce() for all cron jobs
- Add scheduler setup in main.go with configurable intervals
- Update foundation dependency to v0.6.0 for scheduler support
- Update tests to validate RunOnce nil-safety
- Add MasterID field with index to LogRecord model for efficient queries
- Fix threshold config loading to use fixed ID=1 with FirstOrCreate
- Allow traffic spike detection to work without Redis for log-based checks
- Add traffic_spike to API documentation for alert type filter
- Add comprehensive tests for RPM/RPD/TPM spike detection scenarios
Add unit tests for alert-related functionality:
- alert_handler_test.go: tests for threshold CRUD operations,
alert creation with traffic_spike type, filtering, and stats
- alert_detector_test.go: tests for threshold config loading,
traffic spike severity calculation, deduplication logic,
error rate severity, and nil-safety checks
Also fix format string issues:
- Use %d instead of %.2f for integer QPS in alert messages
- Pass the error description through a `%s` format directive instead of using it as the format string, to avoid a linter warning
Introduce traffic_spike alert type for monitoring system and per-master
traffic levels with configurable thresholds stored in database.
- Add AlertThresholdConfig model for persistent threshold configuration
- Implement GET/PUT /admin/alerts/thresholds endpoints for threshold management
- Add traffic spike detection in alert detector cron job:
  - Global QPS monitoring across all masters
  - Per-master RPM/TPM checks with minimum sample thresholds
  - Per-master RPD/TPD checks for daily limits
- Use warning severity at threshold, critical at 2x threshold
- Include metric metadata (value, threshold, window) in alert details
- Update API documentation with new endpoints and alert type
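The severity rule reduces to two comparisons. This sketch uses a hypothetical `severityFor` helper. Treating the threshold itself as inclusive and a non-positive threshold as disabled are assumptions.

```go
package main

import "fmt"

// severityFor applies "warning at threshold, critical at 2x threshold".
// It returns "" when no alert should fire.
func severityFor(value, threshold float64) string {
	switch {
	case threshold <= 0:
		return "" // assumption: a non-positive threshold disables the check
	case value >= 2*threshold:
		return "critical"
	case value >= threshold:
		return "warning"
	default:
		return ""
	}
}

func main() {
	for _, rpm := range []float64{50, 100, 250} {
		fmt.Printf("rpm=%.0f severity=%q\n", rpm, severityFor(rpm, 100))
	}
}
```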
Implement AlertDetector background task that runs every minute to detect
and create alerts for various anomalies:
- Rate limit detection: monitors masters hitting rate limits
- Error spike detection: flags keys with >= 10% error rate
- Quota exceeded: warns when key quota usage >= 90%
- Provider down: alerts when API keys have >= 50% failure rate
Includes fingerprint-based deduplication with a 5-minute cooldown to
prevent duplicate alerts for the same issue.
Add ReportAlerts endpoint for Data Plane to report alerts to Control Plane
with fingerprint-based deduplication using a 5-minute cooldown period.
Changes:
- Add POST /internal/alerts/report endpoint with validation
- Add Fingerprint field to Alert model for deduplication
- Extend GetAPIKeyStatsSummary with optional time range filtering
using since/until query parameters to query from log records
Introduce a comprehensive alert management system for monitoring
system events and notifications.
Changes include:
- Add Alert model with type, severity, status, and metadata fields
- Implement AlertHandler with full CRUD operations (create, list,
get, acknowledge, resolve, dismiss)
- Add alert statistics endpoint for counts by status and severity
- Register Alert model in database auto-migration
- Add minute-level aggregation to log stats (limited to 6-hour range)
Add new admin API endpoints for dashboard metrics and system-wide
realtime statistics:
- Add /admin/dashboard/summary endpoint with aggregated metrics
including requests, tokens, latency, masters, keys, and provider
keys statistics with time period filtering
- Add /admin/realtime endpoint for system-level realtime stats
aggregated across all masters
- Add status filter parameter to ListAPIKeys endpoint
- Add hour grouping option to log stats aggregation
- Update OpenAPI documentation with new endpoints and schemas
Introduce internal endpoint for flushing accumulated APIKey statistics
from data plane to control plane database, updating both individual
API keys and their parent provider groups with request counts and
success/failure rates.
Add admin endpoint to retrieve aggregated API key statistics summary
across all provider groups, including total requests, success/failure
counts, and calculated rates.
Implement automatic token refresh mechanism for CPA providers (Codex,
GeminiCLI, Antigravity, ClaudeCode) with the following features:
- Periodic refresh of expiring tokens based on configurable interval
- Redis event queue processing for on-demand token refresh
- Retry logic with exponential backoff for transient failures
- Automatic key deactivation on non-retryable errors
- Provider-specific OAuth token refresh implementations
- Sync service integration to update providers after refresh
Add support for OAuth-based authentication with access/refresh tokens
and expiration tracking for API keys. Extend provider groups with
static headers configuration and headers profile options.
Changes include:
- Add AccessToken, RefreshToken, ExpiresAt, AccountID, ProjectID to APIKey model
- Add StaticHeaders and HeadersProfile to ProviderGroup model
- Add TokenRefresh configuration for background token management
- Support new provider types: ClaudeCode, Codex, GeminiCLI, Antigravity
- Update sync service to include new fields in provider snapshots
Replace health_handler with status_handler providing public /status and
/about endpoints. Add build-time version injection via ldflags in
Makefile, and support --version/-v CLI flag.
- Add /status endpoint returning runtime status, uptime, and version
- Add /about endpoint with system metadata (name, description, repo)
- Configure VERSION variable with git describe fallback
- Update swagger docs and api.md for new public endpoints
- Remove deprecated /api/status/test endpoint
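Build-time version injection with `-ldflags -X` requires a package-level string variable. This sketch assumes the variable is `main.Version` with a `dev` default; the real name and the Makefile wiring may differ. `parseFlags` is a hypothetical helper. The `flag` package accepts both `-v` and `--version` when both names are bound to one variable.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Version is overwritten at build time, for example:
//
//	go build -ldflags "-X main.Version=$(git describe --tags --always)"
var Version = "dev"

// parseFlags reports whether the version flag was given; -v and --version
// share one variable.
func parseFlags(args []string) (showVersion bool, err error) {
	fs := flag.NewFlagSet("server", flag.ContinueOnError)
	fs.BoolVar(&showVersion, "version", false, "print version and exit")
	fs.BoolVar(&showVersion, "v", false, "print version and exit (shorthand)")
	err = fs.Parse(args)
	return showVersion, err
}

func main() {
	showVersion, err := parseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	if showVersion {
		fmt.Println(Version)
		return
	}
	// Start the server here (omitted from this sketch).
}
```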
Use UnixNano for version field while keeping Unix seconds for updated_at
timestamp. This ensures version changes are detected even when multiple
syncs occur within the same second.
Add new response types and parameters for logs API:
- Add GroupedStatsItem and GroupedStatsResponse definitions
- Add ListLogsResponse and LogView definitions for detailed log records
- Add group_by enum parameter (model/day/month) to stats endpoint
- Update endpoint descriptions to clarify response types and request_body inclusion
- Update response schema references to use correct types
Add group_by parameter to /admin/logs/stats endpoint supporting:
- group_by=model: aggregate stats per model with avg latency
- group_by=day: daily aggregation with token counts
- group_by=month: monthly aggregation with token counts
Also include request_body field in admin ListLogs response for
full visibility into logged requests.
- Introduce `SyncOutboxService` and model to retry failed CP-to-Redis sync operations
- Update `SyncService` to handle sync failures by enqueuing tasks to the outbox
- Centralize provider group and API key validation logic into `ProviderGroupManager`
- Refactor API handlers to utilize the new manager and robust sync methods
- Add configuration options for sync outbox (interval, batch size, retries)
- Update TestBatchBindings_Status to use newTestHandlerWithRedis helper
- Remove unused jsonID helper function from sync_bindings_spec_test.go
- Add time import to models.go
Restructure the provider management system by separating the monolithic
Provider model into two distinct entities:
- ProviderGroup: defines shared upstream configuration (type, base_url,
google settings, models, status)
- APIKey: represents individual credentials within a group (api_key,
weight, status, auto_ban, ban settings)
This change also updates:
- Binding model to reference GroupID instead of RouteGroup string
- All CRUD handlers for the new provider-group and api-key endpoints
- Sync service to rebuild provider snapshots from joined tables
- Model registry to aggregate capabilities across group/key pairs
- Access handler to validate namespace existence and subset constraints
- Migration importer to handle the new schema structure
- All related tests to use the new model relationships
BREAKING CHANGE: Provider API endpoints replaced with /provider-groups
and /api-keys endpoints; Binding.RouteGroup replaced with Binding.GroupID
Introduce a new `import` subcommand to the server binary that reads
exported JSON files and imports masters, providers, keys, bindings,
and namespaces into the database.
Key features:
- Support for dry-run mode to validate without writing
- Conflict policies: skip existing or overwrite
- Optional binding import via --include-bindings flag
- Auto-generation of master keys with secure hashing
- Namespace auto-creation for referenced namespaces
- Detailed import summary with warnings and created credentials
Introduce StatsService integration to admin and master handlers,
exposing realtime metrics (requests, tokens, QPS, rate limit status)
via new endpoints:
- GET /admin/masters/:id/realtime
- GET /v1/realtime
Also embed realtime stats in the existing GET /admin/masters/:id
response and change GlobalQPS default to 0 with validation to
reject negative values.
Implement webhook notifications for log error threshold alerts with
configurable thresholds, time windows, and cooldown periods.
- Add LogWebhookService with Redis-backed configuration storage
- Add admin endpoints for webhook config management (GET/PUT)
- Trigger webhook notifications when error count exceeds threshold
- Support status code threshold and error message detection
- Include sample log record data in webhook payload