Replace custom goroutine-based scheduling in cron jobs with centralized
foundation scheduler. Each cron job now exposes a RunOnce method called
by the scheduler instead of managing its own ticker loop.
Changes:
- Remove interval/enabled config from cron job structs
- Convert Start() methods to RunOnce() for all cron jobs
- Add scheduler setup in main.go with configurable intervals
- Update foundation dependency to v0.6.0 for scheduler support
- Update tests to validate RunOnce nil-safety
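As a rough sketch of the new contract (the `Job` interface and `runEvery` helper below are illustrative, not the actual foundation v0.6.0 scheduler API), the scheduler owns the ticker loop and each job only implements a single pass:
```go
package cron

import (
	"context"
	"log/slog"
	"time"
)

// Job is the new contract: RunOnce performs a single pass and returns;
// the scheduler owns the loop and error handling.
type Job interface {
	Name() string
	RunOnce(ctx context.Context) error
}

// runEvery stands in for the centralized scheduler: it invokes
// job.RunOnce at a fixed interval until the context is cancelled.
func runEvery(ctx context.Context, job Job, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := job.RunOnce(ctx); err != nil {
				slog.Error("cron job failed", "job", job.Name(), "error", err)
			}
		}
	}
}
```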
Introduce traffic_spike alert type for monitoring system-wide and per-master
traffic levels, with configurable thresholds stored in the database.
- Add AlertThresholdConfig model for persistent threshold configuration
- Implement GET/PUT /admin/alerts/thresholds endpoints for threshold management
- Add traffic spike detection in alert detector cron job:
  - Global QPS monitoring across all masters
  - Per-master RPM/TPM checks with minimum sample thresholds
  - Per-master RPD/TPD checks for daily limits
- Use warning severity at threshold, critical at 2x threshold
- Include metric metadata (value, threshold, window) in alert details
- Update API documentation with new endpoints and alert type
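The escalation rule could look roughly like the following; the function name and the details map are illustrative only:
```go
package alerts

// spikeSeverity sketches the warning/critical escalation described above.
func spikeSeverity(value, threshold float64) (severity string, details map[string]any) {
	if threshold <= 0 || value < threshold {
		return "", nil // no threshold configured, or traffic below it
	}
	severity = "warning"
	if value >= 2*threshold {
		severity = "critical" // twice the configured threshold escalates
	}
	return severity, map[string]any{
		"value":     value,
		"threshold": threshold,
		"window":    "1m", // metric window carried in the alert details
	}
}
```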
Implement AlertDetector background task that runs every minute to detect
and create alerts for various anomalies:
- Rate limit detection: monitors masters hitting rate limits
- Error spike detection: flags keys with >= 10% error rate
- Quota exceeded: warns when key quota usage >= 90%
- Provider down: alerts when API keys have >= 50% failure rate
Includes fingerprint-based deduplication with 5-minute cooldown to
prevent duplicate alerts for the same issue.
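A minimal sketch of the deduplication idea, assuming the fingerprint is derived from the alert type plus its subject and the cooldown is tracked in memory (the real detector may keep this in the database or Redis instead):
```go
package alerts

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
	"time"
)

const cooldown = 5 * time.Minute

// fingerprint derives a stable identifier from the alert type and its
// subject (e.g. master or key ID) so repeats of the same issue collide.
func fingerprint(alertType, subject string) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s:%s", alertType, subject)))
	return hex.EncodeToString(sum[:])
}

// dedup remembers when each fingerprint last fired.
type dedup struct {
	mu   sync.Mutex
	seen map[string]time.Time
}

// ShouldFire reports whether an alert with this fingerprint is outside
// its cooldown window and records the new firing time if so.
func (d *dedup) ShouldFire(fp string, now time.Time) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.seen == nil {
		d.seen = make(map[string]time.Time)
	}
	if last, ok := d.seen[fp]; ok && now.Sub(last) < cooldown {
		return false
	}
	d.seen[fp] = now
	return true
}
```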
Add ReportAlerts endpoint for Data Plane to report alerts to Control Plane
with fingerprint-based deduplication using a 5-minute cooldown period.
Changes:
- Add POST /internal/alerts/report endpoint with validation
- Add Fingerprint field to Alert model for deduplication
- Extend GetAPIKeyStatsSummary with optional time-range filtering via
since/until query parameters, querying log records for the requested range
Introduce a comprehensive alert management system for monitoring
system events and notifications.
Changes include:
- Add Alert model with type, severity, status, and metadata fields
- Implement AlertHandler with create, list, and get operations plus
lifecycle actions (acknowledge, resolve, dismiss)
- Add alert statistics endpoint for counts by status and severity
- Register Alert model in database auto-migration
- Add minute-level aggregation to log stats (limited to 6-hour range)
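For orientation, an illustrative GORM shape for the model; column names, enum values, and the gorm.io/datatypes dependency are assumptions, not the real schema:
```go
package models

import (
	"time"

	"gorm.io/datatypes"
)

// Alert is an illustrative model covering the type/severity/status/metadata
// fields mentioned above.
type Alert struct {
	ID        uint           `gorm:"primaryKey" json:"id"`
	Type      string         `gorm:"index" json:"type"`     // e.g. rate_limit, error_spike
	Severity  string         `gorm:"index" json:"severity"` // info, warning, critical
	Status    string         `gorm:"index" json:"status"`   // active, acknowledged, resolved, dismissed
	Message   string         `json:"message"`
	Details   datatypes.JSON `json:"details"` // arbitrary metadata such as value/threshold/window
	CreatedAt time.Time      `json:"created_at"`
	UpdatedAt time.Time      `json:"updated_at"`
}
```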
Add new admin API endpoints for dashboard metrics and system-wide
realtime statistics:
- Add /admin/dashboard/summary endpoint with aggregated metrics
(requests, tokens, latency, masters, keys, and provider keys)
and time-period filtering
- Add /admin/realtime endpoint for system-level realtime stats
aggregated across all masters
- Add status filter parameter to ListAPIKeys endpoint
- Add hour grouping option to log stats aggregation
- Update OpenAPI documentation with new endpoints and schemas
Introduce internal endpoint for flushing accumulated APIKey statistics
from data plane to control plane database, updating both individual
API keys and their parent provider groups with request counts and
success/failure rates.
Add admin endpoint to retrieve aggregated API key statistics summary
across all provider groups, including total requests, success/failure
counts, and calculated rates.
Implement automatic token refresh mechanism for CPA providers (Codex,
GeminiCLI, Antigravity, ClaudeCode) with the following features:
- Periodic refresh of expiring tokens based on configurable interval
- Redis event queue processing for on-demand token refresh
- Retry logic with exponential backoff for transient failures
- Automatic key deactivation on non-retryable errors
- Provider-specific OAuth token refresh implementations
- Sync service integration to update providers after refresh
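A sketch of the retry policy; the attempt count and base delay are assumptions:
```go
package cparefresh

import (
	"context"
	"time"
)

// refreshWithRetry retries transient failures with exponential backoff
// and bails out immediately on non-retryable errors so the caller can
// deactivate the key.
func refreshWithRetry(ctx context.Context, refresh func(context.Context) error, retryable func(error) bool) error {
	const maxAttempts = 3
	backoff := time.Second
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = refresh(ctx); err == nil {
			return nil
		}
		if !retryable(err) || attempt == maxAttempts {
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
			backoff *= 2 // exponential backoff: 1s, 2s, 4s, ...
		}
	}
	return err
}
```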
Replace health_handler with status_handler providing public /status and
/about endpoints. Add build-time version injection via ldflags in
Makefile, and support --version/-v CLI flag.
- Add /status endpoint returning runtime status, uptime, and version
- Add /about endpoint with system metadata (name, description, repo)
- Configure VERSION variable with git describe fallback
- Update swagger docs and api.md for new public endpoints
- Remove deprecated /api/status/test endpoint
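On the Go side, build-time injection typically reduces to a package-level variable overridden via -X; whether this codebase uses `main.version` as the symbol name is an assumption:
```go
package main

import (
	"fmt"
	"os"
)

// version is replaced at build time, e.g. from the Makefile:
//   go build -ldflags "-X main.version=$(git describe --tags --always)"
var version = "dev"

func main() {
	if len(os.Args) > 1 && (os.Args[1] == "--version" || os.Args[1] == "-v") {
		fmt.Println(version)
		return
	}
	// ... normal server startup continues here
}
```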
- Introduce `SyncOutboxService` and model to retry failed CP-to-Redis sync operations
- Update `SyncService` to handle sync failures by enqueuing tasks to the outbox
- Centralize provider group and API key validation logic into `ProviderGroupManager`
- Refactor API handlers to utilize the new manager and robust sync methods
- Add configuration options for sync outbox (interval, batch size, retries)
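A sketch of the outbox pattern as described, with illustrative model fields and a stand-in `SyncService`:
```go
package syncsvc

import (
	"context"
	"time"

	"gorm.io/gorm"
)

// SyncOutbox is an illustrative model; the real columns likely differ.
type SyncOutbox struct {
	ID        uint      `gorm:"primaryKey"`
	TaskType  string    // e.g. provider_snapshot, binding_snapshot
	Payload   string    // serialized sync payload to replay
	Attempts  int       // incremented by the retry worker
	NextRunAt time.Time `gorm:"index"` // earliest time the task may be retried
	CreatedAt time.Time
}

// SyncService here only demonstrates the enqueue-on-failure flow.
type SyncService struct {
	db          *gorm.DB
	pushToRedis func(context.Context, SyncOutbox) error
}

// syncWithOutbox writes to Redis and, if that fails, persists the task
// so the outbox worker can retry it on its configured interval.
func (s *SyncService) syncWithOutbox(ctx context.Context, task SyncOutbox) error {
	if err := s.pushToRedis(ctx, task); err != nil {
		task.NextRunAt = time.Now().Add(time.Minute)
		return s.db.WithContext(ctx).Create(&task).Error
	}
	return nil
}
```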
Restructure the provider management system by separating the monolithic
Provider model into two distinct entities:
- ProviderGroup: defines shared upstream configuration (type, base_url,
google settings, models, status)
- APIKey: represents individual credentials within a group (api_key,
weight, status, auto_ban, ban settings)
This change also updates:
- Binding model to reference GroupID instead of RouteGroup string
- All CRUD handlers for the new provider-group and api-key endpoints
- Sync service to rebuild provider snapshots from joined tables
- Model registry to aggregate capabilities across group/key pairs
- Access handler to validate namespace existence and subset constraints
- Migration importer to handle the new schema structure
- All related tests to use the new model relationships
BREAKING CHANGE: Provider API endpoints replaced with /provider-groups
and /api-keys endpoints; Binding.RouteGroup replaced with Binding.GroupID
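Roughly, the split looks like the following; tags, columns, and the capabilities encoding are assumptions:
```go
package models

import "gorm.io/datatypes"

// ProviderGroup holds the shared upstream configuration.
type ProviderGroup struct {
	ID      uint   `gorm:"primaryKey"`
	Type    string // upstream type, e.g. openai, anthropic, gemini
	BaseURL string
	Models  datatypes.JSON // shared model capabilities for the group
	Status  string
	APIKeys []APIKey `gorm:"foreignKey:GroupID"` // credentials belonging to this group
}

// APIKey is one credential within a group.
type APIKey struct {
	ID      uint   `gorm:"primaryKey"`
	GroupID uint   `gorm:"index"` // owning ProviderGroup
	APIKey  string // the credential itself
	Weight  int    // weighted routing within the group
	Status  string
	AutoBan bool
}

// Bindings now reference the group by ID instead of a RouteGroup string.
type Binding struct {
	ID      uint `gorm:"primaryKey"`
	GroupID uint `gorm:"index"`
}
```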
Introduce a new `import` subcommand to the server binary that reads
exported JSON files and imports masters, providers, keys, bindings,
and namespaces into the database.
Key features:
- Support for dry-run mode to validate without writing
- Conflict policies: skip existing or overwrite
- Optional binding import via --include-bindings flag
- Auto-generation of master keys with secure hashing
- Namespace auto-creation for referenced namespaces
- Detailed import summary with warnings and created credentials
Integrate StatsService into the admin and master handlers,
exposing realtime metrics (requests, tokens, QPS, rate limit status)
via new endpoints:
- GET /admin/masters/:id/realtime
- GET /v1/realtime
Also embed realtime stats in the existing GET /admin/masters/:id
response and change GlobalQPS default to 0 with validation to
reject negative values.
Implement webhook notifications for log error threshold alerts with
configurable thresholds, time windows, and cooldown periods.
- Add LogWebhookService with Redis-backed configuration storage
- Add admin endpoints for webhook config management (GET/PUT)
- Trigger webhook notifications when error count exceeds threshold
- Support status code threshold and error message detection
- Include sample log record data in webhook payload
Add runtime feature flag to control whether request bodies are stored
in logs. The Handler now accepts a Redis client to check the
log_request_body_enabled feature flag before persisting log records.
- Add logRequestBodyFeatureKey constant for feature flag
- Inject Redis client into Handler for feature flag lookups
- Strip request body from log records when feature is disabled
- Update tests to pass Redis client to NewHandler
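A sketch of the flag check, assuming go-redis v9 and that the flag lives in the meta:features hash introduced elsewhere in this changelog; a failed lookup is treated as "disabled" here, which is also an assumption:
```go
package logstore

import (
	"context"

	"github.com/redis/go-redis/v9"
)

const logRequestBodyFeatureKey = "log_request_body_enabled"

// shouldStoreRequestBody reports whether request bodies may be persisted.
// Callers would clear record.RequestBody when it returns false.
func shouldStoreRequestBody(ctx context.Context, rdb *redis.Client) bool {
	val, err := rdb.HGet(ctx, "meta:features", logRequestBodyFeatureKey).Result()
	if err != nil {
		return false
	}
	return val == "true" || val == "1"
}
```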
Implement LogCleaner cron job to automatically clean up old log records
based on configurable retention period and maximum record count.
- Add LogCleaner with retention_days and max_records configuration
- Add EZ_LOG_RETENTION_DAYS and EZ_LOG_MAX_RECORDS environment variables
- Default to 30 days retention and 1,000,000 max records
- Include unit tests for log cleaner functionality
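The retention pass might look like this sketch; table and column names are assumptions, and enforcement of the max-records cap is only noted:
```go
package cron

import (
	"context"
	"time"

	"gorm.io/gorm"
)

// LogRecord is trimmed to the fields the cleaner needs.
type LogRecord struct {
	ID        uint `gorm:"primaryKey"`
	CreatedAt time.Time
}

// LogCleaner mirrors the configuration above; 30 days / 1,000,000
// records are the defaults stated in this entry.
type LogCleaner struct {
	db            *gorm.DB
	RetentionDays int
	MaxRecords    int64
}

// RunOnce deletes records older than the retention window; trimming the
// oldest rows beyond MaxRecords would follow the same pattern and is
// omitted here.
func (c *LogCleaner) RunOnce(ctx context.Context) error {
	cutoff := time.Now().AddDate(0, 0, -c.RetentionDays)
	return c.db.WithContext(ctx).
		Where("created_at < ?", cutoff).
		Delete(&LogRecord{}).Error
}
```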
- Add `POST /admin/providers/preset` for streamlined creation of official providers (OpenAI, Anthropic, Gemini)
- Add `POST /admin/providers/custom` for generic OpenAI-compatible providers
- Add `weight` field to provider model and DTOs to enable weighted routing
- Update sync service to propagate provider weights
- Add unit tests for new creation handlers
Add endpoints for master and key access management to configure default
and allowed namespaces, including propagation options.
Implement GET and DELETE operations for individual bindings.
Update sync service to persist bindings snapshots even when no upstreams
are available.
Implement handlers for creating, listing, and updating model bindings.
Register new routes in the admin server group and add DTO definitions.
Update provider handlers to trigger binding synchronization on changes
to ensure upstream mappings remain current.
Introduce namespace-aware routing capabilities through a new Binding
model and updates to Master/Key entities.
- Add Binding model for configuring model routes per namespace
- Add DefaultNamespace and Namespaces fields to Master and Key models
- Update auto-migration to include Binding table
- Implement Redis synchronization for binding configurations
- Propagate namespace settings during master creation and key issuance
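An illustrative shape for the new entities; the changelog does not spell out the columns, so everything below is an assumption:
```go
package models

// Binding configures a model route within a namespace.
type Binding struct {
	ID        uint   `gorm:"primaryKey"`
	Namespace string `gorm:"index"` // namespace the route applies to
	Model     string `gorm:"index"` // requested model name
	Target    string // upstream route the model resolves to
}

// Master gains the namespace fields mentioned above.
type Master struct {
	ID               uint   `gorm:"primaryKey"`
	DefaultNamespace string // used when a request does not name a namespace
	Namespaces       string // allowed namespaces; storage format is an assumption
}
```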
Add `POST /admin/masters/{id}/keys` allowing admins to issue child keys
on behalf of a master. Introduce an `issued_by` field in the Key model
to audit whether a key was issued by the master or an admin.
Refactor master service to use typed errors for consistent HTTP status
mapping and ensure validation logic (active status, group check) is
shared.
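The typed-error approach could be sketched as below; the error names and the exact status mapping are assumptions:
```go
package master

import (
	"errors"
	"net/http"
)

// Illustrative error set shared by the service layer.
var (
	ErrMasterInactive  = errors.New("master is not active")
	ErrGroupNotAllowed = errors.New("group not allowed for this master")
)

// httpStatus translates service errors into HTTP statuses so every
// handler maps them consistently.
func httpStatus(err error) int {
	switch {
	case errors.Is(err, ErrMasterInactive):
		return http.StatusForbidden
	case errors.Is(err, ErrGroupNotAllowed):
		return http.StatusBadRequest
	default:
		return http.StatusInternalServerError
	}
}
```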
Add FeatureHandler to manage lightweight runtime configuration toggles
stored in the Redis `meta:features` hash. This enables dynamic control
over system behavior (e.g., storage backends) via the admin API.
- Add `GET /admin/features` to list flags
- Add `PUT /admin/features` to update flags
- Update README with feature flag documentation
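A sketch of the two handlers, assuming Gin, go-redis v9, and a flat string-to-string hash:
```go
package admin

import (
	"net/http"

	"github.com/gin-gonic/gin"
	"github.com/redis/go-redis/v9"
)

// listFeatures backs GET /admin/features by reading the whole hash.
func listFeatures(rdb *redis.Client) gin.HandlerFunc {
	return func(c *gin.Context) {
		flags, err := rdb.HGetAll(c.Request.Context(), "meta:features").Result()
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}
		c.JSON(http.StatusOK, flags)
	}
}

// updateFeatures backs PUT /admin/features; it accepts a flat map of
// flag names to values and writes them into the hash.
func updateFeatures(rdb *redis.Client) gin.HandlerFunc {
	return func(c *gin.Context) {
		var body map[string]string
		if err := c.ShouldBindJSON(&body); err != nil {
			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
			return
		}
		for name, value := range body {
			if err := rdb.HSet(c.Request.Context(), "meta:features", name, value).Err(); err != nil {
				c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
				return
			}
		}
		c.JSON(http.StatusOK, gin.H{"updated": len(body)})
	}
}
```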
Add request ID middleware to attach a unique identifier to each HTTP request
for observability. The middleware reuses an existing X-Request-ID header when
present, generates a new UUID otherwise, and sets the ID in both the
request/response headers and the Gin context so logs and events can be
correlated across distributed system components.
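A sketch of the middleware; the google/uuid dependency and the Gin context key are assumptions:
```go
package middleware

import (
	"github.com/gin-gonic/gin"
	"github.com/google/uuid"
)

const requestIDHeader = "X-Request-ID"

// RequestID reuses an incoming X-Request-ID or mints a new UUID, then
// exposes it to handlers, loggers, and the client.
func RequestID() gin.HandlerFunc {
	return func(c *gin.Context) {
		id := c.GetHeader(requestIDHeader)
		if id == "" {
			id = uuid.NewString() // no incoming ID, mint one
		}
		c.Request.Header.Set(requestIDHeader, id)  // visible to downstream handlers
		c.Writer.Header().Set(requestIDHeader, id) // echoed back to the client
		c.Set("request_id", id)                    // available via the Gin context for logging
		c.Next()
	}
}
```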
- Replace direct zerolog usage with standard library `log/slog` in business code
- Add `internal/logging` package with zerolog bridge handler for structured output
- Create `internal/jsoncodec` package to centralize JSON encoding/decoding using Sonic
- Update all services and main entry point to use new logging interface
- Maintain backward compatibility with existing zerolog console output
- Remove custom logger setup in favor of structured logging bridge
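The bridge amounts to a custom slog.Handler that forwards records to zerolog; the sketch below ignores slog groups and hard-codes the level check:
```go
package logging

import (
	"context"
	"log/slog"

	"github.com/rs/zerolog"
)

// zerologHandler lets business code use log/slog while zerolog still
// renders the console output.
type zerologHandler struct {
	logger zerolog.Logger
	attrs  []slog.Attr
}

func (h *zerologHandler) Enabled(_ context.Context, level slog.Level) bool {
	return level >= slog.LevelInfo // a real bridge would honor EZ_LOG_LEVEL
}

func (h *zerologHandler) Handle(_ context.Context, r slog.Record) error {
	ev := h.logger.WithLevel(mapLevel(r.Level))
	for _, a := range h.attrs {
		ev = ev.Interface(a.Key, a.Value.Any())
	}
	r.Attrs(func(a slog.Attr) bool {
		ev = ev.Interface(a.Key, a.Value.Any())
		return true
	})
	ev.Msg(r.Message)
	return nil
}

func (h *zerologHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	clone := *h
	clone.attrs = append(append([]slog.Attr{}, h.attrs...), attrs...)
	return &clone
}

func (h *zerologHandler) WithGroup(string) slog.Handler { return h } // groups omitted in this sketch

func mapLevel(l slog.Level) zerolog.Level {
	switch {
	case l >= slog.LevelError:
		return zerolog.ErrorLevel
	case l >= slog.LevelWarn:
		return zerolog.WarnLevel
	case l >= slog.LevelInfo:
		return zerolog.InfoLevel
	default:
		return zerolog.DebugLevel
	}
}
```
Wiring would then be something like `slog.SetDefault(slog.New(&zerologHandler{logger: zl}))`, keeping the existing zerolog console writer as the output path.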
- Integrate `rs/zerolog` for structured and leveled logging
- Add `EZ_LOG_LEVEL` environment variable support (default: info)
- Configure console output with timestamps and service fields
- Migrate `main.go` and `LogWriter` to use the new logger instance
- Update README with logging configuration and design details
- Add Swagger documentation for the following admin endpoints:
  - Create a new master tenant
  - Create a new provider
  - Register a new model
  - List all models
  - Update a model
  - Force sync snapshot
  - Ingest logs
- Add Swagger documentation for the master endpoint:
  - Issue a child key
- Update go.mod and go.sum with the dependencies required by Swagger
- Add token hash fields to Master and Key models for indexed lookups
- Implement SyncService integration in admin and master handlers
- Update master key validation with backward-compatible digest lookup
- Hash child keys in database and store token digests for Redis sync
- Add master metadata sync to Redis for balancer validation
- Ensure backward compatibility with legacy rows during migration
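A sketch of the digest idea; whether the real code uses SHA-256, a salt, or another encoding is not stated here:
```go
package auth

import (
	"crypto/sha256"
	"encoding/hex"
)

// tokenDigest shows the principle: only the digest is stored and
// indexed, so a presented key can be looked up with
// WHERE token_hash = ? without keeping the plaintext.
func tokenDigest(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}
```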
Add admin and master authentication layers with JWT support. Replace direct
key creation with hierarchical master/child key system. Update database
schema to support master accounts with configurable limits and epoch-based
key revocation. Add health check endpoint with system status monitoring.
BREAKING CHANGE: Removed direct POST /keys endpoint in favor of master-based
key issuance through /v1/tokens. Database migration requires dropping old User
table and creating Master table with new relationships.
Enable full resource management via API and support data plane
synchronization.
- Add CRUD handlers for Providers, Models, and Keys using DTOs
- Implement LogWriter service for asynchronous, batched audit logging
- Update SyncService to snapshot full configuration state to Redis
- Register new API routes and initialize background services
- Add configuration options for logging performance tuning
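The batched, asynchronous shape of the LogWriter might look like this sketch; buffer size, batch size, and flush interval stand in for the configurable tuning options:
```go
package logwriter

import (
	"log/slog"
	"time"

	"gorm.io/gorm"
)

// LogRecord is trimmed to a couple of fields for the sketch.
type LogRecord struct {
	ID     uint `gorm:"primaryKey"`
	Path   string
	Status int
}

// LogWriter buffers records in a channel and flushes them in batches.
type LogWriter struct {
	db      *gorm.DB
	records chan LogRecord
}

func NewLogWriter(db *gorm.DB) *LogWriter {
	w := &LogWriter{db: db, records: make(chan LogRecord, 1024)}
	go w.loop(100, time.Second) // flush at 100 records or once per second
	return w
}

// Write hands the record to the background goroutine.
func (w *LogWriter) Write(r LogRecord) { w.records <- r }

func (w *LogWriter) loop(batchSize int, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	batch := make([]LogRecord, 0, batchSize)
	flush := func() {
		if len(batch) == 0 {
			return
		}
		if err := w.db.Create(&batch).Error; err != nil { // single multi-row INSERT
			slog.Error("log batch insert failed", "error", err)
		}
		batch = batch[:0]
	}
	for {
		select {
		case r := <-w.records:
			batch = append(batch, r)
			if len(batch) >= batchSize {
				flush()
			}
		case <-ticker.C:
			flush()
		}
	}
}
```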
Establish the foundational structure for the ez-api server.
Key changes include:
- Set up main entry point with graceful shutdown and Gin router
- Configure database connections for PostgreSQL (GORM) and Redis
- Define core data models (User, Provider, Key, Model)
- Implement configuration loading and basic key creation handler
- Add Dockerfile for multi-stage builds and .gitignore
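A minimal sketch of the entry-point wiring with graceful shutdown; the address and timeout are placeholders rather than the project's real configuration:
```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/gin-gonic/gin"
)

func main() {
	router := gin.Default()
	srv := &http.Server{Addr: ":8080", Handler: router}

	// Serve in the background so the main goroutine can wait for signals.
	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("listen: %v", err)
		}
	}()

	// Wait for SIGINT/SIGTERM, then give in-flight requests time to finish.
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
	<-quit

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Fatalf("shutdown: %v", err)
	}
}
```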