- Fix JSON field alignment in error response and TrafficBucket struct
- Add documentation comment and swagger annotations for Breakdown field
- Remove unnecessary string concatenation spacing in SQL select
Write 'meta:providers_meta' to Redis during provider synchronization,
including version, timestamp, and configuration checksum. This aligns
provider sync with model metadata handling and enables better cache
invalidation.
Add comprehensive API documentation for the whoami endpoint including:
- Detailed response structure for each token type (Admin, Master, Child Key)
- Complete field listings for master keys and child keys
- All possible error responses with status codes
- Usage guidance for frontend initialization
Add realtime statistics (requests, tokens, QPS, rate limiting) to whoami
response for both master and key authentication types. Extend key response
with additional fields including master name, model limits, quota tracking,
and usage statistics.
- Inject StatsService into AuthHandler for realtime stats retrieval
- Add WhoamiRealtimeView struct for realtime statistics
- Include admin permissions field in admin response
- Add comprehensive key metadata (quotas, model limits, usage stats)
- Add test for expired key returning 401 Unauthorized
Remove the legacy route table maintenance logic from the sync service,
which populated deprecated Redis keys. Also delete the unused
TokenService and KeyDTO files to reduce technical debt.
Add integration tests for `IPBanService.Update` to verify:
- Reactivating an expired ban correctly detects overlaps with existing active bans.
- Explicitly clearing the `expires_at` field (setting to null) works as expected.
- Initialize and schedule IP ban maintenance tasks in server entry point
- Perform initial IP ban sync to Redis on startup
- Implement optional JSON unmarshalling to handle null `expires_at` in API
- Add CIDR overlap validation when updating rule status to active
Add IPBanManager to handle periodic background jobs including:
- Expiring outdated bans
- Syncing hit counts from Redis to DB
- Performing full Redis state synchronization
Additionally, update the service expiration logic to use system time
and add unit tests for CIDR normalization and overlap checking.
Implement the HTTP handler for managing global IP/CIDR bans. This
includes endpoints for creating, listing, retrieving, updating, and
deleting IP ban rules, complete with Swagger documentation and error
handling.
Add IPBanService to manage global IP bans with Redis synchronization
for high-performance filtering. Includes logic for CIDR normalization,
overlap detection, hit count tracking, and rule expiration.
Introduce the IPBan model to support global IP/CIDR ban rules enforced by the data plane. Includes fields for CIDR, status, expiration, and hit counts, and registers the model for auto-migration in the server startup.
Update comments for EZ_BALANCER_TRUSTED_PROXIES to include:
- Header resolution priority (CF-Connecting-IP, Ali-CDN-Real-IP, etc.)
- Current Cloudflare IPv4/IPv6 CIDR lists for easier reference
- Specific notes on production configuration guidelines
Organize admin panel feature documentation into a dedicated directory
and include an interactive HTML mockup along with a reference
screenshot for the EZ-API Control Plane Dashboard.
Add detailed comments explaining production vs development configuration
recommendations, particularly regarding security and retention policies.
Update default values for balancer log sink and stats flush to enabled.
Add explanatory comments for EZ_BALANCER_LOG_SINK_ENABLED and
EZ_BALANCER_STATS_FLUSH_ENABLED to clarify default behavior and
production recommendations regarding monitoring and quota management.
Reorganize configuration variables into logical sections with clear
headers and detailed comments to enhance readability. Add missing log
buffering settings (EZ_LOG_QUEUE, EZ_LOG_BATCH_SIZE, EZ_LOG_FLUSH_MS)
and expand descriptions for authentication and network options.
Update .env.example with new configuration options:
- EZ_INTERNAL_ALLOW_ANON for controlling anonymous internal access
- EZ_BALANCER_ENABLE_TEST_KEYS for testing auth bypass
- EZ_BALANCER_TRUSTED_PROXIES for real IP resolution
Add security configuration section to README explaining internal endpoint
authentication logic and default behaviors.
Safeguard integer parsing in the `Whoami` handler by trimming whitespace and handling errors explicitly for `master_id`, `issued_at_epoch`, and `expires_at`. This prevents potential validation bypasses or incorrect behavior due to malformed metadata.
Add unit tests to verify invalid epoch handling and response correctness.
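
A minimal sketch of the parsing pattern described above: trim whitespace, reject empty values, and surface parse errors instead of silently falling back to zero. The helper name is hypothetical and the metadata values are made up for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseInt64Field trims surrounding whitespace and returns an explicit error
// for empty or malformed input, so callers never treat bad metadata as 0.
func parseInt64Field(name, raw string) (int64, error) {
	s := strings.TrimSpace(raw)
	if s == "" {
		return 0, fmt.Errorf("%s: empty value", name)
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("%s: invalid integer %q: %w", name, raw, err)
	}
	return n, nil
}

func main() {
	meta := map[string]string{
		"master_id":       " 42 ",
		"issued_at_epoch": "1700000000",
		"expires_at":      "not-a-number",
	}
	for _, key := range []string{"master_id", "issued_at_epoch", "expires_at"} {
		n, err := parseInt64Field(key, meta[key])
		fmt.Println(key, n, err)
	}
}
```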
Add comprehensive test coverage for InternalAuthMiddleware including scenarios
for allowed anonymous access, missing tokens, invalid tokens, and empty token
configuration to ensure access control logic correctness.
Refactor the `Whoami` handler to validate token metadata (status, expiration,
revocation) against Redis before database lookup, ensuring consistency with
balancer logic. Add `allow_ips`, `deny_ips`, and `expires_at` fields to
authentication responses.
Update internal middleware to support explicit anonymous access configuration
and harden security for unconfigured tokens.
Remove legacy fallback logic for master keys without digests.
BREAKING CHANGE: Internal endpoints now reject requests by default if no stats token is configured. To allow unauthenticated access, set `internal.allow_anonymous` to true.
BREAKING CHANGE: Support for legacy master keys without stored digests has been removed.
- Add detailed interaction tables for system status, metrics, and charts
- Update API field mappings to match backend implementation
- Clarify error handling, loading states, and edge cases
Regenerate API documentation to reflect recent statistics features:
- Add definition for new `/admin/logs/stats/traffic-chart` endpoint
- Update dashboard summary with `include_trends` parameter and new time periods
- Add `DashboardTrends` and `TrafficChartResponse` data structures
- Update alert types to include `traffic_spike`
Introduce `CalculateTrendFloatWithBaseline` to correctly handle scenarios where previous period metrics (Error Rate, Latency) are zero or missing. This prevents arithmetic errors and distinguishes between "new" data and actual increases ("up") when starting from zero.
Also updates the admin panel dashboard documentation to reflect current project status.
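
The zero-baseline handling can be illustrated as follows. This is a simplified sketch and not the signature of `CalculateTrendFloatWithBaseline`; the `Trend` type, the `baseline` parameter semantics, and the `"flat"` and `"down"` labels are assumptions (the commit only names `"new"` and `"up"`).

```go
package main

import "fmt"

// Trend is a hypothetical result shape: percentage change plus a direction.
type Trend struct {
	DeltaPct  float64
	Direction string // "up", "down", "flat", or "new"
}

// trendWithBaseline compares current against previous. When previous is at or
// below the baseline (zero or missing), it skips the division and reports
// "new" instead of an infinite or undefined increase.
func trendWithBaseline(current, previous, baseline float64) Trend {
	if previous <= baseline {
		if current <= baseline {
			return Trend{Direction: "flat"}
		}
		return Trend{Direction: "new"}
	}
	delta := (current - previous) / previous * 100
	switch {
	case delta > 0:
		return Trend{DeltaPct: delta, Direction: "up"}
	case delta < 0:
		return Trend{DeltaPct: delta, Direction: "down"}
	default:
		return Trend{Direction: "flat"}
	}
}

func main() {
	fmt.Println(trendWithBaseline(0.05, 0, 0))  // error rate appears for the first time
	fmt.Println(trendWithBaseline(150, 100, 0)) // latency rose from 100 to 150
}
```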
Remove the DailyStatsJob, DailyStat model, and associated database
migrations. This eliminates the pre-aggregation layer and updates the
dashboard handler to remove dependencies on the daily_stats table.
- Add `include_trends` query parameter to enable trend calculation
- Implement trend comparison logic (delta % and direction) against previous periods
- Add support for `last7d`, `last30d`, and `all` time period options
- Update `DashboardSummaryResponse` to include optional `trends` field
- Add helper functions for custom time window aggregation
- Add unit tests for trend calculation and period window logic
- Update feature documentation with new parameters and response schemas
- Create `DailyStat` model for immutable daily metrics including
  request counts, tokens, latency, and top models.
- Implement `DailyStatsJob` to aggregate `log_records` from the previous
  day, running daily at 00:05 UTC.
- Register database migrations and schedule the job in the server.
- Add `last7d` and `last30d` period support to stats handler.
Update dashboard summary specification to distinguish between provider
keys (upstream) and internal keys. Change summary metrics to use
`provider_keys` fields for better clarity.
Add section on known limitations regarding time period logic and
missing trend data.
Update the UI specification to distinguish warning severity with orange
color instead of grouping it with critical (red). Also remove redundant
project overview table.
Restructure documentation into tables for improved readability and
include detailed specifications for the new traffic chart endpoint.
Define explicit data refresh strategies, error handling guidelines, and
response structures for admin panel components.
Add detailed specification for the EZ-API Control Plane Dashboard
frontend-backend integration. This document defines the data mapping,
API endpoints, and UI logic for global navigation, real-time metrics,
traffic analysis, and alert summaries.
Add TestTrafficChart_MinuteGranularityValidation to verify input
parameters including granularity, time range limits, and top_n
constraints. Include skipped placeholders for PostgreSQL-specific
aggregation tests.
Add new endpoint GET /admin/logs/stats/traffic-chart to provide
aggregated traffic metrics grouped by time and model. Features include:
- Time granularity selection (hour/minute)
- Top-N model breakdown with "other" aggregation
- Metrics for request counts and token usage
Includes generated Swagger documentation.
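
The Top-N breakdown with "other" aggregation can be sketched in memory as below. The real endpoint presumably aggregates in SQL; the type and function names here are hypothetical, and breaking ties by model name is an assumption made to keep the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// ModelCount is a hypothetical per-model bucket entry.
type ModelCount struct {
	Model    string
	Requests int64
}

// topNWithOther keeps the n busiest models and folds the rest into "other".
func topNWithOther(counts map[string]int64, n int) []ModelCount {
	if n < 0 {
		n = 0
	}
	all := make([]ModelCount, 0, len(counts))
	for model, c := range counts {
		all = append(all, ModelCount{Model: model, Requests: c})
	}
	// Sort by request count descending, then by name for a stable order.
	sort.Slice(all, func(i, j int) bool {
		if all[i].Requests != all[j].Requests {
			return all[i].Requests > all[j].Requests
		}
		return all[i].Model < all[j].Model
	})
	if len(all) <= n {
		return all
	}
	out := append([]ModelCount{}, all[:n]...)
	var rest int64
	for _, mc := range all[n:] {
		rest += mc.Requests
	}
	return append(out, ModelCount{Model: "other", Requests: rest})
}

func main() {
	counts := map[string]int64{"model-a": 10, "model-b": 5, "model-c": 3, "model-d": 1}
	fmt.Println(topNWithOther(counts, 2))
}
```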
- Create application-level context with cancel function
- Pass base context to scheduler via WithBaseContext option
- Move scheduler.Stop() to explicit shutdown sequence after context cancellation
- Upgrade foundation dependency to v0.7.0 for new scheduler options
Replace internal goroutine-based timing loops with scheduler integration
for SyncOutboxService and ModelRegistryService. Both services now expose
RunOnce() methods called by the central scheduler instead of managing
their own background loops.
- Add Interval() and RunOnce() methods to SyncOutboxService
- Add RefreshEvery() and RunOnce() methods to ModelRegistryService
- Remove started flag from SyncOutboxService struct
- Move scheduler.Start() after all services are initialized
- Ensure initial model registry refresh before scheduler starts
Replace custom goroutine-based scheduling in cron jobs with centralized
foundation scheduler. Each cron job now exposes a RunOnce method called
by the scheduler instead of managing its own ticker loop.
Changes:
- Remove interval/enabled config from cron job structs
- Convert Start() methods to RunOnce() for all cron jobs
- Add scheduler setup in main.go with configurable intervals
- Update foundation dependency to v0.6.0 for scheduler support
- Update tests to validate RunOnce nil-safety
Add auto-generated Swagger/OpenAPI documentation files for the
EZ-API Control Plane. These files are generated by swaggo/swag
and provide comprehensive API documentation including:
- All admin endpoints (alerts, API keys, bindings, masters, etc.)
- Master/tenant endpoints for token management and stats
- Internal endpoints for DP-CP communication
- Authentication schemes (AdminAuth, MasterAuth)
- Request/response schema definitions
- Add MasterID field with index to LogRecord model for efficient queries
- Fix threshold config loading to use fixed ID=1 with FirstOrCreate
- Allow traffic spike detection to work without Redis for log-based checks
- Add traffic_spike to API documentation for alert type filter
- Add comprehensive tests for RPM/RPD/TPM spike detection scenarios
Add unit tests for alert-related functionality:
- alert_handler_test.go: tests for threshold CRUD operations,
  alert creation with traffic_spike type, filtering, and stats
- alert_detector_test.go: tests for threshold config loading,
  traffic spike severity calculation, deduplication logic,
  error rate severity, and nil-safety checks
Also fix format string issues:
- Use %d instead of %.2f for integer QPS in alert messages
- Wrap error description with format directive to avoid linter warning
Add OpenAPI documentation for the new alert threshold management
endpoints used in traffic spike detection:
- GET /admin/alerts/thresholds - retrieve current threshold config
- PUT /admin/alerts/thresholds - update threshold configuration
Include AlertThresholdView and UpdateAlertThresholdsRequest schema
definitions with properties for QPS, RPM, TPM, RPD, TPD limits.
Introduce traffic_spike alert type for monitoring system and per-master
traffic levels with configurable thresholds stored in database.
- Add AlertThresholdConfig model for persistent threshold configuration
- Implement GET/PUT /admin/alerts/thresholds endpoints for threshold management
- Add traffic spike detection in alert detector cron job:
  - Global QPS monitoring across all masters
  - Per-master RPM/TPM checks with minimum sample thresholds
  - Per-master RPD/TPD checks for daily limits
- Use warning severity at threshold, critical at 2x threshold
- Include metric metadata (value, threshold, window) in alert details
- Update API documentation with new endpoints and alert type
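
The severity rule from this change (warning at the threshold, critical at twice the threshold) reduces to a small comparison. The function name and the choice to return an empty string for "no alert" are assumptions for illustration.

```go
package main

import "fmt"

// spikeSeverity maps a measured value to an alert severity.
// It returns "" when the value is below the threshold or the threshold is
// unset (<= 0), meaning no alert should be raised.
func spikeSeverity(value, threshold int64) string {
	if threshold <= 0 || value < threshold {
		return ""
	}
	if value >= 2*threshold {
		return "critical"
	}
	return "warning"
}

func main() {
	const rpmThreshold = 1000 // example value, not a project default
	for _, rpm := range []int64{900, 1000, 1999, 2000} {
		fmt.Printf("rpm=%d severity=%q\n", rpm, spikeSeverity(rpm, rpmThreshold))
	}
}
```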
Implement AlertDetector background task that runs every minute to detect
and create alerts for various anomalies:
- Rate limit detection: monitors masters hitting rate limits
- Error spike detection: flags keys with >= 10% error rate
- Quota exceeded: warns when key quota usage >= 90%
- Provider down: alerts when API keys have >= 50% failure rate
Includes fingerprint-based deduplication with 5-minute cooldown to
prevent duplicate alerts for the same issue.
- Add dashboard alerts, realtime, and apikey-stats endpoints
- Document time range parameters for logs and apikey-stats
- Update daily operations workflow with new monitoring endpoints
- Clarify period parameter behavior (default returns all data)
Document two new API endpoints:
- GET /admin/apikey-stats/summary with optional since/until params
  for querying statistics within a specific time range
- POST /internal/alerts/report for Data Plane to report alerts
  with fingerprint-based deduplication mechanism
Add ReportAlerts endpoint for Data Plane to report alerts to Control Plane
with fingerprint-based deduplication using a 5-minute cooldown period.
Changes:
- Add POST /internal/alerts/report endpoint with validation
- Add Fingerprint field to Alert model for deduplication
- Extend GetAPIKeyStatsSummary with optional time range filtering
  using since/until query parameters to query from log records
Add comprehensive API documentation for new admin endpoints:
- Dashboard summary endpoint with period/time range parameters
- System-level realtime statistics (QPS, RPM, rate limits)
- Log stats aggregation by hour/minute with time constraints
- API key status filtering (active/suspended/disabled)
- Complete alert management system documentation:
  - Alert types, severity levels, and status definitions
  - CRUD endpoints for alert lifecycle management
  - Alert statistics endpoint
Add OpenAPI documentation for the new alerts management system:
- CRUD endpoints for system alerts (/admin/alerts)
- Alert acknowledgment and resolution endpoints
- Alert statistics endpoint
- Alert filtering by status, severity, and type
Also document minute-level aggregation support for log stats
with 6-hour time range limitation.