Installation
Wardis runs as a set of Docker containers. You need Docker and Docker Compose installed on your machine. The entire stack — gateway, dashboard, PostgreSQL, ClickHouse, and Redis — starts with one command.
The gateway starts on localhost:4000 and the dashboard on localhost:3000.
For semantic cache support (optional), start the full profile which includes the embedding service and Qdrant:
Requirements
First login & setup
When you open localhost:3000 for the first time, you'll see the setup wizard. Create your admin account — this becomes the organization owner.
Connect a provider
Provider API keys are managed exclusively through the dashboard — environment variables like OPENAI_API_KEY are intentionally ignored. This ensures every request is tracked, budgeted, and audited.
Go to Gateway → Providers in the sidebar and add your provider's API key. Wardis supports:
Any OpenAI-compatible provider also works — DeepSeek, Groq, Mistral, Together AI, Fireworks AI, Cerebras, and Z.ai (GLM).
Your first request
Wardis exposes an OpenAI-compatible API. Point your existing code at the gateway instead of the provider — no other changes needed.
Or in Python with the OpenAI SDK:
The request is proxied to OpenAI, and the response is returned in the same format. Wardis logs the token count, cost, and latency automatically. Check the dashboard to see it appear in real time.
Dashboard overview
The dashboard is organized into four sections, accessible from the left sidebar:
Analytics
Dashboard overview, usage logs, and agent task tracking. This is where you see costs, token counts, and request history.
Control
Budgets, alerts, and automation rules. Set spending limits per org, team, or key. Configure notifications and budget-triggered routing.
Gateway
Provider configuration, routing rules, and semantic cache settings. Manage which AI providers are available and how requests are routed.
Workspace
API keys, teams, users & RBAC, and audit log. Manage access, invite team members, and review all actions.
Usage & analytics
The usage page shows every request that passes through the gateway. You can filter by date range, model, provider, team, API key, and status. Each row shows the model used, token count (input/output), cost in USD, latency, and status.
Click any request to see the full details including request metadata, headers, and the associated agent task (if any). All event data is stored in ClickHouse for fast queries even over millions of rows.
API keys
API keys authenticate requests to the Wardis gateway. Each key belongs to a team and can have its own budget limit. Keys are hashed with bcrypt and never stored in plain text.
Go to Workspace → API Keys to create, rotate, or deactivate keys. Each key shows its usage metrics — requests, tokens, and cost — directly in the table.
Keys are prefixed with wd- to distinguish them from provider keys.
Teams
Teams group users and API keys together. Each team can have its own monthly budget. Costs from all keys belonging to a team are aggregated on the team level.
For SaaS builders, teams map naturally to customers — one team per customer, one key per customer, one budget per customer. See the SaaS use case for details.
Providers
Wardis proxies requests to AI providers using an OpenAI-compatible API. You configure provider API keys through the dashboard, not through environment variables. This ensures every request is fully tracked and audited.
Supported providers
Native adapters: OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI, Ollama. Each has a dedicated adapter that handles request/response format translation.
OpenAI-compatible: DeepSeek, Groq, Mistral, Together AI, Fireworks AI, Cerebras, Z.ai (GLM). Any provider that accepts the OpenAI chat completions format works out of the box.
Model pricing
Wardis ships with pricing data for over 2,100 models via the LiteLLM pricing database. Prices are updated regularly. You can view and search all models on the Gateway → Providers page in the dashboard.
Budgets
Budgets are enforced in real time on every request before it reaches the provider. The hierarchy is: organization → team → API key. A request is blocked if any level in the hierarchy has exceeded its limit.
Enforcement behavior
At 80% of budget — a warning alert is sent (email, Slack, or webhook).
At 95% — budget-aware routing can automatically switch to a cheaper model.
At 100% — the request is rejected with HTTP 429.
Budgets reset automatically at the start of each calendar month (UTC). Set budgets inline on the Control → Budgets page.
Alerts
Alert rules notify you when something needs attention. You can create rules for budget thresholds, cost spikes, error rates, and latency.
Alerts can be sent via email, Slack webhook, or generic webhook. Each rule has a configurable cooldown period to prevent alert fatigue. Manage rules on the Control → Alerts page.
Agent tracking
AI agents — especially multi-agent systems — can generate hundreds of LLM requests across orchestrators, subagents, and tool calls. Wardis correlates all of these into a single task with a detailed cost breakdown by agent and by level in the agent tree.
How it works
Agents inject tracking headers into their requests:
Wardis groups all requests sharing the same task ID and builds the agent tree automatically. The result is a single view showing the orchestrator, each subagent, and every tool call — with cost, token count, and duration at every level.
Agent types
View agent tasks on the Analytics → Agents page. Each task shows the full breakdown with a timeline of when each agent started and finished.
Routing
Routing rules control how requests are directed to providers. You can route based on model, cost, latency, or custom conditions. Fallback chains let you automatically retry with a different provider if the primary one fails.
Budget-aware routing
When a team or key approaches its budget limit, Wardis can automatically switch requests to a cheaper model. For example, routing from claude-opus-4.7 to haiku-4.5 when 95% of the budget is consumed. Configure this on the Control → Automation page.
Fallback chains
If a provider returns an error or is unreachable, routing rules can automatically fall back to an alternative provider. Configure routes on the Gateway → Routing page.
Semantic cache
The semantic cache avoids sending duplicate or near-duplicate requests to providers. It works on two levels: exact match (SHA-256 hash via Redis) and semantic similarity (embeddings via Qdrant). Cache is opt-in per API key.
Setup
Semantic cache requires the embedding service and Qdrant. Start them with the full Docker profile:
Then enable caching per API key on the Gateway → Cache page. You can configure the TTL and similarity threshold per key.
Cached responses include the header X-Wardis-Cache: hit so you can verify cache behavior. Cache stats — hit rate and estimated savings — are shown on the cache page.
API reference
The gateway exposes an OpenAI-compatible proxy API and a management API for programmatic control.
Proxy endpoints
Management API
All management endpoints require session authentication (cookie-based via the dashboard) or a valid API key.
MCP server
Wardis includes an MCP (Model Context Protocol) server that Claude Code and other agents can connect to for automatic cost tracking. When configured, agents can start tasks, check budgets, and receive cost reports without any manual header injection.
Available tools
Configuration
Add the Wardis MCP server to your .mcp.json:
Architecture
Wardis is a monorepo built with pnpm workspaces. The main components are:
Gateway — apps/gateway
Node.js / TypeScript / Fastify. Handles proxy, auth, token counting, cost calculation, budget enforcement, agent correlation, routing, and caching. Runs on port 4000.
Dashboard — apps/dashboard
Next.js 15 / React 19 / TailwindCSS. The management UI. Proxies API calls to the gateway on the same origin. Runs on port 3000.
Embedding Service — apps/embedding-service
Python / FastAPI. Generates embeddings for the semantic cache using paraphrase-multilingual-MiniLM-L12-v2. Runs on port 8080. Optional — only needed for semantic cache.
Data stores
PostgreSQL
Primary database. Stores organizations, teams, users, API keys, provider configs, alert rules, routing rules, and audit log. Managed via Drizzle ORM.
ClickHouse
Analytics database. Stores all LLM request events and agent task aggregations. Columnar storage for fast analytical queries over millions of rows.
Redis
Real-time budget tracking, rate limiting, exact-match cache, and alert cooldown tracking. Sub-millisecond operations.
Qdrant
Vector database for semantic cache. Stores request embeddings and finds similar requests by cosine similarity. Optional — only needed for semantic cache.