Platform Documentation

Welcome to the Token Gateway documentation. Token Gateway is a centralized, OpenAI-compatible inference API proxy designed specifically for enterprise intranet environments. It abstracts the complexity of managing multiple upstream AI models, enforcing corporate policies, and tracking usage.

By dropping in Token Gateway as a replacement endpoint, teams can safely access AI capabilities without directly exposing raw vendor API keys or bypassing organizational compliance rules.

Authentication & Audit

Security starts at the edge. Token Gateway ensures that raw upstream vendor API keys (like OpenAI, DeepSeek, etc.) remain securely isolated within the server.

Single Gateway Key: Developers receive one downstream gateway API key per user, simplifying rotation and management.
Full Observability: Every request is logged in a centralized SQLite audit trail, providing visibility into who requested what, and when.
Role-Based Access: Users are organized into Tenants, ensuring strict isolation of resources and model visibility.

Smart Routing & Residency

Enterprises operate in a multi-region, multi-vendor world. Token Gateway handles the complexity of dispatching requests to the correct physical location and service provider.

Model Mapping: Transparently route public model names (e.g., gpt-4o-mini) to specific upstream vendor endpoints.
Data Residency Compliance: Enforce regional data restrictions. A tenant configured with EU-only residency will automatically block requests routed to US-based upstream providers.

Proactive AI Guardrails

To prevent data leaks and enforce compliance, Token Gateway integrates Content Filters directly into the inference pipeline.

Local Review Models: Connect your own local ML models to review user inputs before they leave your corporate network.
Monitor or Enforce: Run filters in stealth (Monitor) to track compliance failures, or in active mode (Enforce) to block non-compliant requests outright.
Tenant Policies: Apply different filter strictness levels to different departments or tenants within your organization.

Intelligent Token Budgets

Control cloud inference costs effortlessly using built-in budgeting mechanics.

Monthly Limits: Set monthly maximum token allocations per user to prevent runaway costs.
Accurate Estimation: Supports accurate token accounting by seamlessly parsing usage from upstream responses, estimating streamed content deltas, and gracefully handling prompt-only failure charges.

Built-in Management Console

Token Gateway ships with a lightning-fast, zero-dependency server-rendered dashboard.

Self-Service: Users can log in to view their active budgets, available models, and manage their personal API keys.
Admin Control: Administrators get full access to manage Tenants, Users, Vendor configurations, Route mapping, and system-wide Audit logs.
Interactive Analytics: View real-time usage metrics and charts directly in the browser.

100% API Compatibility

Integration requires zero code changes. Simply change the Base URL in your existing SDKs to point to the Token Gateway.

Core Routes: Full support for /v1/chat/completions, /v1/embeddings, and legacy endpoints.
Streaming: Perfect Server-Sent Events (SSE) proxying for real-time streaming experiences.
Built-in Protections: Automatic rate limiting, payload size restrictions, and idle streaming timeouts prevent system abuse.