RouteLLM

Stable routing, fallback and quota control for multiple LLM providers

🚧

Coming soon

This product is currently under development and will be available soon.

Release: November 2025

RouteLLM is a routing layer that distributes requests across multiple LLM providers based on availability, cost, and latency.

It automatically switches to another provider when one is down or slow, balances load, tracks usage, and enforces per-provider quota limits.
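Quota enforcement comes down to counting usage per provider and skipping any provider that has reached its limit. A minimal sketch of that bookkeeping, purely for illustration; the function names and limit shape are assumptions, not RouteLLM's actual internals:

```javascript
// Illustrative per-provider quota tracking (not RouteLLM's real API).
function createQuotaTracker(limits) {
  // limits: e.g. { openai: 100000, anthropic: 50000 } (tokens per window)
  const used = Object.fromEntries(Object.keys(limits).map((p) => [p, 0]));
  return {
    // Record tokens consumed by a request routed to `provider`.
    record(provider, tokens) { used[provider] += tokens; },
    // A provider stays eligible for routing while it is under its limit.
    available(provider) { return used[provider] < limits[provider]; },
  };
}
```

A router would consult `available()` before dispatching and fall through to the next provider once a quota is exhausted.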

Problem

Managing multiple LLM providers requires handling different APIs, rate limits, and failover logic. This complexity grows with each provider added.

When to use

Use RouteLLM when you need to call multiple LLM providers (OpenAI, Anthropic, etc.) and want automatic failover, cost optimization, quota control, and unified API access. It eliminates the need to manage provider-specific code and handles rate limits automatically.
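Cost-aware routing with failover reduces to a simple policy: among the providers that are currently healthy, pick the cheapest. A sketch of that policy; the provider fields `healthy` and `costPer1kTokens` are assumptions for illustration, not RouteLLM's configuration schema:

```javascript
// Hypothetical routing policy: the cheapest healthy provider wins.
function pickProvider(providers) {
  const healthy = providers.filter((p) => p.healthy);
  if (healthy.length === 0) throw new Error('no provider available');
  return healthy.reduce((best, p) =>
    p.costPer1kTokens < best.costPer1kTokens ? p : best
  );
}
```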

Input / Output

Input: Chat completion or text completion request
Output: Response from the selected provider, with automatic fallback on failure

Example

cURL example
curl -X POST https://routellm.kikuai.dev/api/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "messages": [{"role": "user", "content": "Hello"}],
    "model": "gpt-4o-mini"
  }'
JavaScript example
const response = await fetch('https://routellm.kikuai.dev/api/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'Hello' }],
    model: 'gpt-4o-mini'
  })
})
const data = await response.json()
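A client can also layer its own retry on top of the gateway for transient errors. A sketch using the same endpoint as the example above; the retry policy and the injectable `fetchFn` parameter are assumptions for illustration:

```javascript
// Retry a chat request on transient failures (illustrative policy).
// `fetchFn` is injectable so the logic can be exercised without a network.
async function chatWithRetry(body, { fetchFn = fetch, retries = 2 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const res = await fetchFn('https://routellm.kikuai.dev/api/chat', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': 'Bearer YOUR_API_KEY'
        },
        body: JSON.stringify(body)
      });
      if (res.ok) return res.json();
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err; // network error or timeout: try again
    }
  }
  throw lastError;
}
```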