One API key for every free AI model.

API Keychain unifies Gemini, Groq, Cerebras, Mistral, DeepSeek and more behind a single OpenAI-compatible endpoint. It routes by effort tier, fails over on rate limits, and never makes you juggle keys again.

OpenAI SDK drop-in · Keys encrypted at rest · No card required

app.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.apikeychain.dev/v1",
    api_key="ak-•••••••••••••••",
)

# keychain-low | keychain-medium | keychain-high
resp = client.chat.completions.create(
    model="keychain-high",
    messages=[{"role": "user", "content": "Explain quantum tunneling."}],
)
print(resp.choices[0].message.content)

Routing across the best free-tier inference networks

  • Gemini: 10 free models
  • Groq: 10 free models
  • Cerebras: 5 free models
  • Mistral: 10 free models
  • DeepSeek: 2 free models
  • OpenRouter: 12 free models
  • Together: 3 free models
  • Cohere: 7 free models

  • 8 inference providers
  • 59 free-tier models
  • 3 effort tiers
  • 100% OpenAI-compatible

The platform

A control plane for free AI inference

Everything you'd otherwise stitch together by hand — routing, failover, key management and observability — in one OpenAI-compatible gateway.

Effort-based routing

Ask for low, medium or high. The router cascades down a ranked list of models until one answers — quality when you need it, speed when you don't.

Automatic failover

A 429 or outage on one provider transparently rolls to the next. Your request still completes.
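
The cascade and failover behaviour described above can be sketched in a few lines. This is a minimal illustration, not API Keychain's actual router: `UpstreamError`, `route` and `fake_call` are hypothetical names, and treating 429 and 5xx responses as the failover triggers is an assumption based on the "429 or outage" wording.

```python
from typing import Callable, Sequence


class UpstreamError(Exception):
    """Hypothetical error carrying the HTTP status an upstream returned."""

    def __init__(self, status: int) -> None:
        super().__init__(f"upstream returned {status}")
        self.status = status


def route(cascade: Sequence[str], call: Callable[[str], str]) -> tuple[str, str]:
    """Try each model in order and return (model, answer) from the first success."""
    skipped: list[str] = []
    for model in cascade:
        try:
            return model, call(model)
        except UpstreamError as err:
            # Assumption: rate limits (429) and server errors (5xx) trigger failover.
            if err.status == 429 or err.status >= 500:
                skipped.append(f"{model} ({err.status})")
                continue
            raise  # other errors, such as a malformed request, are not retried
    raise RuntimeError("every model in the cascade failed: " + ", ".join(skipped))


def fake_call(model: str) -> str:
    # Stand-in for a real upstream request: the first two models are unavailable.
    if model == "gemini-2.5-pro":
        raise UpstreamError(429)
    if model == "deepseek-r1":
        raise UpstreamError(503)
    return f"answer from {model}"


served_by, answer = route(["gemini-2.5-pro", "deepseek-r1", "llama-3.3-70b"], fake_call)
print(served_by)
```

In this sketch a 4xx error other than 429 is re-raised rather than retried, on the assumption that a bad request would fail the same way on every provider.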

Rate-limit cooldowns

Providers that just got throttled are parked in a cooldown window and skipped until they recover.
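
One way to picture a cooldown window is a map from provider to the time it becomes eligible again. The sketch below is illustrative only: the `Cooldowns` class, the 30-second window and the injectable clock are assumptions, not documented API Keychain behaviour.

```python
import time
from typing import Callable, Optional


class Cooldowns:
    """Tracks which providers are parked after being throttled (hypothetical helper)."""

    def __init__(
        self,
        window_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self._parked_until: dict[str, float] = {}

    def park(self, provider: str) -> None:
        """Record that a provider was just throttled."""
        self._parked_until[provider] = self.clock() + self.window_seconds

    def is_available(self, provider: str) -> bool:
        """A provider is available if it was never parked or its window has passed."""
        until: Optional[float] = self._parked_until.get(provider)
        return until is None or self.clock() >= until


# Demo with a fake clock so the example runs instantly.
now = [0.0]
cooldowns = Cooldowns(window_seconds=30.0, clock=lambda: now[0])
cooldowns.park("groq")
available_during = cooldowns.is_available("groq")
now[0] = 31.0
available_after = cooldowns.is_available("groq")
```

A router built this way would check `is_available` before each attempt and call `park` whenever an upstream answers with a rate-limit response.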

Encrypted at rest

Every upstream provider key is sealed with authenticated encryption before it touches the database.

Unified keychain key

One revealable ak- key fronts everything. Rotate it instantly without touching upstream credentials.

Bring your own models

Pin any model id a connected provider supports into a tier, then reorder priority to taste.

Usage analytics

Per-model, per-provider request counts, token totals, success rate and latency — all in one dashboard.
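
As a rough illustration of how such figures could be derived from a request log, the sketch below groups records by provider and model. The `CallRecord` fields and the `summarize` function are hypothetical; the real dashboard's schema is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One logged request (hypothetical shape)."""
    provider: str
    model: str
    tokens: int
    latency_ms: float
    ok: bool


def summarize(records: list[CallRecord]) -> dict[tuple[str, str], dict[str, float]]:
    """Aggregate request count, token total, success rate and mean latency per model."""
    groups: dict[tuple[str, str], list[CallRecord]] = defaultdict(list)
    for record in records:
        groups[(record.provider, record.model)].append(record)

    summary: dict[tuple[str, str], dict[str, float]] = {}
    for key, rows in groups.items():
        summary[key] = {
            "requests": len(rows),
            "tokens": sum(r.tokens for r in rows),
            "success_rate": sum(r.ok for r in rows) / len(rows),
            "avg_latency_ms": sum(r.latency_ms for r in rows) / len(rows),
        }
    return summary


records = [
    CallRecord("groq", "llama-3.3-70b-versatile", 120, 200.0, True),
    CallRecord("groq", "llama-3.3-70b-versatile", 0, 400.0, False),
    CallRecord("gemini", "gemini-2.0-flash", 80, 300.0, True),
]
stats = summarize(records)
```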

How it works

From eight dashboards to one request

No SDK swaps, no per-provider glue. Point the OpenAI client at your keychain URL and choose how hard you want the model to think.

1. Connect your providers

Paste the free-tier keys you already have — Gemini, Groq, Cerebras and the rest. They're encrypted the moment they arrive.

2. Pick an effort tier

Send keychain-low, -medium or -high as the model. The router builds an ordered cascade from your enabled models and preferences.
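
Building the ordered cascade might look roughly like the following. This is a sketch under stated assumptions: `ModelEntry`, its `priority` field and `build_cascade` are hypothetical names, and the only behaviour taken from the text is that the tier comes from the model name and that disabled models are left out.

```python
from dataclasses import dataclass

TIERS = {"low", "medium", "high"}


@dataclass(frozen=True)
class ModelEntry:
    """One model pinned into a tier (hypothetical shape)."""
    model_id: str
    tier: str        # "low", "medium" or "high"
    priority: int    # lower numbers are tried first
    enabled: bool = True


def build_cascade(requested_model: str, entries: list[ModelEntry]) -> list[str]:
    """Turn a model name such as 'keychain-high' into an ordered list of model ids."""
    prefix = "keychain-"
    tier = requested_model[len(prefix):] if requested_model.startswith(prefix) else ""
    if tier not in TIERS:
        raise ValueError(f"unknown effort tier: {requested_model!r}")
    chosen = [e for e in entries if e.enabled and e.tier == tier]
    return [e.model_id for e in sorted(chosen, key=lambda e: e.priority)]


entries = [
    ModelEntry("groq/llama-3.3-70b-versatile", "high", priority=3),
    ModelEntry("gemini-2.5-pro", "high", priority=1),
    ModelEntry("deepseek/deepseek-r1", "high", priority=2, enabled=False),
    ModelEntry("gemini-2.0-flash", "low", priority=1),
]
cascade = build_cascade("keychain-high", entries)
print(cascade)
```

With DeepSeek disabled in this example, the high tier resolves to Gemini first and Groq second.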

3. Ship. We handle the rest

Failover, cooldowns and retries happen server-side. You get a clean OpenAI response and a full analytics trail.

keychain-high (effort: high)
  1. gemini-2.5-pro (Gemini): 429 · skip
  2. deepseek-r1 (DeepSeek): cooling · skip
  3. llama-3.3-70b (Groq): served
1 request in · 2 skipped · 1 served

The catalog

Every free model, one key

All 59 free-tier models across 8 providers — reachable the moment you connect each provider's key.

Gemini
generativelanguage.googleapis.com
10 models

Google's multimodal flagship: fast 2.0 Flash, deep 2.5 Pro.

gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite, gemini-2.0-flash, gemini-2.0-flash-lite, gemini-1.5-flash, gemini-1.5-flash-8b, gemma-3-27b-it, gemma-3-12b-it, gemma-3-4b-it

Groq
api.groq.com
10 models

LPU inference: Llama 3.3 70B at hundreds of tokens/sec.

llama-3.3-70b-versatile, llama-3.1-8b-instant, llama-3.2-3b-preview, llama-3.2-1b-preview, llama3-70b-8192, llama3-8b-8192, gemma2-9b-it, qwen-2.5-32b, deepseek-r1-distill-llama-70b, mixtral-8x7b-32768

Cerebras
api.cerebras.ai
5 models

Wafer-scale speed for Llama- and Qwen-class open models.

llama-3.3-70b, llama3.1-8b, llama-4-scout-17b-16e-instruct, qwen-3-32b, deepseek-r1-distill-llama-70b

Mistral
api.mistral.ai
10 models

Efficient European frontier models, open-weight roots.

mistral-small-latest, mistral-large-latest, open-mistral-nemo, open-mistral-7b, open-mixtral-8x7b, open-mixtral-8x22b, pixtral-12b, codestral-latest, ministral-8b-latest, ministral-3b-latest

DeepSeek
api.deepseek.com
2 models

Reasoning-first models that rival closed frontier labs.

deepseek-chat, deepseek-reasoner

OpenRouter
openrouter.ai
12 models

A meta-gateway: dozens of free community models in one slot.

deepseek/deepseek-r1:free, deepseek/deepseek-chat:free, meta-llama/llama-3.3-70b-instruct:free, google/gemini-2.0-flash-exp:free, qwen/qwen-2.5-72b-instruct:free, qwen/qwq-32b:free, mistralai/mistral-nemo:free, mistralai/mistral-7b-instruct:free, nvidia/llama-3.1-nemotron-70b-instruct:free, nousresearch/hermes-3-llama-3.1-405b:free, openai/gpt-oss-120b:free, openai/gpt-oss-20b:free

Together
api.together.xyz
3 models

Open-source model hosting at production scale.

meta-llama/Llama-3.3-70B-Instruct-Turbo-Free, meta-llama/Llama-Vision-Free, deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free

Cohere
api.cohere.ai
7 models

Enterprise-grade Command models, OpenAI-compatible.

command-r-plus, command-r, command-r7b, command-light, command-nightly, aya-expanse-32b, aya-expanse-8b

Effort tiers

Three knobs, the whole model landscape

Each tier is an ordered cascade of real models. Reorder, disable or extend any of them from your dashboard — these are the live defaults.

Low
keychain-low

Latency-optimized small models for autocomplete, classification and high-volume calls.

Cascade order
  1. gemini-2.0-flash
  2. groq/llama-3.1-8b-instant
  3. cerebras/llama3.1-8b
  4. openrouter/nvidia/llama-nemotron-nano-9b-v2:free
  5. openrouter/google/gemma-4-26b-a4b:free
  6. openrouter/openai/gpt-oss-20b:free

Medium
keychain-medium

The everyday workhorse tier — balanced quality and speed for chat and agents.

Cascade order
  1. gemini-2.0-flash
  2. groq/llama-3.3-70b-versatile
  3. mistral-small-latest
  4. openrouter/google/gemma-4-31b:free
  5. openrouter/nvidia/nemotron-3-super:free
  6. openrouter/cohere/north-mini-code:free
  7. openrouter/openai/gpt-oss-120b:free

High
keychain-high

Frontier reasoning for hard problems — R1, Gemini 2.5 Pro and Nemotron Ultra.

Cascade order
  1. gemini-2.5-pro
  2. deepseek/deepseek-r1
  3. groq/llama-3.3-70b-versatile
  4. openrouter/nvidia/nemotron-3-ultra:free
  5. openrouter/poolside/laguna-m.1:free
  6. openrouter/poolside/laguna-xs.2:free
  7. openrouter/tngtech/deepseek-r1t2-chimera:free

Quickstart

If you can call OpenAI, you're already done

Swap the base URL and key. Keep your prompts, your SDK, your streaming, your tools. The model field becomes an effort tier — everything else is identical.

  • Drop-in base URL & bearer key
  • Server-side routing & failover
  • Every call logged for analytics
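
For readers not using the OpenAI SDK, the same drop-in idea can be sketched with the standard library alone. The base URL comes from the example at the top of this page. The `/chat/completions` path and the Bearer header follow the usual OpenAI convention and are assumed here rather than confirmed. `build_chat_request` and the placeholder key are hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.apikeychain.dev/v1"


def build_chat_request(api_key: str, tier: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": f"keychain-{tier}",  # the effort tier takes the place of a model id
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("ak-your-key", "medium", "Explain quantum tunneling.")
payload = json.loads(request.data)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. That step is left out so the sketch runs without a network connection or a real key.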
before.py
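# A different SDK + key per provider…
import os

from google import genai
from groq import Groq
from mistralai import Mistral

# Each provider needs its own client and its own key, read here from the environment.
g = genai.Client(api_key=os.environ["GEMINI_KEY"])
q = Groq(api_key=os.environ["GROQ_KEY"])
m = Mistral(api_key=os.environ["MISTRAL_KEY"])
# …and you hand-roll the failover.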

Stop babysitting eight API keys.

Spin up your unified key in under a minute and route across every free-tier model from a single endpoint.