API Keychain unifies Gemini, Groq, Cerebras, Mistral, DeepSeek and more behind a single OpenAI-compatible endpoint. It routes by effort tier, fails over on rate limits, and never makes you juggle keys again.
OpenAI SDK drop-in · Keys encrypted at rest · No card required
from openai import OpenAI
client = OpenAI(
    base_url="https://api.apikeychain.dev/v1",
    api_key="ak-•••••••••••••••",
)
# keychain-low | keychain-medium | keychain-high
resp = client.chat.completions.create(
    model="keychain-high",
    messages=[{"role": "user", "content": "Explain quantum tunneling."}],
)
print(resp.choices[0].message.content)

Routing across the best free-tier inference networks
Everything you'd otherwise stitch together by hand — routing, failover, key management and observability — in one OpenAI-compatible gateway.
Ask for low, medium or high. The router cascades down a ranked list of models until one answers — quality when you need it, speed when you don't.
A 429 or outage on one provider transparently rolls to the next. Your request still completes.
Providers that just got throttled are parked in a cooldown window and skipped until they recover.
Every upstream provider key is sealed with authenticated encryption before it touches the database.
One revealable ak- key fronts everything. Rotate it instantly without touching upstream credentials.
Pin any model id a connected provider supports into a tier, then reorder priority to taste.
Per-model, per-provider request counts, token totals, success rate and latency — all in one dashboard.
No SDK swaps, no per-provider glue. Point the OpenAI client at your keychain URL and choose how hard you want the model to think.
Paste the free-tier keys you already have — Gemini, Groq, Cerebras and the rest. They're encrypted the moment they arrive.
Send keychain-low, keychain-medium or keychain-high as the model. The router builds an ordered cascade from your enabled models and preferences.
Failover, cooldowns and retries happen server-side. You get a clean OpenAI response and a full analytics trail.
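To make the server-side behavior concrete, here is a minimal sketch of a cascade with failover and cooldowns. It is an illustration under stated assumptions, not the gateway's actual code: `RateLimited`, `CascadeRouter`, `cooldown_seconds` and the provider callables are all hypothetical names, and the real service may rank, retry and time its cooldowns differently.

```python
import time
from typing import Callable, Dict, List, Tuple


class RateLimited(Exception):
    """Hypothetical stand-in for an upstream 429 or outage."""


class CascadeRouter:
    """Tries providers in priority order and parks the ones that fail."""

    def __init__(
        self,
        cascade: List[Tuple[str, Callable[[str], str]]],
        cooldown_seconds: float = 30.0,  # assumed value, not a documented default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.cascade = cascade
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.parked_until: Dict[str, float] = {}

    def complete(self, prompt: str) -> Tuple[str, str]:
        now = self.clock()
        for name, call in self.cascade:
            # Skip providers that are still inside their cooldown window.
            if self.parked_until.get(name, 0.0) > now:
                continue
            try:
                return name, call(prompt)
            except RateLimited:
                # Park the throttled provider, then fall through to the next one.
                self.parked_until[name] = now + self.cooldown_seconds
        raise RuntimeError("every provider in the cascade failed or is cooling down")
```

The caller only ever sees the first successful answer (or one final error), which mirrors the promise above: the request completes on whichever provider is healthy, and a throttled provider is not retried until its cooldown expires.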
All 59 free-tier models across 8 providers — reachable the moment you connect each provider's key.
Google's multimodal flagship — fast 2.0 Flash, deep 2.5 Pro.
LPU inference — Llama 3.3 70B at hundreds of tokens/sec.
Wafer-scale speed for Llama- and Qwen-class open models.
Efficient European frontier models, open-weight roots.
Reasoning-first models that rival closed frontier labs.
A meta-gateway — dozens of free community models in one slot.
Open-source model hosting at production scale.
Enterprise-grade Command models, OpenAI-compatible.
Each tier is an ordered cascade of real models. Reorder, disable or extend any of them from your dashboard — these are the live defaults.
Latency-optimized small models for autocomplete, classification and high-volume calls.
The everyday workhorse tier — balanced quality and speed for chat and agents.
Frontier reasoning for hard problems — R1, Gemini 2.5 Pro and Nemotron Ultra.
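A tier, as described above, can be pictured as an ordered list you pin models into, reorder and selectively disable. The sketch below is only a mental model: the `Tier` and `TierEntry` classes and the `provider-x` / `model-n` ids are hypothetical placeholders, and the real dashboard defaults are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TierEntry:
    provider: str
    model: str
    enabled: bool = True


@dataclass
class Tier:
    name: str
    entries: List[TierEntry] = field(default_factory=list)

    def pin(self, provider: str, model: str) -> None:
        """Append a model to the end of the tier's priority list."""
        self.entries.append(TierEntry(provider, model))

    def move(self, model: str, position: int) -> None:
        """Reorder: move an already pinned model to a new priority position."""
        idx = next(i for i, e in enumerate(self.entries) if e.model == model)
        self.entries.insert(position, self.entries.pop(idx))

    def disable(self, model: str) -> None:
        """Keep the model pinned but exclude it from the cascade."""
        for entry in self.entries:
            if entry.model == model:
                entry.enabled = False

    def cascade(self) -> List[str]:
        """Ordered list of enabled models the router would try."""
        return [f"{e.provider}/{e.model}" for e in self.entries if e.enabled]


tier = Tier("keychain-high")
tier.pin("provider-a", "model-1")
tier.pin("provider-b", "model-2")
tier.pin("provider-c", "model-3")
tier.move("model-3", 0)    # promote model-3 to first choice
tier.disable("model-1")    # temporarily take model-1 out of rotation
print(tier.cascade())
```

Keeping disabled entries in the list, rather than deleting them, means re-enabling a model restores it at its previous priority.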
Swap the base URL and key. Keep your prompts, your SDK, your streaming, your tools. The model field becomes an effort tier — everything else is identical.
# A different SDK + key per provider…
from google import genai
from groq import Groq
from mistralai import Mistral
g = genai.Client(api_key=GEMINI_KEY)
q = Groq(api_key=GROQ_KEY)
m = Mistral(api_key=MISTRAL_KEY)
# …and you hand-roll the failover.

Spin up your unified key in under a minute and route across every free-tier model from a single endpoint.