One API key for every free AI model.

API Keychain unifies Gemini, Groq, Cerebras, Mistral, DeepSeek and more behind a single OpenAI-compatible endpoint. It routes by effort tier, fails over on rate limits, and never makes you juggle keys again.

OpenAI SDK drop-in · Keys encrypted at rest · No card required

app.py

from openai import OpenAI

client = OpenAI(
    base_url="https://api.apikeychain.dev/v1",
    api_key="ak-•••••••••••••••",
)

# keychain-low | keychain-medium | keychain-high
resp = client.chat.completions.create(
    model="keychain-high",
    messages=[{"role": "user", "content": "Explain quantum tunneling."}],
)
print(resp.choices[0].message.content)

Routing across the best free-tier inference networks

Gemini

10 free models

Groq

10 free models

Cerebras

5 free models

Mistral

10 free models

DeepSeek

2 free models

OpenRouter

12 free models

Together

3 free models

Cohere

7 free models

8 inference providers · 59 free-tier models · 3 effort tiers · 100% OpenAI-compatible

The platform

A control plane for free AI inference

Everything you'd otherwise stitch together by hand — routing, failover, key management and observability — in one OpenAI-compatible gateway.

Effort-based routing

Ask for low, medium or high. The router cascades down a ranked list of models until one answers — quality when you need it, speed when you don't.

Automatic failover

A 429 or outage on one provider transparently rolls to the next. Your request still completes.
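The cascade-and-failover behaviour described above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `RateLimited` and `ProviderDown` exceptions, the `complete_with_failover` helper and the stub providers are hypothetical names, not the gateway's actual server-side code.

```python
from typing import Callable, Sequence


class RateLimited(Exception):
    """Hypothetical error standing in for an upstream HTTP 429."""


class ProviderDown(Exception):
    """Hypothetical error standing in for an upstream outage."""


def complete_with_failover(
    cascade: Sequence[tuple[str, Callable[[str], str]]],
    prompt: str,
) -> tuple[str, str, list[str]]:
    """Try each (model, call) pair in ranked order.

    Returns (model that answered, answer, models skipped on the way).
    """
    skipped: list[str] = []
    for model, call in cascade:
        try:
            return model, call(prompt), skipped
        except (RateLimited, ProviderDown):
            skipped.append(model)  # roll over to the next model in the list
    raise RuntimeError(f"every model in the cascade failed: {skipped}")


# Stub providers so the sketch runs without any network access.
def throttled(prompt: str) -> str:
    raise RateLimited("429")


def healthy(prompt: str) -> str:
    return f"answer to: {prompt}"


served_by, answer, skipped = complete_with_failover(
    [("gemini-2.5-pro", throttled), ("llama-3.3-70b", healthy)],
    "Explain quantum tunneling.",
)
print(served_by, skipped)
```

The caller only ever sees the final answer; the list of skipped models is returned here purely to make the rollover visible.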

Rate-limit cooldowns

Providers that just got throttled are parked in a cooldown window and skipped until they recover.
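One way to think about a cooldown window is a small table of "parked until" timestamps. The sketch below is illustrative only: the `CooldownTracker` class, its method names and the 30-second window are assumptions, since the page does not say how long a provider stays parked or how the state is stored.

```python
import time
from typing import Callable


class CooldownTracker:
    """Remembers which providers were throttled and until when (hypothetical)."""

    def __init__(
        self,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window_seconds = window_seconds
        self._clock = clock
        self._parked_until: dict[str, float] = {}

    def park(self, provider: str) -> None:
        """Call this right after a provider returns a rate-limit error."""
        self._parked_until[provider] = self._clock() + self.window_seconds

    def is_cooling(self, provider: str) -> bool:
        until = self._parked_until.get(provider)
        if until is None:
            return False
        if self._clock() >= until:
            del self._parked_until[provider]  # window elapsed, provider is back
            return False
        return True

    def available(self, providers: list[str]) -> list[str]:
        """Filter a ranked list down to providers that are not parked."""
        return [p for p in providers if not self.is_cooling(p)]


# A fake clock keeps the example deterministic and avoids sleeping.
now = [0.0]
tracker = CooldownTracker(window_seconds=30.0, clock=lambda: now[0])
tracker.park("deepseek")
during = tracker.available(["gemini", "deepseek", "groq"])
now[0] = 31.0
after = tracker.available(["gemini", "deepseek", "groq"])
print(during, after)
```

Injecting the clock is a design choice for testability; a real deployment would more likely keep this state somewhere shared between workers.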

Encrypted at rest

Every upstream provider key is sealed with authenticated encryption before it touches the database.

Unified keychain key

One revealable ak- key fronts everything. Rotate it instantly without touching upstream credentials.

Bring your own models

Pin any model id a connected provider supports into a tier, then reorder priority to taste.

Usage analytics

Per-model, per-provider request counts, token totals, success rate and latency — all in one dashboard.
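The dashboard metrics listed above amount to a group-by over a call log. As a rough sketch, assuming a hypothetical `CallRecord` shape and made-up sample rows (the real log schema is not shown on this page):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One logged upstream call; field names are assumptions."""
    provider: str
    model: str
    tokens: int
    latency_ms: float
    ok: bool


def summarize(records: list[CallRecord]) -> dict[tuple[str, str], dict[str, float]]:
    """Aggregate request count, token total, success rate and mean latency
    for each (provider, model) pair."""
    grouped: dict[tuple[str, str], list[CallRecord]] = defaultdict(list)
    for record in records:
        grouped[(record.provider, record.model)].append(record)

    summary: dict[tuple[str, str], dict[str, float]] = {}
    for key, rows in grouped.items():
        summary[key] = {
            "requests": len(rows),
            "tokens": sum(r.tokens for r in rows),
            "success_rate": sum(r.ok for r in rows) / len(rows),
            "avg_latency_ms": sum(r.latency_ms for r in rows) / len(rows),
        }
    return summary


# Made-up sample data, purely to exercise the function.
records = [
    CallRecord("groq", "llama-3.3-70b-versatile", 120, 200.0, True),
    CallRecord("groq", "llama-3.3-70b-versatile", 80, 400.0, False),
    CallRecord("gemini", "gemini-2.0-flash", 50, 100.0, True),
]
stats = summarize(records)
print(stats[("groq", "llama-3.3-70b-versatile")])
```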

How it works

From eight dashboards to one request

No SDK swaps, no per-provider glue. Point the OpenAI client at your keychain URL and choose how hard you want the model to think.

Connect your providers

Paste the free-tier keys you already have — Gemini, Groq, Cerebras and the rest. They're encrypted the moment they arrive.

Pick an effort tier

Send keychain-low, -medium or -high as the model. The router builds an ordered cascade from your enabled models and preferences.

Ship — we handle the rest

Failover, cooldowns and retries happen server-side. You get a clean OpenAI response and a full analytics trail.
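To make the second step more concrete, here is a hedged sketch of how a tier name might be turned into an ordered cascade. The `parse_tier` and `build_cascade` helpers, the (provider, model) tuple shape and the rule "pinned models first, then defaults" are all assumptions for illustration; the page only says the cascade is built from your enabled models and preferences.

```python
from typing import Collection, Sequence

Entry = tuple[str, str]  # (provider, model id), a hypothetical representation

TIER_PREFIX = "keychain-"
VALID_TIERS = ("low", "medium", "high")


def parse_tier(model: str) -> str:
    """Extract the effort tier from a model name such as 'keychain-medium'."""
    if not model.startswith(TIER_PREFIX):
        raise ValueError(f"not a keychain model: {model!r}")
    tier = model[len(TIER_PREFIX):]
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown effort tier: {tier!r}")
    return tier


def build_cascade(
    defaults: Sequence[Entry],
    connected: Collection[str],
    disabled: Collection[Entry] = (),
    pinned: Sequence[Entry] = (),
) -> list[Entry]:
    """Pinned entries first, then tier defaults; drop disabled entries,
    entries whose provider has no key connected, and duplicates."""
    cascade: list[Entry] = []
    for entry in [*pinned, *defaults]:
        provider, _model = entry
        if provider in connected and entry not in disabled and entry not in cascade:
            cascade.append(entry)
    return cascade


# First three medium-tier defaults, taken from the Effort tiers list.
MEDIUM_DEFAULTS: list[Entry] = [
    ("gemini", "gemini-2.0-flash"),
    ("groq", "llama-3.3-70b-versatile"),
    ("mistral", "mistral-small-latest"),
]

tier = parse_tier("keychain-medium")
cascade = build_cascade(
    MEDIUM_DEFAULTS,
    connected={"gemini", "groq"},  # no Mistral key connected in this example
    pinned=[("groq", "llama-3.1-8b-instant")],
)
print(tier, cascade)
```

In this example the Mistral entry drops out because no Mistral key is connected, and the pinned Groq model moves to the front of the order.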

keychain-high · effort: high

1. gemini-2.5-pro (Gemini): 429 · skip
2. deepseek-r1 (DeepSeek): cooling · skip
3. llama-3.3-70b (Groq): served

1 request in · 2 skipped · 1 served

The catalog

Every free model, one key

All 59 free-tier models across 8 providers — reachable the moment you connect each provider's key.

Gemini

generativelanguage.googleapis.com

10 models

Google's multimodal flagship — fast 2.0 Flash, deep 2.5 Pro.

gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite, gemini-2.0-flash, gemini-2.0-flash-lite, gemini-1.5-flash, gemini-1.5-flash-8b, gemma-3-27b-it, gemma-3-12b-it, gemma-3-4b-it

Groq

api.groq.com

10 models

LPU inference — Llama 3.3 70B at hundreds of tokens/sec.

llama-3.3-70b-versatile, llama-3.1-8b-instant, llama-3.2-3b-preview, llama-3.2-1b-preview, llama3-70b-8192, llama3-8b-8192, gemma2-9b-it, qwen-2.5-32b, deepseek-r1-distill-llama-70b, mixtral-8x7b-32768

Cerebras

api.cerebras.ai

5 models

Wafer-scale speed for Llama- and Qwen-class open models.

llama-3.3-70b, llama3.1-8b, llama-4-scout-17b-16e-instruct, qwen-3-32b, deepseek-r1-distill-llama-70b

Mistral

api.mistral.ai

10 models

Efficient European frontier models, open-weight roots.

mistral-small-latest, mistral-large-latest, open-mistral-nemo, open-mistral-7b, open-mixtral-8x7b, open-mixtral-8x22b, pixtral-12b, codestral-latest, ministral-8b-latest, ministral-3b-latest

DeepSeek

api.deepseek.com

2 models

Reasoning-first models that rival closed frontier labs.

deepseek-chat, deepseek-reasoner

OpenRouter

openrouter.ai

12 models

A meta-gateway — dozens of free community models in one slot.

deepseek/deepseek-r1:free, deepseek/deepseek-chat:free, meta-llama/llama-3.3-70b-instruct:free, google/gemini-2.0-flash-exp:free, qwen/qwen-2.5-72b-instruct:free, qwen/qwq-32b:free, mistralai/mistral-nemo:free, mistralai/mistral-7b-instruct:free, nvidia/llama-3.1-nemotron-70b-instruct:free, nousresearch/hermes-3-llama-3.1-405b:free, openai/gpt-oss-120b:free, openai/gpt-oss-20b:free

Together

api.together.xyz

3 models

Open-source model hosting at production scale.

meta-llama/Llama-3.3-70B-Instruct-Turbo-Free, meta-llama/Llama-Vision-Free, deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free

Cohere

api.cohere.ai

7 models

Enterprise-grade Command models, OpenAI-compatible.

command-r-plus, command-r, command-r7b, command-light, command-nightly, aya-expanse-32b, aya-expanse-8b

Effort tiers

Three knobs, the whole model landscape

Each tier is an ordered cascade of real models. Reorder, disable or extend any of them from your dashboard — these are the live defaults.

Low

keychain-low

Latency-optimized small models for autocomplete, classification and high-volume calls.

Cascade order

1. gemini-2.0-flash
2. groq/llama-3.1-8b-instant
3. cerebras/llama3.1-8b
4. openrouter/nvidia/llama-nemotron-nano-9b-v2:free
5. openrouter/google/gemma-4-26b-a4b:free
6. openrouter/openai/gpt-oss-20b:free

Medium

keychain-medium

The everyday workhorse tier — balanced quality and speed for chat and agents.

Cascade order

1. gemini-2.0-flash
2. groq/llama-3.3-70b-versatile
3. mistral-small-latest
4. openrouter/google/gemma-4-31b:free
5. openrouter/nvidia/nemotron-3-super:free
6. openrouter/cohere/north-mini-code:free
7. openrouter/openai/gpt-oss-120b:free

High

keychain-high

Frontier reasoning for hard problems — R1, Gemini 2.5 Pro and Nemotron Ultra.

Cascade order

1. gemini-2.5-pro
2. deepseek/deepseek-r1
3. groq/llama-3.3-70b-versatile
4. openrouter/nvidia/nemotron-3-ultra:free
5. openrouter/poolside/laguna-m.1:free
6. openrouter/poolside/laguna-xs.2:free
7. openrouter/tngtech/deepseek-r1t2-chimera:free

Quickstart

If you can call OpenAI, you're already done

Swap the base URL and key. Keep your prompts, your SDK, your streaming, your tools. The model field becomes an effort tier — everything else is identical.

Drop-in base URL & bearer key
Server-side routing & failover
Every call logged for analytics
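For readers who want to see what "drop-in base URL & bearer key" means on the wire, the sketch below builds (but does not send) the HTTP request an OpenAI-style client would issue. It uses only the standard library. The `build_chat_request` helper and the `ak-example` key are hypothetical, and the `/chat/completions` path is the one the OpenAI SDK appends to its base URL; it is assumed here that the gateway accepts it unchanged.

```python
import json
import urllib.request

BASE_URL = "https://api.apikeychain.dev/v1"


def build_chat_request(api_key: str, tier: str, prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat completion request for an effort tier."""
    body = {
        "model": f"keychain-{tier}",  # the model field carries the effort tier
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("ak-example", "low", "Classify: 'great product'")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it; omitted in this sketch.
```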

before.py

# A different SDK + key per provider…
import os

from google import genai
from groq import Groq
from mistralai import Mistral

# Environment variable names below are placeholders.
g = genai.Client(api_key=os.environ["GEMINI_KEY"])
q = Groq(api_key=os.environ["GROQ_KEY"])
m = Mistral(api_key=os.environ["MISTRAL_KEY"])
# …and you hand-roll the failover.

Stop babysitting eight API keys.

Spin up your unified key in under a minute and route across every free-tier model from a single endpoint.
