Model Leaderboard
Compare key performance metrics for LLM APIs: price, latency, throughput, and uptime.
Updated: 1/30/2026, 6:02:02 AM
| Model | Author | Provider | Price | Latency | Throughput | Uptime |
|---|---|---|---|---|---|---|
| Gemini 2.0 Flash | google | Google Vertex | $0.10 | 0.32s | 83.0T/s | 99.12% |
| Gemini 2.5 Flash | google | Google Vertex | $0.32 | 3.60s | 126.0T/s | 99.74% |
| Gemini 2.5 Flash Lite | google | Google AI Studio | $0.10 | 0.75s | 98.0T/s | 99.51% |
| Grok 4 Fast | x-ai | xAI | $0.20 | 3.10s | 117.0T/s | 100.00% |
| Claude Sonnet 4.5 | anthropic | Amazon Bedrock | $3.12 | 1.75s | 102.0T/s | 99.90% |
| Grok 4.1 Fast | x-ai | xAI | $0.20 | 0.77s | 105.0T/s | 100.00% |
| DeepSeek V3.2 | deepseek | Google Vertex | $0.25 | 1.85s | 34.0T/s | 99.85% |
| Gemini 3 Flash Preview | google | Google AI Studio | $0.52 | 1.11s | 94.0T/s | 99.04% |
| GPT-4o-mini | openai | OpenAI | $0.15 | 0.48s | 32.0T/s | 99.99% |
| Mistral Nemo | mistralai | Mistral | $0.02 | 0.22s | 136.0T/s | 99.99% |
| Llama 3.1 70B Instruct | meta-llama | Together | $0.40 | 0.41s | 20.0T/s | 99.82% |
| Llama 3.1 8B Instruct | meta-llama | Friendli | $0.02 | 0.10s | 138.0T/s | 99.99% |
| Ministral 3B | mistralai | Mistral | $0.04 | 0.28s | 91.0T/s | - |
| Llama 3.3 70B Instruct | meta-llama | Cerebras | $0.10 | 0.31s | 355.5T/s | 99.87% |
| Gemini 2.0 Flash Lite | google | Google Vertex | $0.08 | 0.49s | 42.0T/s | 99.65% |
| Gemma 3 27B | google | Nebius Token Factory | $0.04 | 0.37s | 49.0T/s | 98.55% |
| Gemma 3 4B | google | Chutes | $0.02 | 0.94s | 34.0T/s | 99.99% |
| DeepSeek V3 0324 | deepseek | Baseten | $0.20 | 0.36s | 114.0T/s | 99.96% |
| Llama 4 Maverick | meta-llama | Groq | $0.15 | 0.33s | 231.0T/s | 99.98% |
| GPT-4.1 Mini | openai | Azure | $0.41 | 0.65s | 45.0T/s | 99.98% |
| Qwen3 32B | qwen | Cerebras | $0.08 | 0.40s | 664.5T/s | 99.74% |
| Gemini 2.5 Pro | google | Google Vertex (Global) | $1.33 | 2.65s | 92.0T/s | 98.50% |
| DeepSeek R1T2 Chimera (free) | tngtech | Chutes | $0.00 | 2.13s | 29.0T/s | 99.99% |
| Qwen3 235B A22B Instruct 2507 | qwen | Cerebras | $0.07 | 0.23s | 89.0T/s | 99.95% |
| Qwen3 Coder 480B A35B | qwen | DeepInfra (Turbo) | $0.23 | 0.31s | 110.0T/s | 99.97% |
| GLM 4 32B | z-ai | Z.ai | $0.10 | 0.64s | 5.0T/s | - |
| gpt-oss-20b | openai | Groq | $0.02 | 0.11s | 476.0T/s | 99.72% |
| GPT-5 Nano | openai | OpenAI | $0.05 | 0.78s | 113.0T/s | 99.97% |
| GPT-5 Mini | openai | Azure | $0.27 | 9.53s | 88.0T/s | 99.84% |
| DeepSeek V3.1 | deepseek | Google Vertex | $0.16 | 0.79s | 94.0T/s | 99.97% |
| Grok Code Fast 1 | x-ai | xAI | $0.21 | 0.80s | 136.0T/s | 100.00% |
| Kimi K2 0905 | moonshotai | Fireworks | $0.41 | 0.33s | 89.0T/s | 99.52% |
| Gemini 2.5 Flash Lite Preview 09-2025 | google | Google AI Studio | $0.10 | 0.58s | 191.0T/s | 99.76% |
| Claude Haiku 4.5 | anthropic | Google Vertex | $1.04 | 0.55s | 83.0T/s | 99.98% |
| LFM2-8B-A1B | liquid | Liquid | $0.01 | 0.37s | 28.0T/s | - |
| GPT-5.1 Chat | openai | Azure | $1.33 | 2.56s | 99.0T/s | 99.74% |
| Claude Opus 4.5 | anthropic | Google Vertex | $5.20 | 1.60s | 43.0T/s | 99.90% |
| GPT-5.2 | openai | Azure | $1.86 | 2.92s | 38.0T/s | 99.46% |
| Kimi K2.5 | moonshotai | GMICloud | $0.52 | 0.80s | 80.0T/s | 94.42% |
| Trinity Large Preview (free) | arcee-ai | Arcee AI | $0.00 | 0.59s | 39.0T/s | 92.00% |
| Nova Micro 1.0 | amazon | Amazon Bedrock | $0.04 | 0.36s | 195.0T/s | 28.00% |
| Gemma 3 12B | google | DeepInfra | $0.03 | 0.57s | 39.0T/s | 99.09% |
| Llama 3.1 Nemotron Ultra 253B v1 | nvidia | Nebius Token Factory | $0.61 | 0.13s | 21.0T/s | - |
| GPT-4.1 | openai | Azure | $2.06 | 0.76s | 50.0T/s | 99.98% |
| Qwen3 8B | qwen | Fireworks | $0.05 | 0.48s | 84.0T/s | 99.62% |
| Qwen3 30B A3B | qwen | Friendli | $0.06 | 0.10s | 125.0T/s | 97.37% |
| Claude Sonnet 4 | anthropic | Amazon Bedrock | $3.12 | 1.73s | 55.0T/s | 99.95% |
| Grok 3 Mini | x-ai | xAI Fast | $0.30 | 0.61s | 75.0T/s | 99.96% |
| Mistral Small 3.2 24B | mistralai | Mistral | $0.06 | 0.23s | 103.0T/s | 99.89% |
| gpt-oss-120b (exacto) | openai | Groq | $0.04 | 0.34s | 418.0T/s | 98.99% |
| GPT-5 | openai | Azure | $1.33 | 7.45s | 72.0T/s | 97.93% |
| Qwen3 VL 8B Instruct | qwen | Together | $0.08 | 0.27s | 94.0T/s | 99.90% |
| Gemini 3 Pro Preview | google | Google AI Studio | $2.10 | 3.31s | 81.0T/s | 98.01% |
| Mistral Large 3 2512 | mistralai | Mistral | $0.51 | 0.72s | 37.0T/s | 100.00% |
| GLM 4.7 | z-ai | Cerebras | $0.41 | 0.95s | 170.0T/s | 99.78% |
| MiniMax M2.1 | minimax | Fireworks | $0.28 | 0.98s | 99.5T/s | 99.52% |