Model Leaderboard
Compare key performance metrics for LLM APIs.
Updated at: 6/8/2026, 10:00:58 PM
| Model | Provider | | Price ($) | Latency (s) | Throughput (T/s) | % | | |
|---|---|---|---|---|---|---|---|---|
Gemini 2.5 Flash google | Google AI Studio | - | $0.32 | 0.83s | 72.0T/s | 99.53% | ||
Gemini 2.5 Flash Lite google | Google AI Studio | - | $0.10 | 0.35s | 119.0T/s | 99.68% | ||
gpt-oss-120b openai | Cerebras | - | $0.04 | 0.21s | 425.0T/s | 99.51% | ||
Gemini 3 Flash Preview google | Google Vertex | - | $0.52 | 1.35s | 63.0T/s | 99.96% | ||
DeepSeek V4 Flash deepseek | SiliconFlow | - | $0.10 | 1.68s | 77.0T/s | 99.30% | ||
GPT-4o-mini openai | OpenAI | - | $0.15 | 0.51s | 31.0T/s | 100.00% | ||
Mistral Nemo mistralai | Mistral | - | $0.02 | 0.32s | 78.0T/s | 99.24% | ||
Llama 3.1 8B Instruct meta-llama | Groq | - | $0.02 | 0.23s | 182.0T/s | 98.40% | ||
Llama 4 Maverick meta-llama | Google Vertex | - | $0.15 | 0.39s | 66.0T/s | 99.99% | ||
GPT-4.1 Nano openai | Azure | - | $0.10 | 1.67s | 42.0T/s | 100.00% | ||
Qwen3 235B A22B Instruct 2507 qwen | Weights & Biases | - | $0.09 | 0.31s | 89.0T/s | 99.69% | ||
GLM 4.5 Air z-ai | NovitaAI | - | $0.13 | 0.86s | 39.0T/s | 99.82% | ||
Claude Haiku 4.5 anthropic | Amazon Bedrock | - | $1.04 | 0.84s | 120.5T/s | 99.99% | ||
DeepSeek V3.2 deepseek | Friendli | - | $0.23 | 0.29s | 49.0T/s | 99.67% | ||
Claude Sonnet 4.6 anthropic | Google Vertex (Europe) | - | $3.12 | 2.89s | 76.0T/s | 99.98% | ||
Gemini 3.1 Flash Lite Preview google | Google AI Studio | - | $0.26 | 0.61s | 113.0T/s | 99.98% | ||
GPT-5.4 Mini openai | OpenAI | - | $0.79 | 0.65s | 69.0T/s | 99.82% | ||
Gemma 4 31B google | Weights & Biases | - | $0.12 | 0.34s | 40.0T/s | 98.80% | ||
Gemma 4 26B A4B google | Cloudflare | - | $0.06 | 0.39s | 77.5T/s | 99.89% | ||
Ling-2.6-flash inclusionai | NovitaAI | - | $0.01 | 0.83s | 44.0T/s | 18.30% | ||
MiMo-V2.5 xiaomi | Xiaomi | - | $0.14 | 2.11s | 61.0T/s | 86.40% | ||
Hy3 preview tencent | SiliconFlow | - | $0.06 | 3.09s | 54.0T/s | 99.76% | ||
DeepSeek V4 Pro deepseek | DeepSeek | - | $0.44 | 1.35s | 62.0T/s | 99.61% | ||
Anthropic Claude Sonnet Latest anthropic | Google Vertex (Europe) | - | $3.12 | 2.89s | 76.0T/s | - | ||
OpenAI GPT Mini Latest openai | OpenAI | - | $0.79 | 0.65s | 69.0T/s | 100.00% | ||
Anthropic Claude Haiku Latest anthropic | Amazon Bedrock | - | $1.04 | 0.84s | 120.5T/s | 100.00% | ||
Owl Alpha openrouter | Stealth | - | $0.00 | 5.27s | 14.0T/s | - | ||
Gemini 3.1 Flash Lite google | Google AI Studio | - | $0.26 | 0.73s | 108.0T/s | 99.97% | ||
MiniMax M3 minimax | MiniMax | - | $0.31 | 3.22s | 31.0T/s | 93.43% | ||
Qwen3.7 Plus qwen | Alibaba Cloud Int. | - | $0.41 | 0.76s | 10.0T/s | 52.00% | ||
Nemotron 3 Ultra (free) nvidia | NVIDIA | - | $0.00 | 6.28s | 9.0T/s | - | ||
Llama 3.1 70B Instruct meta-llama | Weights & Biases | - | $0.40 | 0.32s | 36.0T/s | 100.00% | ||
Llama 3.3 70B Instruct meta-llama | SambaNova Turbo | - | $0.10 | 0.38s | 119.0T/s | 100.00% | ||
Gemma 3 27B google | Phala | - | $0.08 | 0.84s | 27.0T/s | 99.71% | ||
DeepSeek V3 0324 deepseek | NovitaAI | - | $0.21 | 1.22s | 25.0T/s | 99.99% | ||
Llama 4 Scout meta-llama | Groq | - | $0.10 | 0.43s | 75.0T/s | 99.92% | ||
GPT-4.1 Mini openai | OpenAI | - | $0.41 | 0.75s | 34.0T/s | 99.99% | ||
GPT-4.1 openai | Azure | - | $2.06 | 0.85s | 45.0T/s | 100.00% | ||
Qwen3 32B qwen | Groq | - | $0.08 | 0.31s | 481.0T/s | 99.29% | ||
Mistral Small 3.2 24B mistralai | Mistral | - | $0.08 | 0.33s | 98.0T/s | 99.89% | ||
GLM 4 32B z-ai | Z.ai | - | $0.10 | 1.47s | 2.0T/s | 100.00% | ||
Qwen3 30B A3B Instruct 2507 qwen | AtlasCloud | - | $0.05 | 1.78s | 79.0T/s | 99.98% | ||
gpt-oss-20b openai | Weights & Biases | - | $0.03 | 0.27s | 280.0T/s | 98.63% | ||
GPT-5 Nano openai | OpenAI | - | $0.05 | 2.79s | 87.0T/s | 100.00% | ||
GPT-5 Mini openai | OpenAI | - | $0.27 | 3.31s | 67.0T/s | 99.16% | ||
DeepSeek V3.1 deepseek | SambaNova | - | $0.22 | 1.29s | 57.0T/s | 99.31% | ||
Kimi K2 0905 moonshotai | Groq | - | $0.62 | 0.18s | 180.0T/s | 100.00% | ||
DeepSeek V3.1 Terminus deepseek | DeepInfra | - | $0.28 | 0.71s | 30.0T/s | 100.00% | ||
Claude Sonnet 4.5 anthropic | Google Vertex (Global) | - | $3.12 | 1.55s | 43.0T/s | 99.99% | ||
Qwen3 VL 30B A3B Instruct qwen | Alibaba Cloud Int. | - | $0.13 | 0.49s | 62.0T/s | 99.87% | ||
Qwen3 VL 8B Instruct qwen | Alibaba Cloud Int. | - | $0.08 | 0.68s | 66.0T/s | 99.39% | ||
GPT-5.1 openai | OpenAI | - | $1.33 | 3.65s | 54.0T/s | 99.92% | ||
Ministral 3 3B 2512 mistralai | Mistral | - | $0.10 | 0.21s | 31.5T/s | 100.00% | ||
Ministral 3 8B 2512 mistralai | Mistral | - | $0.15 | 0.23s | 42.0T/s | 99.99% | ||
GPT-5.2 openai | OpenAI | - | $1.86 | 4.12s | 45.0T/s | 99.95% | ||
MiMo-V2-Flash xiaomi | Xiaomi | - | $0.10 | 0.59s | 57.0T/s | 100.00% | ||
Kimi K2.5 moonshotai | Venice | - | $0.42 | 1.11s | 86.0T/s | 99.96% | ||
Claude Opus 4.6 anthropic | Google Vertex | - | $5.20 | 1.06s | 49.0T/s | 99.99% | ||
MiniMax M2.5 minimax | MARA | - | $0.16 | 1.38s | 174.0T/s | 99.97% | ||
Gemini 3.1 Pro Preview google | Google AI Studio | - | $2.10 | 2.72s | 102.0T/s | 98.64% | ||
Qwen3.5-Flash qwen | Alibaba Cloud Int. | - | $0.07 | 0.63s | 82.0T/s | 96.00% | ||
GPT-5.4 openai | Azure | - | $2.62 | 3.90s | 49.0T/s | 99.97% | ||
Nemotron 3 Super (free) nvidia | NVIDIA | - | $0.00 | 5.32s | 7.0T/s | 12.00% | ||
GPT-5.4 Nano openai | Azure | - | $0.21 | 1.04s | 54.0T/s | 99.96% | ||
MiniMax M2.7 minimax | MARA | - | $0.29 | 0.85s | 235.0T/s | 98.74% | ||
GLM 5.1 z-ai | Friendli | - | $1.00 | 0.57s | 93.0T/s | 99.92% | ||
Claude Opus 4.7 anthropic | Google Vertex | - | $5.20 | 1.51s | 58.0T/s | 99.99% | ||
Kimi K2.6 moonshotai | Weights & Biases | - | $0.71 | 0.64s | 153.0T/s | 99.62% | ||
Claude Opus Latest anthropic | Google Vertex | - | $5.20 | 2.89s | 79.0T/s | - | ||
MiMo-V2.5-Pro xiaomi | DeepInfra | - | $0.44 | 1.08s | 79.0T/s | 99.90% | ||
GPT-5.5 openai | Azure | - | $5.24 | 4.68s | 45.0T/s | 98.68% | ||
OpenAI GPT Latest openai | Azure | - | $5.24 | 4.68s | 45.0T/s | - | ||
Google Gemini Flash Latest google | Google Vertex | - | $1.57 | 3.00s | 110.0T/s | - | ||
MoonshotAI Kimi Latest moonshotai | Weights & Biases | - | $0.71 | 0.64s | 153.0T/s | - | ||
Google Gemini Pro Latest google | Google AI Studio | - | $2.10 | 2.72s | 102.0T/s | - | ||
Laguna M.1 (free) poolside | Poolside | - | $0.00 | 3.43s | 10.0T/s | 100.00% | ||
Grok 4.3 x-ai | xAI | - | $1.27 | 0.75s | 147.0T/s | 100.00% | ||
Gemini 3.5 Flash google | Google Vertex | - | $1.57 | 3.00s | 110.0T/s | 99.76% | ||
Claude Opus 4.8 anthropic | Google Vertex | - | $5.20 | 2.89s | 79.0T/s | 99.99% | ||
Step 3.7 Flash stepfun | StepFun | - | $0.21 | 1.91s | 50.0T/s | 20.00% | ||
GPT-4o openai | OpenAI | - | $2.58 | 0.54s | 47.0T/s | 99.97% | ||
Qwen2.5 7B Instruct qwen | Together | - | $0.04 | 0.40s | 83.0T/s | 99.74% | ||
Claude 3.5 Haiku anthropic | Amazon Bedrock (US-WEST) | - | $0.83 | 0.82s | 34.0T/s | 99.94% | ||
DeepSeek V3 deepseek-ai | StreamLake | - | $0.21 | 0.86s | 25.0T/s | 99.97% | ||
Gemma 3 12B google | DeepInfra | - | $0.05 | 0.63s | 33.0T/s | 96.05% | ||
Qwen3 235B A22B qwen | Alibaba Cloud Int. | - | $0.47 | 0.49s | 60.0T/s | - | ||
Llama Guard 4 12B meta-llama | Together | - | $0.18 | 0.11s | 19.0T/s | 100.00% | ||
Claude Sonnet 4 anthropic | Google Vertex (Europe) | - | $3.12 | 0.50s | 46.0T/s | 100.00% | ||
Gemini 2.5 Pro google | Google AI Studio | - | $1.33 | 2.50s | 94.0T/s | 96.96% | ||
GPT-5 openai | OpenAI | - | $1.33 | 2.32s | 45.0T/s | 99.90% | ||
Gemini 2.5 Flash Lite Preview 09-2025 google | Google Vertex | - | $0.10 | 0.38s | 201.0T/s | 12.00% | ||
Nano Banana (Gemini 2.5 Flash Image) google | Google AI Studio | - | $0.32 | 5.71s | 169.0T/s | 99.99% | ||
Qwen3 VL 32B Instruct qwen | Alibaba Cloud Int. | - | $0.11 | 0.64s | 19.0T/s | 98.63% | ||
gpt-oss-safeguard-20b openai | Groq | - | $0.08 | 0.27s | 428.0T/s | 100.00% | ||
Mistral Large 3 2512 mistralai | Mistral | - | $0.51 | 0.57s | 3.0T/s | 88.00% | ||
Ministral 3 14B 2512 mistralai | Mistral | - | $0.20 | 0.26s | 52.0T/s | 99.38% | ||
Nemotron 3 Nano 30B A3B nvidia | DeepInfra | - | $0.05 | 1.33s | 105.0T/s | 4.00% | ||
GLM 4.7 z-ai | Cerebras | - | $0.41 | 0.51s | 448.0T/s | 99.63% | ||
GLM 5 z-ai | Friendli | - | $0.62 | 0.44s | 92.0T/s | 99.84% | ||
Qwen3.5 397B A17B qwen | Together | - | $0.41 | 0.48s | 96.0T/s | 99.94% | ||
LFM2-24B-A2B liquid | Together | - | $0.03 | 0.20s | 59.0T/s | - | ||
Nano Banana 2 (Gemini 3.1 Flash Image Preview) google | Google AI Studio | - | $0.52 | 11.56s | 131.0T/s | 99.98% | ||
Qwen3.5-9B qwen | Venice | - | $0.10 | 0.72s | 66.0T/s | 99.48% | ||
Qwen3.6 Plus qwen | Alibaba Cloud Int. | - | $0.34 | 1.15s | 38.0T/s | 64.00% | ||
Qwen3.6 27B qwen | SiliconFlow | - | $0.31 | 2.85s | 32.0T/s | 99.71% | ||
Qwen3.7 Max qwen | Alibaba Cloud Int. | - | $1.28 | 1.41s | 51.0T/s | 92.00% |
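
To compare rows programmatically, the table lines can be parsed into typed records. The following is a minimal sketch, not part of the leaderboard itself. It assumes the column order seen in the rows above (model, provider, a placeholder cell, price, latency, throughput, percentage). The field names `price`, `latency_s`, `tokens_per_s`, and `percent` are hypothetical labels inferred from the units, and the three sample rows are copied from the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    model: str               # model name followed by the organization slug, as in the table
    provider: str
    price: float             # dollar amount as listed; the pricing unit is not stated in the table
    latency_s: float         # value listed with an "s" suffix
    tokens_per_s: float      # value listed with a "T/s" suffix
    percent: Optional[float]  # None when the table shows "-"


def parse_row(line: str) -> Row:
    """Parse one pipe-delimited leaderboard row into a Row."""
    cells = [cell.strip() for cell in line.split("|")]
    # Only the first seven cells carry data; the trailing cells are empty.
    model, provider, _placeholder, price, latency, throughput, percent = cells[:7]
    return Row(
        model=model,
        provider=provider,
        price=float(price.lstrip("$")),
        latency_s=float(latency.rstrip("s")),
        tokens_per_s=float(throughput.replace("T/s", "")),
        percent=None if percent == "-" else float(percent.rstrip("%")),
    )


# Three rows copied verbatim from the table above.
SAMPLE: List[str] = [
    "gpt-oss-120b openai | Cerebras | - | $0.04 | 0.21s | 425.0T/s | 99.51% | ||",
    "Qwen3 32B qwen | Groq | - | $0.08 | 0.31s | 481.0T/s | 99.29% | ||",
    "Owl Alpha openrouter | Stealth | - | $0.00 | 5.27s | 14.0T/s | - | ||",
]

rows = [parse_row(line) for line in SAMPLE]

# Rank by throughput, highest first.
fastest = sorted(rows, key=lambda r: r.tokens_per_s, reverse=True)

# Keep only rows that report a percentage of at least 99.
reliable = [r for r in rows if r.percent is not None and r.percent >= 99.0]

for r in fastest:
    print(f"{r.model} via {r.provider}: {r.tokens_per_s} T/s, {r.latency_s}s")
```

Rows whose last data cell is `-` are kept with `percent=None` rather than dropped, so they can still be ranked on price, latency, or throughput.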