# Model Leaderboard

Compare key performance metrics (price, latency, throughput, uptime) across LLM APIs.

Updated at: 5/13/2026, 5:02:35 PM
| Model | Author | Provider | Price | Latency | Throughput | Uptime |
|---|---|---|---|---|---|---|
| Gemini 2.5 Flash Lite | google | Google AI Studio | $0.10 | 0.85s | 95.0T/s | 99.98% |
| gpt-oss-120b | openai | Cerebras | $0.04 | 0.26s | 567.0T/s | 99.50% |
| GPT-4o-mini | openai | OpenAI | $0.15 | 0.56s | 30.0T/s | 100.00% |
| Mistral Nemo | mistralai | DeepInfra | $0.02 | 0.34s | 44.0T/s | 99.91% |
| Llama 3.1 8B Instruct | meta-llama | Cerebras | $0.02 | 0.14s | 217.0T/s | 99.96% |
| GPT-4.1 Mini | openai | Azure | $0.41 | 0.57s | 46.0T/s | 99.97% |
| Gemini 2.5 Flash | google | Google Vertex (Global) | $0.32 | 0.94s | 59.0T/s | 99.56% |
| Qwen3 235B A22B Instruct 2507 | qwen | Google Vertex | $0.07 | 0.41s | 45.0T/s | 97.65% |
| GLM 4.5 Air | z-ai | NovitaAI | $0.14 | 0.90s | 25.0T/s | 99.08% |
| Grok 4.1 Fast | x-ai | xAI | $0.20 | 0.80s | 61.0T/s | 100.00% |
| DeepSeek V3.2 | deepseek | Alibaba Cloud Int. | $0.26 | 1.00s | 32.0T/s | 99.85% |
| Gemini 3 Flash Preview | google | Google AI Studio | $0.52 | 1.20s | 75.0T/s | 99.89% |
| Claude Sonnet 4.6 | anthropic | Anthropic | $3.12 | 1.23s | 56.0T/s | 99.90% |
| Gemini 3.1 Flash Lite Preview | google | Google Vertex | $0.26 | 1.11s | 98.0T/s | 99.73% |
| Gemma 4 31B | google | Venice | $0.12 | 1.06s | 42.0T/s | 63.37% |
| Gemma 4 26B A4B | google | Cloudflare | $0.06 | 0.37s | 49.0T/s | 99.87% |
| Claude Opus 4.7 | anthropic | Amazon Bedrock (US) | $5.20 | 1.68s | 64.0T/s | 98.61% |
| Claude Opus Latest | anthropic | Amazon Bedrock (US) | $5.20 | 1.17s | 73.0T/s | - |
| Hy3 preview | tencent | SiliconFlow | $0.07 | 4.71s | 33.0T/s | - |
| DeepSeek V4 Flash | deepseek | Alibaba Cloud Int. | $0.13 | 0.94s | 104.0T/s | 99.47% |
| DeepSeek V4 Pro | deepseek | Alibaba Cloud Int. | $0.44 | 0.73s | 48.0T/s | 99.53% |
| Anthropic Claude Sonnet Latest | anthropic | Anthropic | $3.12 | 1.23s | 56.0T/s | - |
| Google Gemini Flash Latest | google | Google AI Studio | $0.52 | 1.20s | 75.0T/s | - |
| Google Gemini Pro Latest | google | Google Vertex | $2.10 | 2.97s | 69.0T/s | - |
| GPT-4o | openai | OpenAI | $2.58 | 0.64s | 25.0T/s | 99.60% |
| Llama 3.3 70B Instruct | meta-llama | Groq | $0.10 | 0.34s | 125.0T/s | 100.00% |
| DeepSeek V3 | deepseek-ai | NovitaAI | $0.33 | 1.18s | 17.0T/s | 99.71% |
| Gemini 2.0 Flash | google | Google AI Studio | $0.10 | 0.70s | 55.0T/s | 70.89% |
| Gemini 2.0 Flash Lite | google | Google Vertex | $0.08 | 0.48s | 65.0T/s | 99.96% |
| Gemma 3 27B | google | Phala | $0.08 | 0.69s | 41.0T/s | 99.60% |
| Gemma 3 12B | google | Cloudflare | $0.04 | 0.29s | 39.0T/s | 100.00% |
| DeepSeek V3 0324 | deepseek | ModelRun | $0.21 | 1.47s | 28.0T/s | 99.89% |
| Llama 4 Scout | meta-llama | Groq | $0.08 | 0.36s | 182.0T/s | 99.86% |
| Llama 4 Maverick | meta-llama | Parasail | $0.15 | 0.69s | 85.0T/s | 99.69% |
| GPT-4.1 Nano | openai | Azure | $0.10 | 1.47s | 45.0T/s | 99.98% |
| GPT-4.1 | openai | Azure | $2.06 | 0.63s | 36.0T/s | 99.98% |
| Qwen3 32B | qwen | Groq | $0.08 | 0.26s | 389.5T/s | 99.93% |
| Qwen3 8B | qwen | Alibaba Cloud Int. | $0.05 | 0.37s | 83.0T/s | 99.95% |
| Gemini 2.5 Pro | google | Google Vertex (EU) | $1.33 | 2.53s | 102.0T/s | 99.55% |
| Mistral Small 3.2 24B | mistralai | Mistral | $0.08 | 0.39s | 54.0T/s | 99.64% |
| GLM 4 32B | z-ai | Z.ai | $0.10 | 1.68s | 2.0T/s | - |
| gpt-oss-20b | openai | Groq | $0.03 | 0.25s | 241.0T/s | 99.91% |
| GPT-5 Nano | openai | Azure | $0.05 | 3.86s | 84.0T/s | 99.94% |
| GPT-5 Mini | openai | OpenAI | $0.27 | 4.14s | 58.0T/s | 99.96% |
| GPT-5 Chat | openai | OpenAI | $1.33 | 0.75s | 91.0T/s | 96.00% |
| DeepSeek V3.1 | deepseek | SambaNova | $0.22 | 1.30s | 64.0T/s | 99.94% |
| Qwen3 Next 80B A3B Instruct | qwen | Google Vertex | $0.10 | 0.50s | 143.0T/s | 99.24% |
| Grok 4 Fast | x-ai | xAI | $0.20 | 5.08s | 91.0T/s | 96.00% |
| Gemini 2.5 Flash Lite Preview 09-2025 | google | Google Vertex | $0.10 | 2.77s | 75.0T/s | 76.00% |
| Claude Sonnet 4.5 | anthropic | Google Vertex | $3.12 | 0.98s | 38.0T/s | 99.74% |
| Qwen3 VL 8B Instruct | qwen | Alibaba Cloud Int. | $0.08 | 0.43s | 51.0T/s | 99.60% |
| Claude Haiku 4.5 | anthropic | Google Vertex (Europe) | $1.04 | 0.54s | 96.0T/s | 100.00% |
| Qwen3 VL 32B Instruct | qwen | Alibaba Cloud Int. | $0.11 | 1.70s | 28.0T/s | - |
| GPT-5.1 Chat | openai | OpenAI | $1.33 | 1.71s | 51.0T/s | 99.97% |
| MiMo-V2-Flash | xiaomi | NovitaAI | $0.10 | 1.51s | 32.0T/s | 98.70% |
| GLM 4.7 Flash | z-ai | DeepInfra | $0.06 | 0.35s | 63.0T/s | 98.07% |
| Kimi K2.5 | moonshotai | Venice | $0.42 | 0.98s | 57.0T/s | 99.74% |
| Step 3.5 Flash | stepfun | StepFun | $0.10 | 1.44s | 78.0T/s | 99.93% |
| Qwen3 Coder Next | qwen | Ionstream | $0.12 | 0.34s | 45.0T/s | 99.63% |
| Claude Opus 4.6 | anthropic | Google Vertex | $5.20 | 1.56s | 49.0T/s | 99.98% |
| MiniMax M2.5 | minimax | MARA | $0.16 | 0.73s | 290.0T/s | 99.92% |
| Gemini 3.1 Pro Preview | google | Google Vertex | $2.10 | 3.25s | 69.0T/s | 98.64% |
| Qwen3.5-Flash | qwen | Alibaba Cloud Int. | $0.07 | 0.52s | 104.0T/s | 72.00% |
| GPT-5.4 | openai | Azure | $2.62 | 1.67s | 44.0T/s | 99.91% |
| Nemotron 3 Super (free) | nvidia | NVIDIA | $0.00 | 17.14s | 18.0T/s | 11.11% |
| GPT-5.4 Mini | openai | OpenAI | $0.79 | 0.76s | 47.0T/s | 99.93% |
| GPT-5.4 Nano | openai | Azure | $0.21 | 1.58s | 45.0T/s | 100.00% |
| MiniMax M2.7 | minimax | Fireworks | $0.29 | 2.20s | 62.0T/s | 99.83% |
| Qwen3.6 Plus | qwen | Alibaba Cloud Int. | $0.34 | 2.19s | 37.0T/s | 100.00% |
| GLM 5.1 | z-ai | Friendli | $1.08 | 0.55s | 93.0T/s | 99.91% |
| Kimi K2.6 | moonshotai | Baseten | $0.77 | 1.14s | 232.0T/s | 99.85% |
| GPT-5.5 | openai | Azure | $5.24 | 4.96s | 49.0T/s | 99.86% |
| OpenAI GPT Latest | openai | Azure | $5.24 | 4.96s | 49.0T/s | - |
| MoonshotAI Kimi Latest | moonshotai | Baseten | $0.77 | 1.14s | 232.0T/s | - |
| OpenAI GPT Mini Latest | openai | OpenAI | $0.79 | 0.76s | 47.0T/s | - |
| Anthropic Claude Haiku Latest | anthropic | Google Vertex (Europe) | $1.04 | 0.54s | 96.0T/s | - |
| Owl Alpha | openrouter | Stealth | $0.00 | 13.41s | 7.0T/s | - |
| Gemini 3.1 Flash Lite | google | Google Vertex | $0.26 | 0.98s | 105.0T/s | 99.56% |
| Ring-2.6-1T (free) | inclusionai | NovitaAI | $0.00 | 3.71s | 45.0T/s | 95.99% |
| Llama 3.1 70B Instruct | meta-llama | Weights & Biases | $0.40 | 0.23s | 22.0T/s | 100.00% |
| Qwen2.5 7B Instruct | qwen | Phala | $0.04 | 0.41s | 52.0T/s | 99.99% |
| Claude 3.5 Haiku | anthropic | Amazon Bedrock | $0.83 | 0.87s | 46.0T/s | 99.91% |
| Mistral Small 3 | mistralai | DeepInfra | $0.05 | 0.23s | 44.0T/s | - |
| Gemma 3 4B | google | DeepInfra | $0.04 | 0.38s | 30.0T/s | 72.00% |
| Llama Guard 4 12B | meta-llama | Together | $0.18 | 0.13s | 18.0T/s | 77.94% |
| Claude Sonnet 4 | anthropic | Amazon Bedrock | $3.12 | 1.03s | 53.0T/s | 100.00% |
| GPT-5 | openai | OpenAI | $1.33 | 6.82s | 52.0T/s | 99.93% |
| DeepSeek V3.1 Terminus | deepseek | NovitaAI | $0.28 | 1.66s | 26.0T/s | 99.98% |
| Qwen3 VL 235B A22B Instruct | qwen | Alibaba Cloud Int. | $0.21 | 0.84s | 41.0T/s | 99.40% |
| DeepSeek V3.2 Exp | deepseek | AtlasCloud | $0.27 | 1.59s | 22.0T/s | 99.93% |
| Qwen3 VL 30B A3B Instruct | qwen | Alibaba Cloud Int. | $0.13 | 0.64s | 50.0T/s | 99.82% |
| Nano Banana (Gemini 2.5 Flash Image) | google | Google AI Studio | $0.32 | 0.84s | 194.0T/s | 99.97% |
| GPT-5.1 | openai | Azure | $1.33 | 1.22s | 47.0T/s | 99.96% |
| Ministral 3 3B 2512 | mistralai | Mistral | $0.10 | 0.16s | 54.0T/s | 100.00% |
| GPT-5.2 | openai | Azure | $1.86 | 2.53s | 30.0T/s | 99.89% |
| Nemotron 3 Nano 30B A3B | nvidia | DeepInfra | $0.05 | 1.14s | 67.0T/s | - |
| GLM 5 | z-ai | Baseten | $0.62 | 0.48s | 79.0T/s | 99.89% |
| Qwen3.5-27B | qwen | Phala | $0.21 | 0.81s | 99.0T/s | 99.97% |
| Qwen3.5-35B-A3B | qwen | Alibaba Cloud Int. | $0.15 | 0.86s | 114.0T/s | 99.89% |
| Qwen3.5-9B | qwen | Venice | $0.04 | 0.60s | 112.0T/s | 99.27% |
| Mistral Small 4 | mistralai | Venice | $0.15 | 0.67s | 125.0T/s | 99.39% |
| Laguna M.1 (free) | poolside | Poolside | $0.00 | 1.87s | 19.0T/s | - |
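When comparing models along one metric, it helps to filter out entries with poor uptime first, since a fast but unreliable endpoint may not be usable in practice. Below is a minimal sketch of that kind of ranking in Python, using a handful of rows transcribed from the table above (units stripped: price in USD, latency in seconds, throughput in tokens/s, uptime in percent). The `rank_by` helper and the 99% uptime floor are illustrative assumptions, not part of the leaderboard itself.

```python
# A few rows from the leaderboard above, transcribed by hand.
MODELS = [
    {"model": "gpt-oss-120b", "provider": "Cerebras",
     "price": 0.04, "latency": 0.26, "throughput": 567.0, "uptime": 99.50},
    {"model": "Llama 3.1 8B Instruct", "provider": "Cerebras",
     "price": 0.02, "latency": 0.14, "throughput": 217.0, "uptime": 99.96},
    {"model": "GPT-4o-mini", "provider": "OpenAI",
     "price": 0.15, "latency": 0.56, "throughput": 30.0, "uptime": 100.00},
    {"model": "Qwen3 32B", "provider": "Groq",
     "price": 0.08, "latency": 0.26, "throughput": 389.5, "uptime": 99.93},
]

def rank_by(metric, reverse=True, min_uptime=99.0):
    """Rank models by one metric, dropping entries below an uptime floor."""
    eligible = [m for m in MODELS if m["uptime"] >= min_uptime]
    return sorted(eligible, key=lambda m: m[metric], reverse=reverse)

# Highest throughput among reliable endpoints.
fastest = rank_by("throughput")
print([m["model"] for m in fastest])

# Cheapest first: sort ascending instead.
cheapest = rank_by("price", reverse=False)
print([m["model"] for m in cheapest])
```

Raising `min_uptime` (say, to 99.9%) would exclude entries like gpt-oss-120b at 99.50%, which is exactly the kind of trade-off the uptime column is there to surface.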