Model Leaderboard
Compare key performance metrics for LLM APIs: leaderboard rank, price, latency, throughput, and uptime for each model and provider pairing. A small parsing sketch follows the table.
Updated at: 6/1/2025, 10:03:56 AM
Model | Author | Provider | Rank | Price | Latency | Throughput | Uptime
---|---|---|---|---|---|---|---
Gemini 1.5 Flash | google | Google AI Studio | #80 | $0.08 | 0.30s | 152.8T/s | 99.95%
GPT-4o-mini | openai | Azure | #9 | $0.15 | 1.04s | 217.8T/s | 99.86%
Mistral Nemo | mistralai | Mistral | - | $0.02 | 0.26s | 137.3T/s | 99.97%
Gemini 1.5 Flash 8B | google | Google AI Studio | #97 | $0.04 | 0.22s | 206.5T/s | 97.32%
Llama 3.3 70B Instruct | meta-llama | Cerebras | #51 | $0.07 | 0.20s | 2446.5T/s | 99.95%
Gemini 2.0 Flash | google | Google Vertex | #18 | $0.10 | 0.50s | 165.3T/s | 99.96%
Gemini 2.0 Flash Lite | google | Google Vertex | #15 | $0.08 | 0.43s | 164.1T/s | 99.99%
DeepSeek V3 0324 | deepseek | SambaNova | #6 | $0.31 | 2.70s | 122.6T/s | 99.53%
MythoMax 13B | gryphe | Together | - | $0.07 | 0.42s | 140.1T/s | 99.98%
Llama 3.1 70B Instruct | meta-llama | Fireworks | - | $0.10 | 0.39s | 96.3T/s | 99.53%
Llama 3.1 8B Instruct | meta-llama | Cerebras | #125 | $0.02 | 0.30s | 3163.2T/s | 99.93%
Llama 3.2 3B Instruct | meta-llama | SambaNova | #141 | $0.01 | 0.33s | 1867.0T/s | 99.99%
Mistral Small 3 | mistralai | Mistral | #89 | $0.06 | 0.38s | 142.4T/s | 99.99%
Gemma 3 27B | google | Parasail | #18 | $0.10 | 1.11s | 71.0T/s | 99.87%
Llama 4 Maverick | meta-llama | Groq | #33 | $0.16 | 0.55s | 1134.6T/s | 99.99%
GPT-4.1 Nano | openai | OpenAI | #45 | $0.10 | 0.44s | 106.2T/s | 99.47%
GPT-4.1 | openai | OpenAI | #6 | $2.06 | 0.84s | 57.3T/s | 99.52%
Gemini 2.5 Pro Preview | google | Google AI Studio | #1 | $1.33 | 2.42s | 240.9T/s | 99.72%
Claude Sonnet 4 | anthropic | Google Vertex (Europe) | - | $3.12 | 1.57s | 82.9T/s | 98.55%
Mixtral 8x7B Instruct | mistralai | DeepInfra | #132 | $0.08 | 0.56s | 121.2T/s | 99.98%
Mistral Tiny | mistralai | Mistral | - | $0.25 | 0.29s | 130.7T/s | 100.00%
Claude 3 Haiku | anthropic | Google Vertex | #100 | $0.26 | 1.75s | 183.0T/s | 99.97%
WizardLM-2 8x22B | microsoft | Parasail | - | $0.50 | 1.11s | 75.6T/s | 99.93%
GPT-4o | openai | Azure | - | $2.58 | 2.88s | 116.2T/s | 97.16%
Hermes 2 Pro - Llama-3 8B | nousresearch | Lambda | - | $0.03 | 0.29s | 155.4T/s | 99.98%
GPT-4o-mini (2024-07-18) | openai | OpenAI | #52 | $0.15 | 0.44s | 70.1T/s | 99.92%
Llama 3 8B Lunaris | sao10k | NovitaAI | - | $0.02 | 0.81s | 86.0T/s | 100.00%
Hermes 3 405B Instruct | nousresearch | Lambda | - | $0.71 | 1.06s | 34.3T/s | 99.76%
Hermes 3 70B Instruct | nousresearch | Lambda | - | $0.12 | 0.50s | 48.6T/s | 99.99%
Qwen2.5 72B Instruct | qwen | Together | #67 | $0.12 | 0.80s | 100.5T/s | 99.97%
Rocinante 12B | thedrummer | Infermatic | - | $0.25 | 0.40s | 76.1T/s | 99.89%
Ministral 8B | mistralai | Mistral | #108 | $0.10 | 0.24s | 126.1T/s | 100.00%
Claude 3.5 Sonnet | anthropic | Google Vertex | - | $3.12 | 1.45s | 66.2T/s | 99.75%
UnslopNemo 12B | thedrummer | Infermatic | - | $0.45 | 0.52s | 94.9T/s | 99.92%
Qwen2.5 Coder 32B Instruct | qwen | Together | #88 | $0.06 | 0.82s | 68.2T/s | 99.96%
GPT-4o (2024-11-20) | openai | OpenAI | - | $2.58 | 0.46s | 68.7T/s | 99.83%
DeepSeek V3 | deepseek | Fireworks | #22 | $0.39 | 0.86s | 56.9T/s | 99.63%
MiniMax-01 | minimax | Minimax | - | $0.21 | 1.69s | 27.5T/s | 99.85%
R1 | deepseek | SambaNova | #9 | $0.47 | 4.62s | 112.2T/s | 99.68%
R1 Distill Llama 70B | deepseek | Cerebras | - | $0.10 | 0.45s | 2415.5T/s | 99.90%
LFM 3B | liquid | Liquid | - | $0.02 | 0.98s | 20.1T/s | 99.79%
LFM 7B | liquid | Lambda | - | $0.01 | 0.42s | 108.7T/s | 99.99%
Skyfall 36B V2 | thedrummer | Parasail | - | $0.51 | 0.90s | 40.9T/s | 99.98%
Gemma 3 4B | google | DeepInfra | #63 | $0.02 | 0.33s | 81.2T/s | 93.28%
Llama 4 Scout | meta-llama | Cerebras | #51 | $0.08 | 0.26s | 2369.5T/s | 98.98%
Grok 3 Beta | x-ai | xAI Fast | #6 | $3.12 | 0.71s | 59.1T/s | 99.85%
Grok 3 Mini Beta | x-ai | xAI Fast | - | $0.30 | 0.35s | 184.7T/s | 99.96%
GPT-4.1 Mini | openai | OpenAI | #15 | $0.41 | 0.69s | 62.9T/s | 99.52%
o4 Mini | openai | OpenAI | #6 | $1.14 | 5.43s | 165.0T/s | 99.53%
Qwen3 235B A22B | qwen | Fireworks | #17 | $0.14 | 0.83s | 81.0T/s | 96.41%
Qwen3 14B | qwen | Nebius AI Studio | - | $0.07 | 5.30s | 329.8T/s | 98.43%
Gemini 2.5 Flash Preview 05-20 (thinking) | google | Vertex Thinking | #3 | $0.18 | 1.76s | 140.1T/s | 99.19%
R1 0528 | deepseek | Baseten | #9 | $0.52 | 0.40s | 132.0T/s | 99.59%
GPT-3.5 Turbo | openai | OpenAI | #123 | $0.51 | 0.33s | 52.7T/s | 99.74%
ReMM SLERP 13B | undi95 | Mancer (private) | - | $0.81 | 0.72s | 41.6T/s | 99.99%
Mistral Large | mistralai | Mistral | #55 | $2.05 | 0.46s | 45.3T/s | 99.26%
Gemini 1.5 Pro | google | Google AI Studio | #49 | $1.29 | 0.59s | 75.3T/s | 99.92%
Llama 3 70B Instruct | meta-llama | Groq | #76 | $0.30 | 0.21s | 406.4T/s | 99.98%
Llama 3 8B Instruct | meta-llama | Groq | #119 | $0.03 | 0.42s | 3705.9T/s | 99.66%
Mistral 7B Instruct | mistralai | Together | #158 | $0.03 | 0.37s | 208.5T/s | 99.97%
Gemma 2 9B | google | Groq | #97 | $0.20 | 0.41s | 867.0T/s | 14.62%
Llama 3.1 405B Instruct | meta-llama | SambaNova | - | $0.81 | 2.46s | 103.4T/s | 100.00%
ChatGPT-4o | openai | OpenAI | #2 | $5.12 | 0.48s | 93.4T/s | 99.69%
Llama 3.1 Euryale 70B v2.2 | sao10k | DeepInfra | - | $0.71 | 0.42s | 38.6T/s | 99.98%
Command R (08-2024) | cohere | Cohere | #98 | $0.15 | 0.28s | 48.0T/s | 99.82%
Lumimaid v0.2 8B | neversleep | Mancer (private) | - | $0.21 | 0.78s | 57.2T/s | 98.94%
Llama 3.2 1B Instruct | meta-llama | SambaNova | #167 | $0.01 | 0.26s | 7692.3T/s | 99.75%
Qwen2.5 7B Instruct | qwen | Together | - | $0.04 | 0.40s | 188.9T/s | 100.00%
Ministral 3B | mistralai | Mistral | - | $0.04 | 0.18s | 231.8T/s | 99.99%
Claude 3.5 Haiku | anthropic | Google Vertex | - | $0.83 | 2.03s | 76.0T/s | 98.87%
Gemini 2.0 Flash Experimental (free) | google | Google AI Studio | #15 | $0.00 | 0.45s | 167.3T/s | 37.01%
Grok 2 Vision 1212 | x-ai | xAI | - | $2.08 | 0.86s | 77.4T/s | 99.94%
Llama 3.3 Euryale 70B | sao10k | Infermatic | - | $0.71 | 0.62s | 49.3T/s | 99.97%
Phi 4 | microsoft | Nebius AI Studio | #100 | $0.07 | 0.20s | 119.6T/s | 99.23%
Codestral 2501 | mistralai | Mistral | - | $0.31 | 0.27s | 172.7T/s | 99.71%
o3 Mini | openai | OpenAI | #26 | $1.14 | 7.67s | 470.8T/s | 99.35%
Qwen-Turbo | qwen | Alibaba | - | $0.05 | 0.52s | 107.9T/s | 99.88%
Claude 3.7 Sonnet (thinking) | anthropic | Anthropic | #14 | $3.12 | 1.68s | 54.9T/s | 96.62%
QwQ 32B | qwen | Groq | #142 | $0.15 | 0.43s | 571.4T/s | 99.49%
Gemma 3 12B | google | Cloudflare | #26 | $0.05 | 0.30s | 75.8T/s | 99.15%
Mistral Small 3.1 24B | mistralai | Mistral | #63 | $0.05 | 0.24s | 78.5T/s | 99.85%
Llama 3.3 Nemotron Super 49B v1 | nvidia | Nebius AI Studio | #33 | $0.13 | 1.39s | 45.4T/s | 91.92%
o3 | openai | OpenAI | #1 | $10.32 | 5.55s | 220.8T/s | 99.73%
o4 Mini High | openai | OpenAI | - | $1.14 | 5.43s | 1030.6T/s | 98.55%
Gemini 2.5 Flash Preview 04-17 (thinking) | google | AI Studio Thinking | #6 | $0.18 | 1.33s | 182.6T/s | 99.18%
GLM Z1 32B (free) | thudm | Chutes | - | $0.00 | 1.77s | 50.4T/s | 94.57%
MAI DS R1 (free) | microsoft | Chutes | - | $0.00 | 1.28s | 71.2T/s | 99.75%
DeepSeek R1T Chimera (free) | tngtech | Chutes | - | $0.00 | 1.84s | 62.4T/s | 99.74%
Qwen3 32B | qwen | Cerebras | #24 | $0.10 | 0.93s | 1806.5T/s | 98.99%
Qwen3 30B A3B | qwen | Parasail | #49 | $0.08 | 0.69s | 143.9T/s | 99.61%
DeepSeek Prover V2 (free) | deepseek | Chutes | - | $0.00 | 1.47s | 62.8T/s | 99.64%
Mistral Medium 3 | mistralai | Mistral | #13 | $0.42 | 0.76s | 82.7T/s | 99.92%
Claude Opus 4 | anthropic | Google Vertex | - | $15.60 | 2.05s | 46.5T/s | 97.79%
GPT-3.5 Turbo 16k | openai | OpenAI | #123 | $0.51 | 0.44s | 132.5T/s | 99.95%
Dolphin 2.9.2 Mixtral 8x22B 🐬 | cognitivecomputations | NovitaAI | - | $0.91 | 1.93s | 13.6T/s | 99.94%
Claude 3.5 Sonnet (2024-06-20) | anthropic | Google Vertex | #29 | $3.12 | 2.31s | 113.1T/s | 98.48%
GPT-4o (2024-08-06) | openai | Azure | #38 | $2.58 | 0.92s | 148.7T/s | 99.30%
Pixtral 12B | mistralai | Mistral | - | $0.10 | 0.49s | 74.8T/s | 99.97%
Llama 3.1 Nemotron 70B Instruct | nvidia | Together | #68 | $0.12 | 0.61s | 80.1T/s | 99.91%
Claude 3.5 Haiku (2024-10-22) | anthropic | Google Vertex | #52 | $0.83 | 2.24s | 59.4T/s | 99.71%
Mistral Large 2411 | mistralai | Mistral | #67 | $2.05 | 0.55s | 48.6T/s | 99.94%
Nova Pro 1.0 | amazon | Amazon Bedrock | #76 | $0.83 | 0.60s | 110.8T/s | 96.00%
Nova Micro 1.0 | amazon | Amazon Bedrock | #108 | $0.04 | 0.27s | 275.5T/s | 91.84%
Nova Lite 1.0 | amazon | Amazon Bedrock | #95 | $0.06 | 0.47s | 149.1T/s | 96.00%
Grok 2 1212 | x-ai | xAI | - | $2.08 | 0.29s | 78.8T/s | 99.78%
Sonar | perplexity | Perplexity | - | $1.01 | 2.07s | 100.4T/s | 99.95%
Qwen2.5 VL 72B Instruct (free) | qwen | Chutes | - | $0.00 | 2.25s | 68.7T/s | 92.85%
R1 Distill Llama 8B | deepseek | NovitaAI | - | $0.04 | 1.48s | 41.8T/s | 99.84%
DeepSeek R1 Zero (free) | deepseek | Chutes | - | $0.00 | 1.09s | 72.7T/s | 99.71%
Sonar Pro | perplexity | Perplexity | - | $3.12 | 2.38s | 61.8T/s | 99.98%
Anubis Pro 105B V1 | thedrummer | Parasail | - | $0.81 | 1.09s | 26.5T/s | 79.41%
GPT-4o-mini Search Preview | openai | OpenAI | - | $0.15 | 1.94s | 212.1T/s | 98.87%
DeepSeek V3 Base (free) | deepseek | Chutes | - | $0.00 | 1.21s | 73.8T/s | 99.05%
Llama 3.1 Nemotron Ultra 253B v1 (free) | nvidia | Chutes | - | $0.00 | 1.66s | 28.2T/s | 99.35%
GLM 4 32B (free) | thudm | Chutes | - | $0.00 | 6.23s | 51.4T/s | 95.06%
Qwen3 8B | qwen | NovitaAI | - | $0.04 | 0.62s | 54.0T/s | 95.81%
DeepHermes 3 Mistral 24B Preview (free) | nousresearch | Chutes | - | $0.00 | 1.02s | 224.3T/s | 98.73%
Llama 3.3 8B Instruct (free) | meta-llama | Meta | - | $0.00 | 0.48s | 236.7T/s | 95.44%
Devstral Small (free) | mistralai | Chutes | - | $0.00 | 1.14s | 101.2T/s | 97.52%
Deepseek R1 0528 Qwen3 8B | deepseek | NovitaAI | - | $0.06 | 0.82s | 63.8T/s | 98.88%
R1 Distill Qwen 7B | deepseek | GMICloud | - | $0.10 | 16.47s | 144.8T/s | 98.56%
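Because every row shares the same eight pipe-separated fields, the table is easy to consume programmatically. Below is a minimal Python sketch that parses a few rows copied verbatim from the table and ranks endpoints by throughput per dollar after applying an uptime floor. Treating the price column as a flat per-token rate and the 99% uptime cutoff are illustrative assumptions, not something the leaderboard specifies.

```python
# Minimal sketch: parse rows in the leaderboard's pipe-separated format
# and rank endpoints by throughput per dollar, filtering low-uptime entries.

# Three rows copied verbatim from the table above.
ROWS = """\
Llama 3.3 70B Instruct | meta-llama | Cerebras | #51 | $0.07 | 0.20s | 2446.5T/s | 99.95%
Gemini 2.0 Flash | google | Google Vertex | #18 | $0.10 | 0.50s | 165.3T/s | 99.96%
Gemma 2 9B | google | Groq | #97 | $0.20 | 0.41s | 867.0T/s | 14.62%
"""


def parse_row(line: str) -> dict:
    """Split one table row into typed fields ('-' in Rank means unranked)."""
    model, author, provider, rank, price, latency, tput, uptime = (
        cell.strip() for cell in line.split("|")
    )
    return {
        "model": model,
        "author": author,
        "provider": provider,
        "rank": None if rank == "-" else int(rank.lstrip("#")),
        "price_usd": float(price.lstrip("$")),      # assumption: flat per-token rate
        "latency_s": float(latency.rstrip("s")),
        "tokens_per_s": float(tput.rstrip("T/s")),  # strips the trailing "T/s"
        "uptime_pct": float(uptime.rstrip("%")),
    }


rows = [parse_row(line) for line in ROWS.splitlines()]

# Illustrative policy: require >= 99% uptime and a nonzero price, then
# sort by tokens per second delivered per dollar of price.
usable = [r for r in rows if r["uptime_pct"] >= 99.0 and r["price_usd"] > 0]
usable.sort(key=lambda r: r["tokens_per_s"] / r["price_usd"], reverse=True)

for r in usable:
    value = r["tokens_per_s"] / r["price_usd"]
    print(f"{r['model']} via {r['provider']}: {value:,.0f} tok/s per $")
```

On these three sample rows, the Cerebras-served Llama 3.3 70B endpoint dominates this metric, while the Groq-served Gemma 2 9B row is excluded by the uptime filter despite its high throughput.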