GPT-5.2: The GPQA King
The highest GPQA score at the time of our benchmark run (92.4%) and a strong SWE-bench showing make GPT-5.2 a serious contender, especially within the OpenAI ecosystem.
Top reasoning model with a leading GPQA score and the widest ecosystem support
Scores at a Glance
LLM-Stats Benchmarks · The Real-World AI Benchmark (Scored)
Reasoning Strength
GPT-5.2 posted the highest GPQA score in our benchmark at the time of testing, 92.4%; Gemini 3.1 Pro's 94.3% landed later and has since taken the top spot. In practice, the score translates to better performance on tasks that require deep technical reasoning, understanding complex business logic, and multi-step problem solving.
Ecosystem Advantage
GPT-5.2 benefits from the broadest integration ecosystem of any model. It works natively in Cursor, GitHub Copilot, ChatGPT, and hundreds of third-party tools. If your workflow depends on OpenAI-specific features, GPT-5.2 is a natural fit.
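For teams already on the OpenAI stack, getting started is a few lines. Here is a minimal sketch using the official OpenAI Python SDK; note that the model ID "gpt-5.2" is an assumption on our part, so verify it against OpenAI's current model list before relying on it.

```python
# Minimal sketch: calling GPT-5.2 through the OpenAI Python SDK.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5.2",  # assumed model ID; verify against OpenAI's model list
    messages=[
        {"role": "system", "content": "You are a concise senior engineer."},
        {"role": "user", "content": "Summarize the tradeoffs of asyncio vs threads."},
    ],
)

print(response.choices[0].message.content)
```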
Coding Performance
With an 80.0% SWE-bench score and 1,521 Code Arena Elo, GPT-5.2 is solidly in the top tier for coding tasks. The 400K context window is the largest among non-Google models, useful for working with larger codebases.
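Even a 400K window fills up quickly on real repositories, so it pays to estimate token usage before stuffing a codebase into the prompt. Below is a rough sketch using tiktoken; the o200k_base encoding is an assumption, since GPT-5.2's actual tokenizer isn't documented here, so treat the counts as estimates.

```python
# Sketch: estimate whether a directory of source files fits in a
# 400K-token context window. The encoding choice (o200k_base) is an
# assumption; GPT-5.2's real tokenizer may differ.
from pathlib import Path

import tiktoken

CONTEXT_LIMIT = 400_000
enc = tiktoken.get_encoding("o200k_base")

def count_tokens(root: str, pattern: str = "**/*.py") -> int:
    """Rough token total for all files under `root` matching `pattern`."""
    total = 0
    for path in Path(root).glob(pattern):
        total += len(enc.encode(path.read_text(errors="ignore")))
    return total

used = count_tokens("src")
print(f"{used:,} tokens used; {CONTEXT_LIMIT - used:,} of {CONTEXT_LIMIT:,} remaining")
```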
Pricing
At $1.75 per million input tokens and $14 per million output tokens, GPT-5.2 offers strong value for its capability level. It is cheaper than both Claude models on input pricing while delivering competitive benchmark scores across the board.
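To make those rates concrete, here is the back-of-envelope cost math for a single request; the token counts below are illustrative, not measured.

```python
# Sketch: per-request cost at GPT-5.2's listed rates
# ($1.75 per 1M input tokens, $14 per 1M output tokens).
INPUT_PRICE_PER_M = 1.75
OUTPUT_PRICE_PER_M = 14.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 50K-token prompt (large code context) with a 2K-token reply.
print(f"${request_cost(50_000, 2_000):.4f}")  # 0.0875 + 0.0280 = $0.1155
```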
Model Specs
Context window: 400K tokens
Input price: $1.75 per 1M tokens
Output price: $14 per 1M tokens
GPQA: 92.4%
SWE-bench: 80.0%
Code Arena Elo: 1,521
Related Reviews
Claude Opus 4.6: The Agentic Powerhouse
We tested Opus 4.6 across 50 real vibe coding tasks. It dominated multi-file refactors and complex agentic workflows, but the price tag limits casual use.
Gemini 3.1 Pro: The Chat Arena Leader
The highest Chat Arena Elo (1,398) and top GPQA score (94.3%) with a 1M context window. A compelling full-stack option.