OpenAI · 2026-02-15

GPT-5.2: The GPQA King

The highest GPQA score in our benchmark (92.4%) and a strong SWE-bench showing make GPT-5.2 a serious contender, especially within the OpenAI ecosystem.

Top reasoning model with the best GPQA score and widest ecosystem support

Scores at a Glance

LLM-Stats Benchmarks

HLE: 35
HLE: 46
ARC-AGI v2: 53
GPQA Diamond: 92
Terminal-Bench 2.0: 54
OSWorld-Verified: 38
SWE-bench Verified: 80
SWE-Bench Pro: 56
LiveCodeBench Pro (2,393): 80
SciCode: 52
APEX-Agents: 23
GDPval-AA Elo (1,462): 49
τ2-bench Retail: 82
τ2-bench Telecom: 99
MCP Atlas: 61
BrowseComp: 66
MMMU-Pro: 80
MMMLU: 90
MRCR v2 (128K): 84
Speed (134): 67

The Real-World AI Benchmark (Scored)

Mobile App: 8.4/10

Reasoning Strength

GPT-5.2 posts the highest GPQA score in our benchmark at 92.4%; Gemini 3.1 Pro's 94.3% was recorded after this snapshot. This translates to better performance on tasks that require deep technical reasoning, comprehension of complex business logic, and multi-step problem solving.

Ecosystem Advantage

GPT-5.2 benefits from the broadest integration ecosystem of any model. It works natively in Cursor, GitHub Copilot, ChatGPT, and hundreds of third-party tools. If your workflow depends on OpenAI-specific features, GPT-5.2 is a natural fit.

Coding Performance

With an 80.0% SWE-bench score and 1,521 Code Arena Elo, GPT-5.2 is solidly in the top tier for coding tasks. The 400K context window is the largest among non-Google models, useful for working with larger codebases.
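Whether a given codebase actually fits in that 400K-token window is easy to sanity-check before sending anything. The sketch below uses the common rough heuristic of about four characters per token; the heuristic, file extensions, and directory-walking logic are illustrative assumptions, not an official tokenizer.

```python
import os

CONTEXT_WINDOW = 400_000   # GPT-5.2's listed context window, in tokens
CHARS_PER_TOKEN = 4        # rough average for English text and code (assumption)

def estimate_tokens(root: str, exts=(".py", ".js", ".ts")) -> int:
    """Walk a directory tree and estimate total tokens for matching source files."""
    total_chars = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        total_chars += len(f.read())
                except OSError:
                    continue  # skip unreadable files
    return total_chars // CHARS_PER_TOKEN

def fits_in_context(root: str) -> bool:
    """True if the estimated token count is within the context window."""
    return estimate_tokens(root) <= CONTEXT_WINDOW
```

For a precise count you would substitute the model's actual tokenizer, but a character-based estimate is usually enough to decide whether a repo needs chunking.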

Pricing

At $1.75 per million input tokens and $14 per million output tokens, GPT-5.2 offers strong value for its capability level. It is cheaper than both Claude models on input pricing while delivering competitive benchmark scores across the board.
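To make those rates concrete, here is a minimal cost estimator; the token counts in the example are hypothetical, but the arithmetic follows directly from the listed per-million-token prices.

```python
INPUT_PRICE = 1.75    # USD per 1M input tokens (GPT-5.2 listed rate)
OUTPUT_PRICE = 14.00  # USD per 1M output tokens (GPT-5.2 listed rate)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the listed rates."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

# Example: a 100K-token prompt with a 5K-token response
# costs 0.175 + 0.070 = $0.245.
print(f"${request_cost(100_000, 5_000):.3f}")
```

Note that output tokens dominate quickly: at these rates, one output token costs as much as eight input tokens.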

Model Specs

Context Window: 400K
Input Price: $1.75/M tokens
Output Price: $14/M tokens
License: Proprietary
Release Date: 2025-12-11