Anthropic just released Claude Sonnet 4.5, and they’re calling it “the best coding model in the world.” The new AI model beats both GPT-5 and Google’s Gemini 2.5 Pro on key programming benchmarks. Most impressive of all, this AI can code autonomously for more than 30 hours straight.
The announcement came on September 29, 2025, and it’s already shaking up the AI development world. Companies like Cursor, GitHub Copilot, and Canva are already seeing major improvements in their products.
Crushing the competition in coding tests
Claude Sonnet 4.5 scored 77.2% on SWE-bench Verified, the gold standard for measuring real-world coding abilities. This puts it ahead of GPT-5 at 72.8% and Gemini 2.5 Pro at 67.2%. When using advanced parallel computing, the score jumps to an incredible 82%.
SWE-bench Verified tests how well AI models can solve actual GitHub issues. These aren’t simple coding problems. They’re complex, real-world software bugs that human developers face every day.
“Sonnet 4.5 achieves 77.2% on SWE-bench Verified. It is state-of-the-art,” an Anthropic spokesperson confirmed.
30 hours of non-stop coding power
Here’s what separates Sonnet 4.5 from everything else. This AI model can maintain focus and work autonomously for more than 30 hours on complex tasks. Compare that to previous models that could only manage about 7 hours before losing coherence.
During testing, researchers watched Sonnet 4.5 build entire applications from scratch. It handled database setup, domain registration, and even SOC 2 audit steps. The AI never got confused or went off track during these marathon coding sessions.
Computer use just got a massive upgrade
On OSWorld, a benchmark that tests how well AI can use actual computer interfaces, Sonnet 4.5 scored 61.4%. Four months ago, the previous version only managed 42.2%. This is a huge leap forward.
What does this mean in practice? The AI can navigate websites, fill out spreadsheets, click buttons, and complete tasks just like a human would. Anthropic even released a Chrome extension that lets you watch Claude work directly in your browser.
Dianne Penn, head of product management at Anthropic, told The Verge that “the enhancements in the model’s computer usage capabilities exceeded her expectations.” She said Sonnet 4.5 is three times better at navigating web browsers and using computers compared to their technology from October 2024.
Real companies seeing real results
Early customers are already reporting dramatic improvements. Here’s what they’re saying:
A cybersecurity company saw a 44% reduction in vulnerability analysis time while improving accuracy by 25%. One development team reported going “from 9% error rate on Sonnet 4 to 0% on our internal code editing benchmark.”
“Claude Sonnet 4.5 amplifies GitHub Copilot’s core strengths. Our initial evals show significant improvements in multi-step reasoning and code comprehension,” GitHub Copilot teams noted.
Canva’s engineering team said “Claude Sonnet 4.5 delivers impressive gains on our most complex, long-context tasks. It’s noticeably more intelligent and a big leap forward, helping us push what 240M+ users can design.”
Math and reasoning get major boosts
Claude Sonnet 4.5 achieved 100% accuracy on the AIME 2025 mathematics exam when using Python tools. Without tools, it still managed 87%. On GPQA Diamond, a test for graduate-level physics knowledge, it scored 83.4%.
These aren’t just number-crunching improvements. The model shows “dramatically better domain-specific knowledge and reasoning” according to experts in finance, law, medicine, and STEM fields.
The Claude Agent SDK changes everything
Anthropic is releasing the Claude Agent SDK alongside Sonnet 4.5. This gives developers the same infrastructure that powers Claude Code, their AI coding assistant.
The SDK handles the hardest parts of building AI agents. It manages memory across long tasks, handles permission systems, and coordinates multiple sub-agents working together. Basically, you get six months of Anthropic’s engineering work for free.
Scott White, the product lead for Claude.ai, says the new model operates at a “chief-of-staff level.” It can find availability across multiple people’s calendars, schedule meetings, analyze data dashboards, and draft status updates based on meeting notes.
Safety improvements that actually matter
Claude Sonnet 4.5 is Anthropic’s “most aligned frontier model yet.” The company says they’ve dramatically reduced concerning behaviors like deception, power-seeking, and the tendency to encourage delusional thinking.
The model is being released under AI Safety Level 3 (ASL-3) protections. These include filters that detect potentially dangerous inputs, especially those related to chemical, biological, radiological, and nuclear weapons.
Anthropic has also made major progress defending against prompt injection attacks – one of the biggest risks when AI models can actually control computers and software.
Pricing stays the same despite massive improvements
Despite all these upgrades, Claude Sonnet 4.5 costs exactly the same as the previous version. Developers pay $3 per million input tokens and $15 per million output tokens.
This makes it more expensive than GPT-5 ($1.25 input, $10 output) but much cheaper than Claude Opus ($15 input, $75 output). For the performance you’re getting, many developers will find this a bargain.
Available everywhere right now
You can start using Claude Sonnet 4.5 immediately. It’s available through:
- The Claude.ai website and mobile apps
- The Claude API for developers
- Amazon Bedrock
- Google Cloud Vertex AI
- A new VS Code extension
- Chrome extension for Max subscribers
Anthropic recommends that everyone upgrade to Sonnet 4.5 regardless of how they’re currently using Claude.
New features make coding even better
Claude Code now includes checkpoints – one of the most requested features. These work like save points in a video game. You can instantly roll back to any previous state if something goes wrong.
The terminal interface got a complete refresh. Code execution and file creation now work directly in conversations. You can create spreadsheets, slides, and documents without leaving the chat.
What developers are saying
The early feedback from the development community has been overwhelmingly positive. One team building with the Devin AI coding assistant reported that “Claude Sonnet 4.5 increased planning performance by 18% and end-to-end eval scores by 12% – the biggest jump we’ve seen since the release of Claude Sonnet 3.6.”
Another development team said “Claude Sonnet 4.5 resets our expectations – it handles 30+ hours of autonomous coding, freeing our engineers to tackle months of complex architectural work in dramatically less time while maintaining coherence across massive codebases.”
The future of AI-powered development
Claude Sonnet 4.5 represents a major shift in what’s possible with AI coding assistants. We’re moving from tools that help you write code to tools that can build entire applications autonomously.
The ability to work for 30+ hours straight means these AI models can handle projects that would take human developers days or weeks. Combined with the Agent SDK, developers can build custom AI assistants tailored to their specific workflows.
For many companies, this could mean the difference between prototypes that sort of work and production-ready applications that actually ship. The gap between human and AI coding ability just got a lot smaller.