Updated monthly · May 2026 edition

The best AI model for every task

We test the major frontier models every month and pick the one we'd actually reach for, by task. No fence-sitting. No "it depends" without saying what it depends on. If a model is overkill for the job we tell you to use the cheaper one, even when it costs us nothing to recommend the more expensive one.

How we choose

We use these models daily to run Smash Your AI, write our blog, build our course, automate our work, and consult for clients. The recommendations below come from real jobs we've shipped, not from benchmark tables. Where we disagree with the conventional wisdom, we say so and explain why.

If you only read one section

For most professional writing

Claude Sonnet 4.6

Cleanest prose, lowest hallucination rate, best at matching a tone you've supplied.

For most code work

Claude Code (Opus 4.7)

By a clear margin. The agentic loop is the difference between "wrote a function" and "shipped a feature".

For most images

Midjourney v7

Sora and Imagen 3 are catching up but Midjourney still sets the look.

For everything cheap and fast

GPT-4o mini

Bulk classification, basic Q&A, embedded chatbots. Pennies per 1,000 calls.

Full guide by task

20 common tasks. Our pick, why we picked it, what we'd swap to if we couldn't, and where the conventional wisdom is wrong.

Long-form professional writing (blog posts, reports, briefs)

OUR PICK

Claude Sonnet 4.6

Best balance of voice control, hallucination rate, and the quality of its first draft. We've never found a model that follows a "write in this voice, with these examples" instruction more faithfully.

Free alternative: Claude.ai's free tier gives you Sonnet for a few messages a day. After that it falls back to Haiku, which is genuinely good for shorter pieces.

Conventional wisdom says: GPT-5 is the best writer. Our take: GPT-5 is more confident-sounding but less honest. Claude will tell you when an example is missing or a claim is shaky. GPT-5 won't, and you'll ship the hallucination.

Writing and shipping code

OUR PICK

Claude Code with Opus 4.7

Not just the model — the whole agentic environment. Claude Code can read your repo, run your tests, debug its own work, and stop when something is genuinely ambiguous. Cursor with Sonnet is a close second if you want a more traditional IDE feel.

Free alternative: Cursor's free tier with GPT-4o is fine for one-file scripts. On anything that touches more than 3 files, you'll feel the difference.

Conventional wisdom says: Copilot is the safe default. Our take: Copilot autocompletes. Claude Code thinks. They're different tools. If your job is "ship features", you want Claude Code.

Deep research (multi-source, citations needed)

OUR PICK

ChatGPT Deep Research mode

For a job that needs 10+ sources synthesised with traceable citations, OpenAI's Deep Research is the only product where we trust the citations to actually match the claims. NotebookLM is brilliant for sources you've already curated, but won't go and find more.

Free alternative: Perplexity Pro free trial. Quality drops on the third or fourth iteration but the first run is decent.

Conventional wisdom says: Claude is the most accurate so it should win at research. Our take: accuracy ≠ research depth. Claude won't go and dig for fresh sources. ChatGPT will, with caveats.

Summarising long documents

OUR PICK

Claude Haiku 4.5

Fast, cheap, and surprisingly good. We pipe legal docs, customer interview transcripts, and research papers through it daily. Sonnet is better but you usually don't need it for summarisation.

Free alternative: Claude.ai free tier has Haiku quota. Or paste into NotebookLM if it's a single doc you want to query repeatedly.

Conventional wisdom says: Use the most capable model. Our take: for summarisation specifically, the cheapest model from the leading lab is the right call. The marginal quality gain isn't worth 10x the price.

Image generation

OUR PICK

Midjourney v7

Still ahead on aesthetic quality, especially for product, lifestyle, and brand imagery. Imagen 3 is sharper for hyper-realism. ChatGPT's image gen is the easiest to use but the least controllable.

Free alternative: Bing Image Creator (DALL-E 3) for casual use. Quality is good for marketing visuals.

Conventional wisdom says: Sora is the new state of the art. Our take: Sora's killer app is video, not stills. Midjourney still wins on stills.

Voice generation (text to speech)

OUR PICK

ElevenLabs v3

Best voice cloning, most natural emotional delivery, biggest voice library. We use it for course narration and podcast production.

Free alternative: ElevenLabs free tier (10,000 chars/month). Enough for a couple of short podcasts.

Conventional wisdom says: OpenAI's TTS is fine. Our take: OpenAI's TTS is fine for app interfaces. For anything where voice quality is the product, ElevenLabs.

Transcription (audio to text)

OUR PICK

Whisper Large v3 (via Groq)

Whisper is the model. Groq is the host that makes it fast and cheap. £0.0003 per minute of audio with sub-second latency.

Free alternative: Whisper local (run on your Mac with whisper.cpp). Slower but private and free.

Conventional wisdom says: Otter.ai is the productivity tool. Our take: Otter is great for live meeting transcription. For batch transcription of recorded files, Whisper-via-Groq is faster and 30x cheaper.
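A minimal sketch of batch transcription via Groq's OpenAI-compatible SDK. The `whisper-large-v3` model name and the `audio.transcriptions.create` call shape are assumptions based on Groq's API at time of writing; check the current docs before relying on them. The cost helper just applies the per-minute rate quoted above.

```python
# Sketch: batch transcription via Groq's hosted Whisper.
# Assumes GROQ_API_KEY is set in the environment.
import os

def estimate_cost(minutes: float, rate_per_min: float = 0.0003) -> float:
    """Rough spend at the quoted £0.0003/min rate."""
    return round(minutes * rate_per_min, 4)

def transcribe(path: str) -> str:
    # Deferred import so the cost helper works without the SDK installed.
    from groq import Groq
    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    with open(path, "rb") as f:
        result = client.audio.transcriptions.create(
            model="whisper-large-v3",  # assumed Groq model id
            file=f,
        )
    return result.text
```

A one-hour recording comes out at well under a penny, which is where the "30x cheaper than Otter" claim comes from.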

Hard reasoning (maths, logic puzzles, complex planning)

OUR PICK

o3-pro or Claude Opus 4.7 with extended thinking

o3-pro for pure-maths and logic problems. Opus 4.7 with extended thinking for anything that mixes reasoning with judgment (e.g. "design this system"). The two models think differently.

Free alternative: ChatGPT free tier with o3-mini.

Conventional wisdom says: Always use the reasoning model. Our take: reasoning models are 10-30x more expensive and slower. For 80% of tasks the regular Sonnet/GPT-5 is enough. Reach for reasoning only when you've seen a non-reasoning model fail.

Bulk classification, tagging, extraction

OUR PICK

GPT-4o mini

£0.15 per million input tokens, £0.60 per million output. We process tens of thousands of customer emails through it for sentiment + intent tagging. Claude Haiku is comparable on quality but slightly more expensive.

Free alternative: Gemini Flash 2.0 (Google AI Studio free tier). Generous quota.

Conventional wisdom says: Use GPT-4o or Sonnet for accuracy. Our take: for well-defined classification (e.g. "is this a complaint, a question, or a compliment"), the mini models are 95% as accurate at 5% of the cost.
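For well-defined classification like the complaint/question/compliment example, the whole job is a constrained prompt plus a strict parser. A sketch using OpenAI's chat completions API; the model name and call shape match that API as we know it, but treat both as assumptions to verify, and note the parser deliberately maps anything off-list to "unclear" rather than trusting the model.

```python
# Sketch: bulk tagging with a mini model and a strict label parser.
LABELS = {"complaint", "question", "compliment"}

def parse_label(raw: str) -> str:
    """Normalise the model's reply to one allowed label, or 'unclear'."""
    label = raw.strip().lower().rstrip(".")
    return label if label in LABELS else "unclear"

def classify(text: str) -> str:
    from openai import OpenAI  # deferred so parse_label is testable offline
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Reply with exactly one word: "
                        "complaint, question, or compliment."},
            {"role": "user", "content": text},
        ],
        temperature=0,  # deterministic labels for bulk runs
    )
    return parse_label(resp.choices[0].message.content)
```

The "unclear" bucket is the point: route those few percent to a human instead of paying 20x for a bigger model on every call.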

Reading images, screenshots, charts, scanned docs

OUR PICK

Claude Sonnet 4.6 vision

Best at extracting structured data from messy inputs (receipts, screenshots, hand-drawn diagrams). Gemini 2.5 Pro is comparable on charts and graphs specifically.

Free alternative: Gemini's free tier includes vision.

Conventional wisdom says: GPT-4o is the all-rounder. Our take: GPT-4o is fine, but Sonnet hallucinates less when the input is ambiguous. For anything you'll downstream into a system, Sonnet is the safer pick.

Retrieval-augmented generation (chat-with-your-docs)

OUR PICK

Claude Sonnet 4.6 with prompt caching

Long context (200k tokens), low hallucination rate, and prompt caching makes repeat queries over the same docs ~10x cheaper than re-sending them every time.

Free alternative: NotebookLM. Best in class for non-developers and free.

Conventional wisdom says: You need a vector database. Our take: for collections under 500 documents, just stuff them in the context window and use prompt caching. Vector DBs are needed at scale, not from day one.
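The "stuff them in the context window" approach is mostly payload construction. A sketch of a Messages API request with the stable document block marked for caching; the `cache_control: {"type": "ephemeral"}` shape matches Anthropic's prompt-caching docs at time of writing, and the model id string here is a placeholder to swap for the current one.

```python
# Sketch: context-stuffed RAG with Anthropic prompt caching.
# The big, unchanging docs block carries the cache marker, so repeat
# questions only pay full price for the question itself.
def cached_docs_request(docs: list[str], question: str) -> dict:
    return {
        "model": "claude-sonnet-4-6",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {"type": "text",
             "text": "Answer only from the documents provided."},
            {"type": "text",
             "text": "\n\n---\n\n".join(docs),
             "cache_control": {"type": "ephemeral"}},
        ],
        "messages": [{"role": "user", "content": question}],
    }
```

Pass the dict to `client.messages.create(**cached_docs_request(docs, q))`. The cache is keyed on the prefix, so keep the docs block byte-identical between calls or you pay to re-cache it.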

Building autonomous agents

OUR PICK

Claude Sonnet 4.6 via the Anthropic SDK

Best tool-use accuracy, best at recognising when a job is done, best at asking for help vs guessing. The other models will run in circles.

Free alternative: None. If you want to build production agents you're paying.

Conventional wisdom says: Use a framework like LangChain or CrewAI. Our take: frameworks add abstraction debt. For under 5 agents, just use the SDK directly. You'll learn more.
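"Just use the SDK directly" means writing the tool loop yourself, which is shorter than people expect. A skeleton assuming the stop-reason and tool-use block shapes from Anthropic's tool-use docs; the model id is a placeholder, and the dispatcher returns errors as strings so the model can see what went wrong and recover instead of the loop crashing.

```python
# Sketch: a bare agent loop against the Anthropic SDK, no framework.
def dispatch(tool_fns: dict, name: str, args: dict):
    """Run one tool call; surface failures to the model, don't raise."""
    if name not in tool_fns:
        return f"error: unknown tool {name!r}"
    try:
        return tool_fns[name](**args)
    except Exception as exc:
        return f"error: {exc}"

def run_agent(client, tool_fns, tool_specs, user_msg,
              model="claude-sonnet-4-6"):  # placeholder model id
    messages = [{"role": "user", "content": user_msg}]
    while True:
        resp = client.messages.create(
            model=model, max_tokens=1024,
            tools=tool_specs, messages=messages,
        )
        if resp.stop_reason != "tool_use":
            return resp  # the model decided the job is done
        messages.append({"role": "assistant", "content": resp.content})
        results = [
            {"type": "tool_result", "tool_use_id": block.id,
             "content": str(dispatch(tool_fns, block.name, block.input))}
            for block in resp.content if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
```

That's the whole abstraction a framework would wrap. Add a max-iterations cap before production use.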

Inline coding suggestions (IDE autocomplete)

OUR PICK

Cursor with Sonnet 4.6

Best inline experience. Tab-to-accept feels native, the model knows when to suggest, and the per-file context is right by default.

Free alternative: Cursor's free tier has limited Sonnet calls. After that, GPT-4o autocomplete is fine.

Conventional wisdom says: Copilot is in VS Code, just use that. Our take: GitHub Copilot is great if you live in VS Code. If you'll move to Cursor anyway for the agentic mode, you don't need both.

Marketing copy (emails, ads, social posts)

OUR PICK

Claude Sonnet 4.6 with examples in the prompt

Same model as long-form, different setup. Always paste 3-5 of your best past pieces as examples. The brief should be one paragraph, not a checklist.

Free alternative: Claude free tier. Or Mistral Large 3 if you want a non-US-hosted option.

Conventional wisdom says: ChatGPT is the marketer's friend. Our take: GPT-5 is sharper on punchy headlines, Claude is sharper on a coherent voice across 5+ pieces. If you're producing in series, Claude wins.

Translation

OUR PICK

DeepL for European languages, Gemini 2.5 Pro for everything else

DeepL still wins on European pairs (English↔French, German, Spanish, Italian, Dutch). For Mandarin, Japanese, Arabic, Hindi and the long tail, Gemini 2.5 Pro is now ahead of GPT-5 and Claude.

Free alternative: DeepL free tier is generous. Google Translate for the long tail if you don't want to use Gemini.

Conventional wisdom says: One model for everything. Our take: translation is one of the few jobs where specialised tools (DeepL) still beat general LLMs on the languages they cover.

Spreadsheet formulas, data cleaning, analysis

OUR PICK

ChatGPT with Code Interpreter

Upload the spreadsheet, ask in plain English, get a chart or a clean CSV back. Claude can write the formula but ChatGPT will run it on your data and show you the result.

Free alternative: ChatGPT free tier includes Code Interpreter for limited use.

Conventional wisdom says: Excel Copilot is built in. Our take: Excel Copilot is impressive but lags ChatGPT's Code Interpreter on anything multi-step. Worth using both.

Slide decks and pitch decks

OUR PICK

Claude Sonnet for content, Gamma for design

Use Claude to write the deck structure and slide-by-slide copy in markdown. Paste it into Gamma to auto-design. Better than any single tool's "make me a deck" feature.

Free alternative: Both have free tiers. Gamma's free tier limits exports, but you get the full design quality.

Conventional wisdom says: Just use Gamma. Our take: Gamma's auto-content is generic. Pair it with a real LLM for the words and you're 80% of the way to a deck a designer would respect.

Short video (under 60s)

OUR PICK

Sora 2

Best at coherent motion, lighting, and physics. Veo 3 from Google is comparable, sometimes better at human faces.

Free alternative: Runway Gen-3's free tier for short clips. Quality is a notch below Sora.

Conventional wisdom says: Wait for the prices to come down. Our take: don't wait. Even at current prices, a 30-second product video that takes 10 minutes to generate is replacing a £500 video shoot.

Customer-facing chatbots

OUR PICK

Claude Haiku 4.5 with strict guardrails

Fast, cheap, and (critically) the most polite when refusing requests outside its scope. Customers can tell the difference between Claude's "I can't help with that, but here's who can" and GPT's "I cannot assist with that request".

Free alternative: Use Claude.ai's free tier behind a public chat link for low-volume use cases.

Conventional wisdom says: Use the cheapest model possible. Our take: don't be cheap on customer-facing chat. The tone difference between Haiku and free Llama 4 is the difference between "professional bot" and "customer support disaster".

Personal AI assistant for life admin

OUR PICK

Claude.ai with Projects

Set up a Project with your CV, your calendar, your goals, your email signature, your tone-of-voice doc. Then "draft an email apologising to my dentist for missing the appointment" gets written in your voice with no extra context.

Free alternative: ChatGPT with Custom Instructions or memory. Less sophisticated but works.

Conventional wisdom says: Use whichever model your team uses. Our take: keep your personal AI separate from your work AI. Different memory, different context, different conversations. Less risk of leaking work context into a personal email.

Want to learn this properly?

The full Smash Your AI course teaches you how to pick the right model and prompt it well. 56 lessons across 3 tiers, hands-on practice that AI grades for you, lifetime access. £49 once.

Browse the course

Module 1 is free. No signup needed to read it.

Last updated: 4 May 2026. We refresh this guide on the first Sunday of every month. Subscribe to our blog to get the changes in your inbox.