Updated monthly · May 2026 edition
We test the major frontier models every month and pick the one we'd actually reach for, by task. No fence-sitting. No "it depends" without saying what it depends on. If a model is overkill for the job we tell you to use the cheaper one, even when it costs us nothing to recommend the more expensive one.
How we choose
We use these models daily to run Smash Your AI, write our blog, build our course, automate our work, and consult for clients. The recommendations below come from real jobs we've shipped, not from benchmark tables. Where we disagree with the conventional wisdom, we say so and explain why.
For most professional writing
Claude Sonnet 4.6
Cleanest prose, lowest hallucination rate, best at matching a tone you've supplied.
For most code work
Claude Code (Opus 4.7)
By a margin. The agentic loop is the difference between "wrote a function" and "shipped a feature".
For most images
Midjourney v7
Sora and Imagen 3 are catching up but Midjourney still sets the look.
For everything cheap and fast
GPT-4o mini
Bulk classification, basic Q&A, embedded chatbots. Pennies per 1,000 calls.
20 common tasks. Our pick, why we picked it, what we'd swap to if we couldn't, and where the conventional wisdom is wrong.
Claude Sonnet 4.6
Best balance of voice control, hallucination rate, and the quality of its first draft. We've never found a model that follows a "write in this voice, with these examples" instruction more faithfully.
Free alternative: Claude.ai's free tier gives you Sonnet for a few messages a day. After that it falls back to Haiku, which is genuinely good for shorter pieces.
Claude Code with Opus 4.7
Not just the model — the whole agentic environment. Claude Code can read your repo, run your tests, debug its own work, and stop when something is genuinely ambiguous. Cursor with Sonnet is a close second if you want a more traditional IDE feel.
Free alternative: Cursor's free tier with GPT-4o is fine for one-file scripts. Anything that touches more than 3 files, you'll feel the difference.
ChatGPT Deep Research mode
For a job that needs 10+ sources synthesised with traceable citations, OpenAI's Deep Research is the only product where we trust the citations to actually match the claims. NotebookLM is brilliant for sources you've already curated, but won't go and find more.
Free alternative: Perplexity Pro free trial. Quality drops on the third or fourth iteration but the first run is decent.
Claude Haiku 4.5
Fast, cheap, and surprisingly good. We pipe legal docs, customer interview transcripts, and research papers through it daily. Sonnet is better but you usually don't need it for summarisation.
Free alternative: Claude.ai free tier has Haiku quota. Or paste into NotebookLM if it's a single doc you want to query repeatedly.
Midjourney v7
Still ahead on aesthetic quality, especially for product, lifestyle, and brand imagery. Imagen 3 is sharper for hyper-realism. ChatGPT's image gen is the easiest to use but the least controllable.
Free alternative: Bing Image Creator (DALL-E 3) for casual use. Quality is good for marketing visuals.
ElevenLabs v3
Best voice cloning, most natural emotional delivery, biggest voice library. We use it for course narration and podcast production.
Free alternative: ElevenLabs free tier (10,000 chars/month). Enough for a couple of short podcasts.
Whisper Large v3 (via Groq)
Whisper is the model. Groq is the host that makes it fast and cheap. £0.0003 per minute of audio with sub-second latency.
Free alternative: Whisper local (run on your Mac with whisper.cpp). Slower but private and free.
o3-pro or Claude Opus 4.7 with extended thinking
o3-pro for pure-maths and logic problems. Opus 4.7 with extended thinking for anything that mixes reasoning with judgment (e.g. "design this system"). The two models think differently.
Free alternative: ChatGPT free tier with o3-mini.
GPT-4o mini
£0.15 per million input tokens, £0.60 per million output. We process tens of thousands of customer emails through it for sentiment + intent tagging. Claude Haiku is comparable on quality but slightly more expensive.
Free alternative: Gemini Flash 2.0 (Google AI Studio free tier). Generous quota.
Claude Sonnet 4.6 vision
Best at extracting structured data from messy inputs (receipts, screenshots, hand-drawn diagrams). Gemini 2.5 Pro is comparable on charts and graphs specifically.
Free alternative: Gemini's free tier includes vision.
Claude Sonnet 4.6 with prompt caching
Long context (200k tokens), low hallucination rate, and prompt caching makes it ~10x cheaper than naive RAG when you query the same docs repeatedly.
Free alternative: NotebookLM. Best in class for non-developers and free.
Claude Sonnet 4.6 via the Anthropic SDK
Best tool-use accuracy, best at recognising when a job is done, best at asking for help vs guessing. The other models will run in circles.
Free alternative: None. If you want to build production agents you're paying.
Cursor with Sonnet 4.6
Best inline experience. Tab-to-accept feels native, the model knows when to suggest, and the per-file context is right by default.
Free alternative: Cursor's free tier has limited Sonnet calls. After that, GPT-4o autocomplete is fine.
Claude Sonnet 4.6 with examples in the prompt
Same model as long-form, different setup. Always paste 3-5 of your best past pieces as examples. The brief should be one paragraph, not a checklist.
Free alternative: Claude free tier. Or Mistral Large 3 if you want a non-US-hosted option.
DeepL for European languages, Gemini 2.5 Pro for everything else
DeepL still wins on European pairs (English↔French, German, Spanish, Italian, Dutch). For Mandarin, Japanese, Arabic, Hindi and the long tail, Gemini 2.5 Pro is now ahead of GPT-5 and Claude.
Free alternative: DeepL free tier is generous. Google Translate for the long tail if you don't want to use Gemini.
ChatGPT with Code Interpreter
Upload the spreadsheet, ask in plain English, get a chart or a clean CSV back. Claude can write the formula but ChatGPT will run it on your data and show you the result.
Free alternative: ChatGPT free tier includes Code Interpreter for limited use.
Claude Sonnet for content, Gamma for design
Use Claude to write the deck structure and slide-by-slide copy in markdown. Paste it into Gamma to auto-design. Better than any single tool's "make me a deck" feature.
Free alternative: Both have free tiers. Gamma's free tier limits exports; the design quality is full.
Sora 2
Best at coherent motion, lighting, and physics. Veo 3 from Google is comparable, sometimes better at human faces.
Free alternative: Runway Gen-3's free tier for short clips. Quality is a notch below Sora.
Claude Haiku 4.5 with strict guardrails
Fast, cheap, and (critically) the most polite when refusing requests outside its scope. Customers can tell the difference between Claude's "I can't help with that, but here's who can" and GPT's "I cannot assist with that request".
Free alternative: Use Claude.ai's free tier behind a public chat link for low-volume use cases.
Claude.ai with Projects
Set up a Project with your CV, your calendar, your goals, your email signature, your tone-of-voice doc. Then "draft an email apologising to my dentist for missing the appointment" gets written in your voice with no extra context.
Free alternative: ChatGPT with Custom Instructions or memory. Less sophisticated but works.
The full Smash Your AI course teaches you how to pick the right model and prompt it well. 56 lessons across 3 tiers, hands-on practice that AI grades for you, lifetime access. £49 once.
Browse the courseModule 1 is free. No signup needed to read it.
Last updated: 4 May 2026. We refresh this guide on the first Sunday of every month. Subscribe to our blog to get the changes in your inbox.