A week in Generative AI: Codex, AlphaEvolve & GPT-4.1
News for the week ending 18th May 2025
It’s been a quieter week, but with Google’s I/O conference next week I’m expecting it to be a doozy! OpenAI launched Codex late on Friday, probably heading off some of the news from I/O. Google DeepMind released AlphaEvolve, an interesting specialised engineering model that aims to eliminate hallucinations, and OpenAI brought GPT-4.1 to ChatGPT.
There was lots of ethics news this week, with headlines ranging from Grok’s obsession with white genocide to the firing of the head of the US Copyright Office and the House of Lords pushing back on the UK government’s AI plans.
There are some good Long Reads too - I recommend watching the interview with Sam Altman from Sequoia Capital’s AI Ascent, and for the engineers out there, it’s worth checking out the story behind Building, Launching, and Scaling ChatGPT Images.
Enjoy!
OpenAI launch Codex, its latest coding agent
Despite being positioned as a ‘research preview’, I think this is a very big release from OpenAI. I can’t access it yet (as I don’t have a $200 pro subscription) but I am very much looking forward to getting my hands on it and putting it through its paces, as coding is one of my main use cases for generative AI models right now.
Codex is a ‘cloud-based software engineering agent that can work on many tasks in parallel’. It’s that last bit, ‘in parallel’, that is a big deal. Up until now we’ve had agents that could go off and do tasks (such as Deep Research) but they were built to be semi-supervised. Moving to agents that can run in parallel suggests that they need less supervision and we’re getting closer to agents that are reliable enough that you can ‘set and forget’.
This is obviously only in the one, narrow domain of coding, but it’s a big one. Sam Altman said in an interview at Sequoia Capital’s AI Ascent conference last week that coding is a ‘central category’ for OpenAI. It will be one of their core services. I think this explains the importance of Codex to them, and the release of GPT-4.1 in ChatGPT this week (see below). OpenAI envision a future where ChatGPT is writing code to get things done in the real world, and Codex is a step towards that.
DeepMind claims its newest AI tool is a whiz at math and science problems
This is a great announcement of a new model in Google DeepMind’s ‘Alpha’ series. AlphaEvolve is a model designed to work in fields like computer science and maths, and includes a self-evaluation mechanism to cut down on hallucinations.
The model is very narrowly focused on solving ‘numerical’ problems and can only provide answers as algorithms. However, this does allow it to be applied to specific scientific and engineering problems, and because hallucinations are limited, it can be reliably used autonomously. This doesn’t mean that it’s necessarily able to find novel solutions to problems, but it does mean that it can save time and free engineers up to focus on other problems.
We’re starting to see the emergence of specialised models designed for specific problems, which is very different from the ‘general use’ nature of large language models so far. I think the promise is general-use models with access to specialised models that are more reliable in specific domains. That’s probably the road forward for continuing to improve capabilities over the long term.
OpenAI brings its GPT-4.1 models to ChatGPT
OpenAI originally released the GPT-4.1 family of models a month ago, in the API only, and the release was squarely aimed at developers. This week they decided to make two of the models from the GPT-4.1 family available to users in ChatGPT - GPT-4.1 and GPT-4.1 mini.
As I wrote when they were originally released, I think the GPT-4.1 family is a distilled version of the incredibly expensive GPT-4.5 rather than an improved version of GPT-4o.
The big difference with the GPT-4.1 family of models is that they’re better at coding and instruction following, and have a much larger context window of one million tokens. I’m not sure how useful these improvements will be to casual users, but I think this is part of OpenAI’s current focus on coding as a big use case for their platform.
AI Ethics News
xAI blames Grok’s obsession with white genocide on an ‘unauthorized modification’
White House fires head of Copyright Office amid Library of Congress shakeup
It’s Breathtaking How Fast AI Is Screwing Up the Education System
Anthropic’s lawyer was forced to apologize after Claude hallucinated a legal citation
AI can spontaneously develop human-like communication, study finds
Fortnite players can speak with Darth Vader through a James Earl Jones-voiced AI
Audible unveils plans to use AI voices to narrate audiobooks
Long Reads
Sequoia Capital - Sam Altman: Building the ‘Core AI Subscription’ for Your Life
Pragmatic Engineer - Building, Launching, and Scaling ChatGPT Images
Simon Willison - Building Software on top of Large Language Models
“The future is already here, it’s just not evenly distributed.”
William Gibson