A week in Generative AI: Claude 3.7, GPT-4.5 & Alexa Plus
News for the week ending 2nd March 2025
As predicted, after a slower week last week we’ve had a blockbuster this week with the launch of Anthropic’s Claude 3.7 Sonnet, OpenAI’s GPT-4.5, and the announcement of Amazon’s Alexa Plus. I found Claude 3.7 Sonnet to be a great improvement, and GPT-4.5 to be a strange mixed bag. There was another great video from Figure AI of their Helix model in action, and a fantastic demo video of how AI models might talk to each other in the future.
In Ethics News, the big headlines have been around the pushback on proposed changes to copyright legislation in the UK and the delaying of AI regulation to align with the Trump administration.
In Long Reads, Ethan Mollick has a great thought piece on the new generation of AI models and Andrej Karpathy has released another brilliant long video on how he uses LLMs.
Anthropic release Claude 3.7 Sonnet
Anthropic released their latest frontier model on Monday, Claude 3.7 Sonnet. I’ve been testing it all week and I have to say I’m very impressed.
Claude has been my go-to model for most things for the last year, but that was beginning to wane over the last month or so as OpenAI updated their GPT-4o model to make it nicer to use and released o3-mini and Deep Research. I was also getting increasingly frustrated by Claude’s usage limits, even on the Pro tier, which meant I had to use ChatGPT whenever I hit those limits and was waiting for them to reset.
However, Claude 3.7 Sonnet has swung me firmly back in favour of Anthropic’s models, even though I’m still frustrated by the usage limits. So I’m finding myself using Claude for big tasks (coding, writing etc.) and ChatGPT for quick, small tasks. I’m also using ChatGPT for research as nothing comes close to OpenAI’s Deep Research capability and Claude is still unable to browse online.
So what is Claude 3.7 Sonnet? Well, it’s not a significantly bigger model - so still very much in the same class as OpenAI’s GPT-4 models. It’s more of a refinement of Anthropic’s current generation of models than a generational leap in size and capability, hence the 3.7 name. Despite this, I’m still very impressed. It’s fast, it has a hugely increased output limit (from c.8k tokens to 128k tokens) which helps with coding and long writing tasks, and its coding capabilities are unmatched in the industry right now.
3.7 Sonnet also brings more capable reasoning to the party. Despite 3.5 Sonnet not being a large reasoning model in the same way o1, o3-mini, and DeepSeek R1 are, it was paradoxically one of the best large language models at reasoning. 3.7 Sonnet builds on this and brings full reasoning capabilities, allowing you to extend the amount of “thinking time” it has in a similar way to the other large reasoning models. This has allowed 3.7 Sonnet to be used for tackling more complex tasks and problems, which is why Anthropic used Pokémon as one of the benchmarks for it.
With these added reasoning capabilities, Claude 3.7 is the first large hybrid model to market, ahead of the release of OpenAI’s GPT-5 which will combine GPT-4.5 with their o3 reasoning model. These hybrid models are now the frontier of generative AI models and I expect we’ll see more of them released from Google, Meta etc. over the coming months.
As Anthropic stated in their release post for 3.7 Sonnet, the model marks “an important step towards AI systems that can truly augment human capabilities”. From my testing and experience so far, I’d have to agree with them.
OpenAI unveils GPT-4.5 “Orion”, its largest AI model yet
I have to get this out of the way upfront - this is a very strange release from OpenAI. GPT-4.5 is a new model that is an order of magnitude larger than the GPT-4 class models they’ve released over the last couple of years, but they have refrained from calling it GPT-5. I’m guessing this is because the improvements that they were hoping to see from increasing the size of the model just haven’t materialised. In fact, in the original system card for GPT-4.5 OpenAI clearly stated that they didn’t consider it to be a frontier model:
“GPT-4.5 is not a frontier model, but it is OpenAI’s largest LLM, improving on GPT-4’s computational efficiency by more than 10x. While GPT-4.5 demonstrates increased world knowledge, improved writing ability, and refined personality over previous models, it does not introduce net-new frontier capabilities compared to previous reasoning releases, and its performance is below that of o1, o3-mini, and deep research on most preparedness evaluations.”
Since the release, OpenAI have actually updated the system card and the same section now reads as follows:
“While GPT-4.5 demonstrates increased world knowledge, improved writing ability, and refined personality over previous models, and is our most capable GPT-series release, it does not introduce net-new capabilities on most preparedness evaluations compared to previous reasoning releases.”
So, what is GPT-4.5? Well, it’s probably 10x the size, but probably only about 10% more capable across most benchmarks. In fact, on a few benchmarks it doesn’t significantly improve on GPT-4o, and it actually scores worse than o1 and o3-mini.
With the increase in size come some significant drawbacks - it’s slow, expensive and will be using more energy, which is bad for the environment. It’s so expensive, in fact, that only subscribers to OpenAI’s Pro tier ($200 per month) can currently access it and the API costs are c.250x more expensive than GPT-4o mini.
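To make a multiple like that more concrete, here is a rough sketch of the arithmetic behind per-request API costs. The model names and per-million-token prices below are placeholders I have assumed for illustration (chosen so the output-token ratio works out at 250x); they are not quoted from any pricing page, so swap in current list prices before relying on the numbers.

```python
# Assumed, illustrative prices in USD per one million tokens.
# These are placeholders, not official figures.
PRICES = {
    "large-model": {"input": 75.00, "output": 150.00},
    "small-model": {"input": 0.15, "output": 0.60},
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one request at the assumed prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def cost_ratio(expensive, cheap, input_tokens, output_tokens):
    """Return how many times more the expensive model costs for the same request."""
    return (request_cost(expensive, input_tokens, output_tokens)
            / request_cost(cheap, input_tokens, output_tokens))


if __name__ == "__main__":
    # A request with 2,000 tokens in and 1,000 tokens out.
    ratio = cost_ratio("large-model", "small-model", 2_000, 1_000)
    print(f"Large model costs about {ratio:.0f}x the small one for this request")
```

With these placeholder prices the multiple depends on the mix of input and output tokens, so a single headline figure is only ever a rough guide.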
As I can’t access it to test it myself, I have to rely on feedback from others and it’s interesting… A good example is Andrej Karpathy (a founding member of OpenAI who has since left) who praised the model’s vibe and EQ. He then shared 5 responses to the same prompt from GPT-4.5 and GPT-4 and, in a blind test, asked his followers to vote for their favourite response. GPT-4 won 4 of the 5 comparisons, which he found surprising.
Wrapping this all up, GPT-4.5 as it currently stands is not a step forward in terms of capability despite being a “next generation” model, which is why I suspect it was named GPT-4.5 instead of GPT-5. I suspect OpenAI felt they had to release it as it was incredibly expensive to train, but the jury is definitely out on whether the continued scaling of models will lead to significant increases in capability.
Having said that, GPT-4.5 will be a strong foundation for GPT-5’s integration with o3 and will power the next generation of large reasoning models like o4 and beyond. It just doesn’t add much value on its own, which is why I think it’s strange that OpenAI have released it in its current form.
Amazon launch Alexa Plus
Alexa has been around for over 10 years and for much of that time has been held up as the best and most capable voice assistant/kitchen timer available. Then ChatGPT launched in November 2022 and completely reset consumers’ expectations of what conversing with technology could be, immediately relegating Alexa, Siri, and Google Assistant to “old technology” status.
Since then Amazon have been working hard on how to bring large language models to their voice assistant and Alexa Plus is a complete, bottom-up rebuilding of Alexa on top of a whole host of LLMs. This means that Alexa Plus is a more capable conversationalist and so no longer requires users to remember specific commands or device names, making it much more user-friendly, both for everyday use and for creating smart home routines.
There are also new capabilities - Alexa Plus has a web interface which is better for longer, more complex interactions, and it is better integrated into Amazon’s burgeoning array of smart home devices. For example, you can ask if anyone has taken the dog for a walk and Alexa Plus will be able to determine this from your Ring Doorbell’s smart video search, which shows summaries of events that have happened around the house.
There’s no doubt that the vision for Alexa Plus is a highly capable assistant, and what I would consider a level 4 AI agent, especially for home-based tasks. However, it doesn’t launch for another month, when it will have limited availability via an early access program in the US only, so we’ll have to see if the vision matches reality.
The good news is Alexa Plus will cost $19.99 per month, along the same lines as “pro” tiers for other AI models, and will be included in Prime membership. There’s a good hands-on with Alexa Plus article from The Verge if you’re interested in more details.
Is GibberLink the future of AI communications?
I had to share this as I was really impressed and it’s a great demo video - two AIs on a voice call switching to a more efficient mode of communication once they realise there’s no human-in-the-loop.
This capability makes total sense - there’s no need for AI models to use words when it can be more efficient to communicate differently. It does, however, raise the question of how a human could oversee and understand the interaction if needed. I expect that somewhere in our future a new “language” for AIs will emerge that we’ll need a translation layer to understand, much like being able to read the code in the Matrix.
I think this encapsulates quite nicely what’s going on with AI at the moment - incredibly exciting but also scary in equal measure!
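For a feel of what a non-verbal channel and its “translation layer” might look like, here is a toy sketch that maps text to a sequence of tone frequencies and back again. It is purely illustrative and is not how GibberLink itself works; the base frequency, tone spacing and function names are all assumptions of mine, and no audio is actually generated.

```python
BASE_HZ = 1000  # assumed frequency of the lowest tone
STEP_HZ = 50    # assumed spacing between neighbouring tones


def encode(text):
    """Turn text into tone frequencies, two tones per UTF-8 byte."""
    tones = []
    for byte in text.encode("utf-8"):
        # Split each byte into its high and low 4-bit halves (nibbles),
        # giving 16 possible tones.
        for nibble in (byte >> 4, byte & 0x0F):
            tones.append(BASE_HZ + nibble * STEP_HZ)
    return tones


def decode(tones):
    """The 'translation layer': turn tone frequencies back into text."""
    nibbles = [(freq - BASE_HZ) // STEP_HZ for freq in tones]
    pairs = zip(nibbles[0::2], nibbles[1::2])
    data = bytes((high << 4) | low for high, low in pairs)
    return data.decode("utf-8")


if __name__ == "__main__":
    message = "Table for two at 8pm?"
    tones = encode(message)
    print(tones[:6])      # unreadable to a human listener
    print(decode(tones))  # readable again once translated
```

A human overseeing a channel like this would only ever see or hear the tone sequence, which is why some kind of decoder would be needed to follow what is being said.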
Figure shows off Helix Logistics
Following last week’s demo of Helix and two robots coordinating to perform tasks, Figure has released a new video showing their robots working on a logistics and packaging line. The robots pick packages off the line and position them so the barcodes are visible for scanning.
It’s great to see the robots operating in a real-world context, and it comes alongside reports that Figure will start “alpha testing” its humanoid robot in the home in 2025.
AI Ethics News
UK delays plans to regulate AI as ministers seek to align with Trump administration
1,000 artists release “silent” album to protest UK copyright sell-out to AI
Sora, OpenAI’s video generator, has hit the UK. It’s obvious why creatives are worried
Ocado to cut 500 technology and finance jobs as AI reduces costs
Web Summit attendees aren’t buying Scale AI CEO’s push for America “to win the AI war”
Long Reads
One Useful Thing - A new generation of AIs: Claude 3.7 and Grok 3
Stratechery - AI Promise and Chip Precariousness
Simon Willison - Claude 3.7 Sonnet, extended thinking and long output
Andrej Karpathy - How I use LLMs
“The future is already here, it’s just not evenly distributed.”
William Gibson