A week in Generative AI: Memories, Llama & Atlas in Advertising
News for the week ending 13th April 2025
Another relatively quiet week this week in GenAI with a few nice little features and models dropped, but nothing major that will move the frontier forward. OpenAI released an enhanced memory feature in the US for Pro subscribers, there was continued controversy around Meta’s release of Llama 4 and gaming benchmarks, and Amazon released a state-of-the-art speech model called Nova Sonic. There was also a really interesting video from WPP, Boston Dynamics, and NVIDIA showcasing how the Atlas robot can be used in advertising - not something I thought I’d see in 2025!
ChatGPT will now remember your old conversations
I was very excited when OpenAI released their original memory feature in May last year as it’s one of the main components needed for a more personalised experience with Generative AI models. That feature turned out to be a bit of a dud and I could never really get it to work. I think in the last year it only created a couple of (mostly irrelevant) memories for me unless I specifically prompted ChatGPT to do so.
This week’s update gives ChatGPT the ability to recall old conversations you’ve had with it, so it can now draw on ‘saved memories’ that users have manually asked it to remember as well as your whole conversation history. As is becoming usual, the feature is not yet available in the EU, UK and a few other countries as there are more regulatory hurdles to overcome, so I haven’t been able to start testing it myself. To start with, the feature will only be available to Pro subscribers and will come to Plus subscribers “soon”.
I think this additional memory feature is a step in the right direction, but I’m not sure it’s the final product - memories need to be automatically distilled from previous conversations, and I doubt that simply referencing old conversations will be optimal. However, I still have a lot of hope for memory features in Generative AI models and stand by everything I wrote last year - they’ll make models more useful and relevant, help them become more proactive, and also help AI companies navigate issues around bias by tailoring responses to individual user preferences.
We need to talk about Llama 4…
TLDR - Meta have been caught gaming AI benchmarks with Llama 4. Not a good look, and on top of weirdly releasing on a Saturday before the main parent model has completed training, this has been a very strange release.
So, what happened is that Meta announced their Llama 4 models and everyone was surprised when the Maverick model quickly secured the number-two spot on LMArena. This was even cited in Meta’s press release, where they highlighted an Elo score of 1417. This achievement would have positioned Llama 4 Maverick as a state-of-the-art model, competing with GPT-4o, Gemini 2.5 Pro and Claude 3.7.
However, it turns out it wasn’t Llama 4 Maverick that got an Elo score of 1417 - it was an “experimental chat version” of Llama 4 Maverick (i.e. fine-tuned to do well on LMArena). The standard version of Llama 4 Maverick only scored around 1273, ranking it 35th, not 2nd. You can see the scores for yourself here.
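To put that gap in perspective, here’s a quick sketch using the standard Elo expected-score formula (the general chess-style formula, not anything LMArena-specific): a difference of 1417 vs 1273 implies the experimental version would be preferred over the standard release in roughly 70% of head-to-head comparisons.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score: the probability that A beats B,
    given a 400-point scale (the usual Elo convention)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# The experimental chat version (1417) vs. the standard release (1273)
print(round(elo_expected_score(1417, 1273), 2))  # → 0.7
```

In other words, the fine-tuned variant wasn’t a marginally better showing - it was a substantially different model from the one Meta actually shipped.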
This controversy led to a post from LMArena where they explained that “Meta’s interpretation of our policy did not match what we expect from model providers. Meta should have made it clearer that “Llama-4-Maverick-03-26-Experimental” was a customized model to optimize for human preference. As a result of that we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future.”
There’s also the weird issue of how Meta are ‘addressing bias in LLMs’ with the release of Llama 4. In their blog post on the release, Meta state that ‘Llama 4 is dramatically more balanced with prompts it refuses to respond to’ and then in the very next sentence say ‘Our testing shows that Llama 4 responds with strong political lean at a rate comparable to Grok’. I find this a little confusing, as being balanced and having a strong political lean contradict each other 🤷‍♂️.
Sadly, I think Meta have undone a lot of goodwill and trust that they’ve built up with their Llama models (which have been excellent so far and the leading open models) with this latest release. I hope they take some learnings from this for future releases.
Amazon unveils a new AI voice model, Nova Sonic
The example in this video is very cringe, and I’ve never had any customer service experience anywhere near as good as the one portrayed, but it’s a decent example of how advanced voice models like Amazon’s Nova Sonic could be used in real-world customer interactions.
Nova Sonic ticks a lot of boxes for a state-of-the-art speech-to-speech model. It has expressive voices, natural turn-taking, and the ability to use tools, which is essential when deploying in real-world customer-facing environments. Whilst the video mostly focuses on customer service, Amazon also list voice-enabled personal assistants and interactive education as other use cases. I expect we’ll see this technology deployed by some large customer-facing organisations before the year is out.
Atlas in Advertising
Regular readers will know that I have a soft spot for robotics and like to showcase new, interesting videos that show off the progress being made. I didn’t expect to be sharing a video of robotics being genuinely useful in an advertising context until at least next year, but here we are!
‘Long, repeatable shots’ is the use case WPP have found for Boston Dynamics’ Atlas robot, which isn’t something I’d thought of before but makes total sense. I’m not 100% convinced by some of the other use cases the video mentions, but at least there’s a nice clip of Atlas doing a backflip towards the end!
AI Ethics News
EU to build AI gigafactories in €20bn push to catch up with US and China
Revealed: Big tech’s new datacentres will take water from the world’s driest areas
Energy demands from AI datacentres to quadruple by 2030, says report
Major publishers call on the US government to ‘Stop AI Theft’
Law professors side with authors battling Meta in AI copyright case
Meta’s benchmarks for its new AI models are a bit misleading
Meta exec denies the company artificially boosted Llama 4’s benchmark scores
Model Context Protocol has prompt injection security problems
Long Reads
Stanford University - The 2025 AI Index Report
Simon Willison - CaMeL offers a promising new direction for mitigating prompt injection attacks
“The future is already here, it’s just not evenly distributed.“
William Gibson