A week in Generative AI: Memories, Llama & Atlas in Advertising
News for the week ending 13th April 2025
Another relatively quiet week in GenAI, with a few nice little features and models dropped but nothing major to move the frontier forward. OpenAI released an enhanced memory feature in the US for Pro subscribers, there was continued controversy around Meta's release of Llama 4 and gaming of benchmarks, and Amazon released a state-of-the-art speech model called Nova Sonic. There was also a really interesting video from WPP, Boston Dynamics, and NVIDIA showcasing how the Atlas robot can be used in advertising - not something I thought I'd see in 2025!
ChatGPT will now remember your old conversations
I was very excited when OpenAI released their original memory feature in May last year, as it's one of the main components needed for a more personalised experience with Generative AI models. That feature turned out to be a bit of a dud and I could never really get it to work. I think in the last year it only created a couple of (mostly irrelevant) memories for me unless I specifically prompted ChatGPT to do so.
This week's update gives ChatGPT the ability to recall old conversations you've had with it, so it can now draw on both the "saved memories" that users have manually asked it to keep and your whole conversation history. As is becoming usual, the feature is not yet available in the EU, UK and a few other countries as there are more regulatory hurdles to overcome, so I haven't been able to start testing it myself. To start with, the feature will only be available to Pro subscribers and will come to Plus subscribers "soon".
I think this additional memory feature is a step in the right direction, but I'm not sure it's the final product: memories need to be automatically distilled from previous conversations, and I doubt that simply referencing old conversations will be optimal. However, I still have a lot of hope for memory features in Generative AI models and stick by everything I wrote last year - they'll make models more useful and relevant, help them become more proactive, and also help AI companies navigate issues around bias by tailoring responses to individual user preferences.
We need to talk about Llama 4…
TLDR - Meta have been caught gaming AI benchmarks with Llama 4. Not a good look, and on top of the weird decision to release on a Saturday before the main parent model had completed training, this has been a very strange release.
So, what happened is that Meta announced their Llama 4 models and everyone was surprised when the Maverick model quickly secured the number-two spot on LMArena. This was even cited in Meta's press release, where they highlighted an Elo score of 1417. That achievement would have positioned Llama 4 Maverick as a state-of-the-art model, competing with GPT-4o, Gemini 2.5 Pro and Claude 3.7.
However, it turns out it wasn't Llama 4 Maverick that got the Elo score of 1417 - it was an "experimental chat version" of the model (i.e. one fine-tuned to do well on LMArena). The standard version of Llama 4 Maverick only scored around 1273, ranking it 35th, not 2nd. You can see the scores for yourself here.
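To put that gap in context: under the standard Elo expected-score formula (which arena-style leaderboards are broadly based on), a 144-point lead corresponds to winning roughly 70% of head-to-head comparisons. A minimal sketch of the calculation, using the two scores above:

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# The "experimental chat version" vs the standard Llama 4 Maverick release
experimental, standard = 1417, 1273
print(f"{elo_expected_score(experimental, standard):.2f}")  # prints 0.70
```

In other words, the fine-tuned arena variant would be expected to beat the model Meta actually shipped about seven times out of ten - a very different picture from the press-release number.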
This controversy led to a post from LMArena where they explained that "Meta's interpretation of our policy did not match what we expect from model providers. Meta should have made it clearer that 'Llama-4-Maverick-03-26-Experimental' was a customized model to optimize for human preference. As a result of that we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn't occur in the future."
There's also the weird issue of how Meta are "addressing bias in LLMs" with the release of Llama 4. In their blog post on the release, Meta state that "Llama 4 is dramatically more balanced with prompts it refuses to respond to" and then in the very next sentence say "Our testing shows that Llama 4 responds with strong political lean at a rate comparable to Grok". I find this a little confusing, as being balanced and having a strong political lean contradict each other 🤷‍♂️.
Sadly, I think Meta have undone a lot of goodwill and trust that they've built up with their Llama models (which have been excellent so far and the leading open models) with this latest release. I hope they take some learnings from this for future releases.
Amazon unveils a new AI voice model, Nova Sonic
The example in this video is very cringe, and I've never had a customer service experience anywhere near as good as the one portrayed, but it's a decent example of how advanced voice models like Amazon's Nova Sonic could be used in real-world customer interactions.
Nova Sonic ticks a lot of boxes for a state-of-the-art speech-to-speech model. It has expressive voices, natural turn-taking, and the ability to use tools, which is essential when deploying in real-world customer-facing environments. Whilst the video mostly focuses on customer service, Amazon also list voice-enabled personal assistants and interactive education as other use cases. I expect we'll see this technology deployed by some large customer-facing organisations before the year is out.
Atlas in Advertising
Regular readers will know that I have a soft spot for robotics and like to showcase new, interesting videos that show off the progress being made. I didn't expect to be sharing a video of robotics being genuinely useful in an advertising context until at least next year, but here we are!
"Long, repeatable shots" is the use case WPP have found for Boston Dynamics' Atlas robot, which isn't something I'd thought of before but makes total sense. I'm not 100% convinced by some of the other use cases the video mentions, but at least there's a nice clip of Atlas doing a backflip towards the end!
AI Ethics News
EU to build AI gigafactories in €20bn push to catch up with US and China
Revealed: Big tech's new datacentres will take water from the world's driest areas
Energy demands from AI datacentres to quadruple by 2030, says report
Major publishers call on the US government to "Stop AI Theft"
Law professors side with authors battling Meta in AI copyright case
Meta's benchmarks for its new AI models are a bit misleading
Meta exec denies the company artificially boosted Llama 4's benchmark scores
Model Context Protocol has prompt injection security problems
Long Reads
Stanford University - The 2025 AI Index Report
Simon Willison - CaMeL offers a promising new direction for mitigating prompt injection attacks
"The future is already here, it's just not evenly distributed."
William Gibson




