A week in Generative AI: Gemini, Ghibli & Tracing Thoughts
News for the week ending 30th March 2025
It’s been a big week with a new model from Google DeepMind that pushes the frontier further forwards, a new image generating capability from OpenAI that pushes the frontier further forwards, and new research from Anthropic on what’s going on under the hood of these models that pushes our understanding further forwards.
In ethics news, there are obviously concerns about OpenAI’s Studio Ghibli moment this week, some interesting research on the impact ChatGPT has on emotional wellbeing, and a report on how developers are fighting AI crawlers.
Enjoy!
Google DeepMind launch Gemini 2.5 Pro
This is a big update from Google DeepMind - their new Gemini 2.5 Pro model now tops the Chatbot Arena, and by a comfortable margin of 40 points too:
Gemini 2.5 Pro is also much cheaper than its nearest competitors, so this is a new frontier of intelligence/cost. Alongside the launch is the usual array of impressive benchmark scores across reasoning, coding, maths, and science, which all show that Gemini 2.5 Pro is state of the art.
Gemini 2.5 Pro is an incredibly impressive model, and Simon Willison has put it through its paces across a variety of tests. The model also comes with all the hallmarks of a DeepMind model - it has native multimodality, a long context window of 1m tokens (2m coming soon 🤯), and a knowledge cut-off as recent as January 2025.
The most distinctive feature of Gemini 2.5 Pro is the context window. Not only is it 5x the size of its rivals' (both OpenAI and Anthropic top out at 200k tokens), but the model can pick out small details from that incredibly large amount of text better than any of the other models. Oh, and YouTube videos - it's great at analysing YouTube videos, but you'd kind of expect that!
OpenAI upgrades image generation
Despite Gemini 2.5 Pro being the new kid on the block, the buzz this week has definitely gone to OpenAI’s new Image Generation capabilities in GPT-4o. There is no doubt that it is the new state-of-the-art image generation platform.
Most people have been creating Studio Ghibli inspired images (I couldn’t help myself either), but there’s a great thread of other things that are seriously impressive here. GPT-4o can now reliably recreate an image in many different styles, extract specific parts of an image, create anatomical drawings, draw voxel art, swap faces on people, mock up user interfaces, generate maps, draw comic books… the list goes on and on!
It's capable of producing completely photorealistic images as well. They're honestly indistinguishable from photographs, rendering both people and text perfectly. Text and hands have been hard for image models to crack, but GPT-4o has now nailed them:
One of the more controversial features of the new image generation model is the loosening of the controls around what people can generate. There's a great post from Joanne Jang, head of product and model behaviour at OpenAI, about it. I think it's good to test the boundaries on this, but there are also lots of ethical and copyright issues to be mindful of.
Tracing the thoughts of an LLM
In May last year, Anthropic were the first frontier AI company to publish new research on how generative AI models work internally. They called this field interpretability, and had a great example of dialling up the prominence of the concept of the Golden Gate Bridge in a model, so that every time it produced text on any topic it somehow managed to work the Golden Gate Bridge into it!
This new video and research builds on that work. Not only have they made it much more accessible to everyone in terms of language and explanation, but they've also advanced their research techniques considerably over the last 12 months.
Essentially, Anthropic are trying to answer questions such as:
What language, if any, does Claude use when thinking?
Does Claude only focus on predicting the next word, or does it plan ahead and have a destination in mind?
When Claude writes out its reasoning, does that reflect the actual reasoning the model has done?
Anthropic made some interesting discoveries in their research:
Claude sometimes thinks in concepts that are shared across all languages, suggesting it does have a universal “language of thought”.
Claude does plan ahead, writing towards a destination it has in mind, so it isn't just predicting the next word in a sentence.
Claude does fake its reasoning sometimes, giving a plausible sounding argument that doesn’t reflect how it’s actually thinking.
There are lots more juicy learnings in their post about this research if you’re interested to learn more about how large language models actually work under the hood.
OpenAI adopts rival Anthropic’s standard for connecting AI models to data
Anthropic launched the Model Context Protocol (MCP) as an open source standard for connecting AI assistants to different systems back in November last year. It started off slowly, but has gained a huge amount of momentum over the last couple of months. It's a great system, has a huge community around it, and now has an enormous number of connectors built for it. It's very exciting to see OpenAI choose to adopt it as well - this gives MCP a big chance of becoming the industry standard for how AI assistants access other platforms, something that's fundamental to their usability, reliability, and all-round usefulness.
This is a big, foundational building block being put in place for AI agents to become genuinely useful to most people. I think we’ll see a lot more from this later in the year.
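To give a sense of how lightweight the integration is: MCP clients such as Claude Desktop discover servers from a small JSON config. The sketch below (the server name and directory path are purely illustrative) wires up the reference filesystem connector:

```
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/me/Documents"
      ]
    }
  }
}
```

The client launches each listed server as a subprocess and talks JSON-RPC to it over stdio, so any assistant that implements the protocol can reuse the same connectors - which is exactly why OpenAI adopting it matters.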
More natural walking from Figure AI
More natural walking, yes, but still a bit slow and clunky. Great video though, and good to see progress!
AI Ethics News
OpenAI’s viral Studio Ghibli moment highlights AI copyright concerns
AI Experts Say We’re on the Wrong Path to Achieving Human-Like AI
Richard Osman urges writers to ‘have a good go’ at Meta over breaches of copyright
Early methods for studying affective use and emotional well-being on ChatGPT
Open source devs are fighting AI crawlers with cleverness and vengeance
Character.ai can now tell parents which bots their kid is talking to
Long Reads
One Useful Thing: No elephants: Breakthroughs in image generation
MKBHD - Apple’s AI Crisis: Explained!
Wired - Inside Google’s Two-Year Frenzy to Catch Up With OpenAI
“The future is already here, it’s just not evenly distributed.”
William Gibson