A week in Generative AI: Apple, Creativity & Shifting Sands
News for the week ending 15th June 2025
Lots of news to cover this week. Individually the stories don’t represent any fundamental shifts, but taken together they start to point to generative AI technologies having a real-world disruptive impact.
We had Apple’s annual WWDC, the launch of a new creativity benchmark, OpenAI signing a deal with Google, Meta acquiring a 49% stake in Scale AI, the release of o3-pro from OpenAI, and new reasoning models from Mistral.
On the Ethics front there were reports of publishers seeing reduced traffic because of Google’s AI Overviews, Meta’s AI app being reported as a privacy disaster, and Anthropic shutting down their AI-generated blog after only a couple of weeks.
I also highly recommend the Long Reads from Anthropic on building multi-agent systems, Benedict Evans’ post on AI’s metrics question, and Latent.Space’s post on o3 pro.
Apple’s WWDC recap
Apple’s WWDC was the big event of the week, and whilst not predominantly focused on AI, after last year’s launch of Apple Intelligence and the big (undelivered) promises around Siri, it’s worth a look to see where things are heading in Cupertino.
Firstly, they opened the event by talking about everything they had delivered in the last 12 months under the Apple Intelligence banner. Many of these are either small features in existing apps, quirky and not widely used apps like Image Playground, or standalone features like Genmoji. Apple were very clear that they see AI/Apple Intelligence as an enabling technology that will crop up in multiple features across their entire software stack, and that they’re not interested in building a ‘chatbot’. However, things like Image Playground and Genmoji fly in the face of this philosophy, and because of that they stick out like a sore thumb.
I do think that treating AI as an enabling technology is the right strategy for Apple, with the exception of Siri, which is feeling more dated and incapable by the week. It will be hard for Apple to catch up if they can’t deliver a new and improved Siri until 2026. They’ll have to blow the doors off in nine months’ time to compete with the experiences now available from other AI platforms, let alone those that will be commonplace in early 2026.
For those interested, below is some good coverage of the event and more news that came out of all the announcements:
Apple confirms Siri’s delayed features won’t ship until 2026
Apple brings Apple Intelligence to the iPhone screen at WWDC 2025
Apple Executives Defend Apple Intelligence, Siri and AI Strategy
Creativity Benchmark
One of the unique things about generative AI technologies is that they are creative. Whether you believe they’re just remixing the content they’ve been trained on, or are genuinely capable of creating unique content, this is the first technology that has been able to create. We’re 2.5 years into having generative AI widely available, and despite plenty of benchmarks across a wide range of capabilities, very few test one of the core capabilities of generative AI models: creativity.
That’s why I was so pleased to hear that Springboards.ai has been building a new benchmark specifically designed to test how creative different generative AI models are. It’s a very simple and familiar approach: a head-to-head test of two models answering a ‘creative’ prompt, with prompts ranging from insights to ideas for different brands and products. As you choose the answers, the Creativity Benchmark platform builds up a view of which model’s creative output you prefer. The more choices you make, the more reliable your score.
My results surprised me! I started to get an intuitive feel for the responses generated by Large Reasoning Models rather than Large Language Models, so it was no surprise to see OpenAI’s o3 at the bottom of my scores. I was, however, very surprised to see Grok 3 Beta and GPT 3.5 (?!) as my top two models after 100 selections. It’s a fun little test that doesn’t take too much time, so I highly encourage you to check it out and share your results!
Shifting Sands
There were a few other news items that caught my eye this week that I wanted to share. They’re not huge on their own, but all together in one week point to some shifting sands in the generative AI industry.
Firstly, OpenAI signed a deal with Google because they can no longer get all the compute capacity they need from Microsoft alone. Going forwards, OpenAI will be partly dependent on Google to build and serve their models. I’m surprised they opted for Google over AWS given their fierce competition with Google DeepMind; it’s an interesting change of strategy.
Secondly, Meta acquired a huge stake (49% for $14.3bn) in Scale AI, the largest provider of human-labelled data for AI model training. Scale’s data powers nearly every major foundation model in existence, and the deal includes their CEO, Alexandr Wang, stepping down to join Meta. This has already led to Google reportedly cutting ties with Scale AI, and it represents a huge push by Meta to catch up with their AI rivals after the disappointing Llama 4 release a few months ago.
Thirdly, Google’s AI search features are killing traffic to publishers. I’ve been expecting this news for a few months, as what we’re seeing is the start of large language models replacing website traffic. Publishers are simply feeling the effect first, driven mostly by Google’s AI Overviews, which some publishers call theft. We’re going to see a lot more of these headlines over the coming months and years as traffic to traditional websites declines and more people use large language models regularly.
All of these three things point to the generative AI industry starting to have a disruptive impact - we’ve got infrastructure changes (OpenAI/Google), industry consolidation (Meta/ScaleAI), and real world impact starting to be felt (Publishers). It’s taken a couple of years since ChatGPT’s launch to get here and I think we’re only just starting to see the tip of the iceberg.
OpenAI releases o3-pro, a souped-up version of its o3 AI reasoning model
On Tuesday OpenAI released their latest, greatest large reasoning model, o3-pro. They also reduced the cost of o3 by 80% 🤯. OpenAI haven’t released many benchmarks for o3-pro, but claim that “In expert evaluations, reviewers consistently prefer o3-pro over o3 in every tested category and especially in key domains like science, education, programming, business, and writing help.”
For most use cases, I think the fact that o3 is now 80% cheaper is probably the biggest game changer. o3 was only released in April, so it has seen that 80% cost reduction in just eight weeks. In isolation that would seem amazing, but it’s the trend we’re seeing across the board, with pricing rapidly coming down, making models cheaper to use and accessible to more people. It’s astounding how quickly capabilities are increasing at the same time as costs are falling.
Mistral releases a pair of AI reasoning models
It’s great to see Mistral releasing their own large reasoning models. Magistral comes in two flavours, small and medium. Medium is in preview via their Le Chat platform, whereas small is available for download under an Apache 2.0 licence.
Mistral are positioning Magistral as useful for a wide range of enterprise uses, and seem to be aiming at organisations that want to run a model on their own infrastructure but don’t want to use DeepSeek’s models. Because the models are smaller than most large reasoning models, Mistral claim they run at 10x the speed (and use 10x less electricity).
AI Ethics News
London AI firm says Getty copyright case poses ‘overt threat’ to industry
Disney and Universal sue Midjourney, alleging AI-related copyright infringement
AI can ‘level up’ opportunities for dyslexic children, says UK tech secretary
Sam Altman claims an average ChatGPT query uses ‘roughly one fifteenth of a teaspoon’ of water
Advanced AI suffers ‘complete accuracy collapse’ in face of complex problems, study finds
Long Reads
Anthropic - How we built our multi-agent research system
Sam Altman - The Gentle Singularity
Benedict Evans - AI’s metrics question
Stratechery - Apple Retreats
Latent.Space - God is hungry for Context: First thoughts on o3 pro
“The future is already here, it’s just not evenly distributed.”
William Gibson