A week in Generative AI: Apple, Creativity & Shifting Sands
News for the week ending 15th June 2025
Lots of news to cover this week; none of the stories individually represents a fundamental shift, but taken together they start to point to generative AI technologies having a real-world disruptive impact.
We had Apple's annual WWDC, the launch of a new creativity benchmark, OpenAI signing a deal with Google, Meta acquiring a 49% stake in Scale AI, the release of o3-pro from OpenAI, and new reasoning models from Mistral.
On the Ethics front there were reports of publishers seeing reduced traffic because of Google's AI Overviews, Meta's AI app being reported as a privacy disaster, and Anthropic shutting down their AI-generated blog after only a couple of weeks.
I also highly recommend the Long Reads from Anthropic on building multi-agent systems, Benedict Evans's post on AI's metrics question, and Latent.Space's post on o3 pro.
Apple's WWDC recap
Apple's WWDC was the big event of the week, and whilst not predominantly focused on AI, after last year's launch of Apple Intelligence and the big (undelivered) promises around Siri it's worth a look to see where things are heading in Cupertino.
Firstly, they opened the event by talking about all the things they had delivered in the last 12 months under the Apple Intelligence banner. Many of these are either small features in existing apps, or quirky and not widely used apps like Image Playground and standalone features like Genmoji. Apple were very clear that they see AI/Apple Intelligence as an enabling technology that will crop up in multiple features across their entire software stack, and that they're not interested in building a "chatbot". However, things like Image Playground and Genmoji run counter to this philosophy, and because of that they stick out like a sore thumb.
I do think that treating AI as an enabling technology is the right strategy for Apple, with the exception of Siri, which is feeling more dated and incapable by the week. It will be hard for Apple to catch up if they aren't able to deliver a new and improved Siri until 2026. They'll have to blow the doors off in nine months' time to compete with the experiences now available from other AI platforms, let alone the experiences that will be commonplace in early 2026.
For those interested, below is some good coverage of the event and more news that came out of all the announcements:
Apple confirms Siri's delayed features won't ship until 2026
Apple brings Apple Intelligence to the iPhone screen at WWDC 2025
Apple Executives Defend Apple Intelligence, Siri and AI Strategy
Creativity Benchmark
One of the unique things about generative AI technologies is that they are creative. Whether you believe they're just remixing the content they've been trained on, or are genuinely capable of creating unique content, this is the first technology that has been able to create. We're 2.5 years into having generative AI widely available, and despite plenty of benchmarks across a wide range of capabilities, there have been very few that test one of the core capabilities of generative AI models: creativity.
That's why I was so pleased to hear that Springboards.ai had been building a new benchmark specifically designed to test how creative different generative AI models are. It's a very simple and familiar approach: essentially it's a head-to-head test of two models answering a "creative" prompt, ranging from insights to ideas for different brands and products. As you choose between the answers, the Creativity Benchmark platform builds up a view of which models' creative output you prefer. The more choices you make, the more reliable your score.
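Springboards.ai haven't published how they aggregate your choices, but one common way to turn head-to-head preference picks into a model ranking is an Elo-style rating, the same idea used by chess leagues and by LMArena-style leaderboards. Here's a minimal sketch of that approach (the model names and picks are purely illustrative, not real benchmark data):

```python
from collections import defaultdict

def update_elo(ratings, winner, loser, k=32):
    """Apply one Elo update after a single head-to-head preference choice.

    ratings: dict mapping model name -> current rating.
    winner:  the model whose answer was picked; loser: the other model.
    """
    ra, rb = ratings[winner], ratings[loser]
    # Expected score of the winner under the standard logistic Elo model.
    expected = 1 / (1 + 10 ** ((rb - ra) / 400))
    # An upset (beating a higher-rated model) moves ratings further.
    ratings[winner] = ra + k * (1 - expected)
    ratings[loser] = rb - k * (1 - expected)

# Every model starts from the same neutral rating.
ratings = defaultdict(lambda: 1000.0)

# A few hypothetical preference picks from a benchmark session.
picks = [("grok-3", "o3"), ("gpt-3.5", "o3"), ("grok-3", "gpt-3.5")]
for winner, loser in picks:
    update_elo(ratings, winner, loser)

# Highest-rated model first: the "most creative" by your preferences.
leaderboard = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
```

The more picks you feed in, the closer the ratings converge on your true preference ordering, which matches the benchmark's note that more choices make your score more reliable.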
My results surprised me! I started to get an intuitive feel for the responses generated by Large Reasoning Models rather than Large Language Models, so it was no surprise to see OpenAI's o3 at the bottom of my scores. I was, however, very surprised to see Grok 3 Beta and GPT 3.5 (?!) as my top two models after 100 selections. It's a fun little test that doesn't take too much time, so I highly encourage you to check it out and share your results!
Shifting Sands
There were a few other news items that caught my eye this week that I wanted to share. They're not huge on their own, but all together in one week they point to some shifting sands in the generative AI industry.
Firstly, OpenAI signed a deal with Google, as they can no longer get the compute capacity they need from Microsoft alone. Going forwards, OpenAI will be partly dependent on Google to build and serve their models. I'm surprised they opted for Google over AWS given their fierce competition with Google DeepMind; it's an interesting change of strategy.
Secondly, Meta acquired a huge stake (49% for $14.3bn) in Scale AI, the largest provider of human-labelled data for AI model training. Their data powers nearly every major foundation model in existence, and the deal includes Scale AI's CEO, Alexandr Wang, stepping down to join Meta. This has already led to Google reportedly cutting ties with Scale AI, and it represents a huge push by Meta to catch up with their AI rivals after the disappointing Llama 4 release a few months ago.
Thirdly, Google's AI search features are killing traffic to publishers. I've been expecting this news to start hitting for a few months, as what we're seeing is the start of large language models replacing website traffic. It just so happens that publishers are seeing this effect first, and it's mostly being driven by Google's AI Overviews, which some publishers claim is theft. We're going to see a lot more of these headlines over the coming months and years as traffic to traditional websites declines and more people use large language models regularly.
All three of these things point to the generative AI industry starting to have a disruptive impact: we've got infrastructure changes (OpenAI/Google), industry consolidation (Meta/Scale AI), and real-world impact starting to be felt (publishers). It's taken a couple of years since ChatGPT's launch to get here, and I think we're only just seeing the tip of the iceberg.
OpenAI releases o3-pro, a souped-up version of its o3 AI reasoning model
On Tuesday OpenAI released their latest, greatest large reasoning model: o3-pro. They also reduced the cost of o3 by 80% 🤯. OpenAI haven't released many benchmarks for o3-pro, but claim that "In expert evaluations, reviewers consistently prefer o3-pro over o3 in every tested category and especially in key domains like science, education, programming, business, and writing help."
For most use cases, I think the fact that o3 is now 80% cheaper is probably the biggest game changer. o3 was only released in April, so it has seen that 80% cost reduction in just eight weeks. In isolation that would seem amazing, but this is the trend we're seeing across the board: pricing is rapidly coming down, making models cheaper to use and accessible to more people. It's astounding how quickly capabilities are increasing at the same time as costs are falling.
Mistral releases a pair of AI reasoning models
It's great to see Mistral releasing their own large reasoning models. Magistral comes in two flavours, Small and Medium: Medium is in preview via their Le Chat platform, whereas Small is available for download under an Apache 2.0 licence.
Mistral are positioning Magistral as useful for a wide range of enterprise use cases, and seem to be aiming at organisations that want to run a model on their own infrastructure and don't want to use DeepSeek's models. Because the models are smaller than most large reasoning models, they run at 10x the speed (and use 10x less electricity).
AI Ethics News
London AI firm says Getty copyright case poses "overt threat" to industry
Disney and Universal sue Midjourney, alleging AI-related copyright infringement
AI can "level up" opportunities for dyslexic children, says UK tech secretary
Sam Altman claims an average ChatGPT query uses "roughly one fifteenth of a teaspoon" of water
Advanced AI suffers "complete accuracy collapse" in face of complex problems, study finds
Klarna's CEO is now taking your calls, over an AI hotline
Long Reads
Anthropic - How we built our multi-agent research system
Sam Altman - The Gentle Singularity
Benedict Evans - AI's metrics question
Stratechery - Apple Retreats
Latent.Space - God is hungry for Context: First thoughts on o3 pro
"The future is already here, it's just not evenly distributed."
William Gibson