A week in Generative AI: o1, Apple Intelligence & tying shoes
News for the week ending 15th September 2024
The big news of the week was OpenAI releasing o1, aka Strawberry, aka Q*. It’s a big deal, but probably not for the reasons many people think. We also had the Apple iPhone event, where Apple Intelligence played a big role, and I don’t think it’s a coincidence that these two things were announced in the same week. In more creative news, Adobe announced their new Firefly Video model, and some interesting details emerged about Midjourney v7.
In ethics news, James Earl Jones (a great actor and the voice of Darth Vader) passed away; before his death he signed over the rights for his voice to be used to train AI, so that Darth Vader’s voice can live on. We also saw reports of the large frontier AI companies meeting with the White House to discuss AI energy and data centres.
I also couldn’t resist including a video of a Google DeepMind robot tying shoelaces, because why not?!
OpenAI launches o1 - a model that thinks
In a slightly surprising move (but not so surprising in hindsight), OpenAI released a preview of their next model, o1. To be clear, this isn’t their next frontier model, GPT-5 (or whatever it ends up being called), but a new type of large language model that has the ability to ‘think’ before answering.
Up until now, LLMs have treated every query in the same way. No matter how simple or complex the query, LLMs answer immediately, and effectively use the same amount of compute. o1 changes this, and for the first time we have a model that will take longer (and ‘think’ more) to answer more complex queries.
What this means in practice is that o1 analyses the user’s query, creates a plan to answer it, and executes that plan. It answers the user’s query and shares its ‘thinking’ process. There’s a lot to get into here, and I’ll follow up with a more in-depth post soon, but o1 is a new type of generative AI technology that I think will lead to another step change and acceleration in progress towards artificial intelligence in its truest sense.
At this point in time, the o1 preview is very ‘raw’. It doesn’t have any of the user-friendly features of GPT-4o (such as file attachments, web browsing, etc.). It’s really a technology showcase to demonstrate how this new ‘thinking’ approach can solve more complex queries and show a significant step change in the capabilities of frontier models.
Many commentators (and OpenAI themselves) are very focused on the capabilities side of things, but I’m more interested in how o1 demonstrates the ability to reason and plan through problems. This is the big advancement that will form the foundation of digital companions, enabling our current generation of chatbots to take actions on behalf of users.
This is why, with hindsight, I don’t think it’s that surprising that OpenAI announced o1 the same week as Apple’s iPhone event. o1 will be the foundation for Siri becoming more of a true assistant in the way that Apple showed off at WWDC this summer, and I think we’ll see more news on this front in the coming months.
Exciting times ahead as we watch this unfold!
Apple demonstrated Apple Intelligence on iPhone
Despite Monday being an iPhone launch event (you can find highlights here), the real focus was on Apple Intelligence, which permeated the entire 90-minute event. The new Apple Intelligence features won’t be available next week when iOS 18 launches: some are coming next month in iOS 18.1, others ‘later this year’, and others ‘early next year’.
However, when the new Apple Intelligence features do arrive, they promise a better Siri, visual search, and writing tools, amongst many other things. The real power of Apple Intelligence, as with the majority of Apple’s products, will come from what third-party developers do with it.
As developers integrate Apple Intelligence into their apps, Siri will be able to perform tasks in those apps, access any text within them, and reference and understand the context of what users are doing in them. This will not only greatly increase Siri’s capabilities on iPhone, but also move it further towards being a digital companion.
Oprah had an AI special with Sam Altman and Bill Gates
In the US on Thursday, Oprah aired a special on AI called “AI and the Future of Us” with Sam Altman, Bill Gates and other guests.
TL;DR: the AI genie is out of the bottle, we will need to adapt, deepfakes are here to stay and only getting harder to spot, and there’s a lot of hope around the positive disruption AI will bring to fields like education and medicine.
It sounds like there were some really interesting conversations in the show, but I haven’t been able to find a way to get access to it from the UK yet. I’d love to watch it, so if anyone knows whether it will be made available outside of the US, please let me know in the comments below!
Adobe announces Firefly video model
Despite still not being publicly released, OpenAI’s Sora remains the gold standard for text-to-video models. Whilst we wait to see if it will be released to the public, many other video models with convincing real-life quality have launched this year, and I think it’s inevitable that over the next couple of years AI video creation becomes as pervasive as AI image creation is today.
Adobe’s Firefly is an enterprise-ready, production-grade generative AI model that is free from copyright issues, as it’s been trained on Adobe’s own stock content. It’s great to see video capabilities come to Firefly, and I’m sure they will be as disruptive to the creative industries as the image-generating capabilities Adobe launched in March 2023.
I’m looking forward to seeing what people can do with it!
Midjourney teases Version 7, 3D system, and external image editor
The next big update from Midjourney seems to be on the horizon, with v7 rumoured to be released before the end of the year. v7 will allow more personalisation of the images generated and will learn more about user preferences over time. Midjourney is also looking to introduce faster generation, increase the number of images generated, and add more editing features, all of which will make v7 much more suitable for professional use cases, although without the copyright safety that Adobe’s Firefly brings.
Google DeepMind teaches a robot to autonomously tie its shoes and fix fellow robots
I love a good robotics demo, and this one doesn’t disappoint, with some impressive improvements to dexterity and co-ordination between two robotic arms. To borrow a term from Ethan Mollick, I think we’re seeing a ‘Jagged Frontier’ in our progress in robotics: some demos look unbelievably good, and others make you realise how far we’ve still got to go. Fun to watch nonetheless!
AI Ethics News
James Earl Jones Signed Over Rights For AI To Recreate Darth Vader’s Voice
Craig Federighi talks about the challenges behind keeping Apple Intelligence private
Hacker tricks ChatGPT into giving out detailed instructions for making homemade bombs
Nvidia, OpenAI, Anthropic and Google execs meet with White House to talk AI energy and data centers
Audible recruits voice actors to train audiobook-generating AI
Long Reads
One Useful Thing - Something New: On OpenAI's "Strawberry" and Reasoning
Anthropic - AI prompt engineering: A deep dive
“The future is already here, it’s just not evenly distributed.”
William Gibson