A week in Generative AI: Superalignment, GPT-4o & Google I/O
News for the week ending 19th May 2024
Wow, what a week in AI!
On Monday we had the announcement of GPT-4o; on Tuesday it was Google I/O, and Ilya Sutskever resigned from OpenAI; on Wednesday Anthropic hired the co-founder of Instagram as their new Chief Product Officer; on Thursday OpenAI announced improvements to data analysis in ChatGPT and a partnership with Reddit; and on Friday OpenAI reportedly dissolved its existential AI risk team.
There’s far too much news for me to comment on everything this week without making the newsletter too long, so I’ve focused on the stories that I think are most significant.
Oh, and robot hands. I had to include robot hands 🫲🤖🫱.
There was also a lot of interesting AI ethics news, so make sure to check out that section too. We have the AI summit in Seoul next week, the follow-up to the UK summit last year. I’ll have more news and coverage from that in my next newsletter.
OpenAI loses the co-leads of its superalignment team as Ilya Sutskever and Jan Leike walk away
Almost six months to the day after Ilya Sutskever was at the centre of Sam Altman’s removal and reinstatement as OpenAI’s CEO, he announced that he was leaving OpenAI. This was quickly followed by a tweet from Ilya’s #2, Jan Leike, who simply said “I resigned”. He then followed up on Friday with more details about why, in a series of tweets.
In these tweets, Jan Leike said that “OpenAI is putting ‘shiny products’ above safety”, which is what was rumoured to be at the centre of Sam Altman’s removal as CEO back in November.
With the co-leads of OpenAI’s Superalignment team having now left, there are some big question marks over the future of OpenAI’s efforts in this space. It was only in July last year that this team was announced and promised 20% of all of OpenAI’s compute. That promise obviously wasn’t kept, and it was one of the main reasons Jan Leike cited for leaving.
Sam Altman acknowledged on Twitter that there is a lot more for OpenAI to do around alignment research and safety, and he’s going to publish a longer post on this in the next couple of days. It will be interesting to see what he says.
GPT-4o
GPT-4o was a big announcement from OpenAI, which I wrote about extensively on Monday. I then followed up with a few more thoughts on Tuesday, and since then we’ve seen some more impressive features, such as the image above, which was shared by Greg Brockman. The image was generated by GPT-4o, and it seems there are many more capabilities still to be explored as they are released over the coming months.
As more is shared about GPT-4o and we get access to some of the features announced, I’m more and more impressed by it. Many of these features are enabled by the fact that it is a multimodal model, which makes me wonder why OpenAI didn’t call this release GPT-4.5 or even GPT-5. I suspect we have some even more amazing things coming our way when GPT-5 is released later this year!
Google I/O 2024
The day after GPT-4o was announced, Google held their I/O 2024 conference, and in the two-hour keynote presentation they made a huge number of AI-related announcements (120+ mentions!). You can find a good summary here, but below are my highlights:
Project Astra - This is arguably the headline product/feature of all the Gemini announcements at Google I/O. Project Astra is a “research preview” but shows an impressive multimodal AI assistant that can watch, understand and remember what it sees through your device’s camera. Google are positioning Project Astra as “a universal AI agent that is helpful in everyday life”. The demo videos are very impressive and well worth a watch.
Gemini Live - This is Google’s answer to GPT-4o’s voice capabilities, though it doesn’t sound quite as advanced. “In the coming months” a new mobile experience will be rolling out that allows users to have a more intuitive conversation with Gemini, choose from a variety of natural-sounding voices, speak at their own pace and interrupt Gemini.
Gemini 1.5 Flash - This is a smaller Gemini model that’s optimised for speed, cutting the model’s response time. This is the model that will undoubtedly be powering Gemini Live, and it’s currently available in public preview in Google AI Studio.
AI Overviews - Previously known as “Search Generative Experience”, AI Overviews will populate search results pages with summarised answers from the web, similar to other AI search tools from Perplexity and Arc Search. AI Overviews will be rolling out to everyone in the US this week.
Search by video - This is a new feature in Lens that allows you to search by shooting a video, so you can now use images, audio, or video to start a search. If you search with video you still get a typical set of Google search results, but the point is to get there faster and to make it easier to tell Google what you’re looking for.
Gemini on Chrome - Google are adding Gemini Nano, its smaller model, to Chrome on desktop. This will help users generate text for things like social posts and product reviews, similar to how Microsoft added Copilot to Edge last year.
Ask Photos - This is essentially enhanced search in Google Photos, allowing you to search for photos using more detailed, natural language questions, similar to features Apple released in iOS 17. Ask Photos will be available in the “coming months”.
Google Veo - This is Google’s answer to OpenAI’s Sora model, and they’ve brought Donald Glover along for the ride! Veo generates high-quality, 1080p-resolution videos that can go beyond a minute in a variety of styles. It will be released to select creators over the coming weeks, and people can join the waitlist for access.
Gems - These are Google’s answer to OpenAI’s GPTs. You’ll be able to create customised Gems that each have their own interface and use a specific set of instructions that the user has created.
Many of the AI announcements Google made at I/O are responses to things OpenAI have announced this year, or to what competitors are already offering in the search space. These features are all “coming soon” or “research previews”, so it will be some time before people get their hands on all of this and can really test it out.
Overall it feels like Google announced a lot of great things at I/O this week, but none of them are really new ideas. The most powerful and progressive AI work that Google are currently doing is in medicine, and it doesn’t feel like they’ve quite got into their consumer product groove yet.
DeepMind is experimenting with a nearly indestructible robot hand
This is some really impressive research - the hand is extremely flexible, fast, strong and durable. It can go from fully open to closed in 500 milliseconds, can exert 10 newtons of force and can withstand repeated punishment without sustaining fatal damage. Each finger has hundreds of sensors, on the fingertips as well as along the finger itself, which will provide AI models with huge volumes of new data to learn from.
Replicating human hands is incredibly complex - they need to be flexible but strong, capable of very delicate movements but also of exerting significant force, and they need a wide range of motion as well as the ability to perform simple pinching actions. It’s great to see the progress we’re making in this field.
Behind the scenes of making “Deflated” with OpenAI’s Sora
This is just a little follow-up to last week’s newsletter, in which I shared “Deflated”, the follow-up to shy kids’ short film “air head”, made with OpenAI’s Sora. It’s great to see how shy kids are combining outputs from Sora with filmed footage and editing techniques. This gives a great insight into how something like Sora will be used in the real world.
AI Ethics News
AI may cause job losses and rise in carbon emissions, report finds
AI GPU bottleneck has eased, but now power will constrain AI growth warns Zuckerberg
OpenAI thinks it knows what media is "high quality," and that's a problem
Why it is so dangerous for AI to learn how to lie: ‘It will deceive us like the rich’
The SF Bay Area Has Become The Undisputed Leader In AI Tech And Funding Dollars
Long Reads
One Useful Thing - What OpenAI did
TED - Fei-Fei Li: With spatial intelligence, AI will understand the real world
Nature - How does ChatGPT ‘think’? Psychology and neuroscience crack open AI large language models
Quanta Magazine - Game Theory Can Make AI More Correct and Efficient
“The future is already here, it’s just not evenly distributed.”
William Gibson