A week in Generative AI: Superalignment, GPT-4o & Google I/O
News for the week ending 19th May 2024
Wow, what a week in AI!
On Monday we had the announcement of GPT-4o; on Tuesday it was Google I/O and Ilya Sutskever resigned from OpenAI; on Wednesday Anthropic hired the co-founder of Instagram as their new Chief Product Officer; on Thursday OpenAI announced improvements to data analysis in ChatGPT and a partnership with Reddit; and on Friday OpenAI reportedly dissolved its existential AI risk team.
There's far too much news for me to comment on everything this week without making the newsletter too long, so I've focused on the news that I think is most significant.
Oh, and robot hands. I had to include robot hands 🫲🤖🫱.
There was also a lot of interesting AI ethics news, so make sure to check out that section too. We have the AI summit in Seoul next week, which is the follow-up to the UK summit last year. I'll have more news and coverage from that in my next newsletter.
OpenAI loses the co-leads of its superalignment team as Ilya Sutskever and Jan Leike walk away
Almost 6 months to the day from when Ilya Sutskever was at the centre of Sam Altman's removal and reinstatement as OpenAI's CEO, he announced that he was leaving OpenAI. This was quickly followed by a tweet from Ilya's #2, Jan Leike, who simply said "I resigned". He then followed up with more details about why on Friday in a series of tweets.
In these tweets, Jan Leike said that OpenAI is putting "shiny products" above safety, which is what was rumoured to be at the centre of Sam Altman's removal as CEO back in November.
With the co-leads of OpenAI's Superalignment team having now left, this leaves some big question marks over the future of OpenAI's efforts in this space. It was only in July last year that this team was announced and promised 20% of all of OpenAI's compute. This obviously didn't happen, and it was one of the main reasons Jan Leike cited for leaving.
Sam Altman acknowledged on Twitter that there is a lot more for OpenAI to do around alignment research and safety, and he's going to publish a longer post on this in the next couple of days. It will be interesting to see what he says.
GPT-4o
GPT-4o was a big announcement from OpenAI, which I wrote about extensively on Monday. I then followed up with a few more thoughts on Tuesday, and since then we've seen some more impressive features, such as the image above, which was shared by Greg Brockman. The image was generated by GPT-4o, and it seems there are many more capabilities to be explored as they are released over the coming months.
As more is shared about GPT-4o and we get access to some of the features announced, I'm more and more impressed by it. Many of these features are enabled by the fact that it is a multimodal model, which makes me wonder why OpenAI didn't call this release GPT-4.5 or even GPT-5. I suspect we have some even more amazing things coming our way when GPT-5 is released later this year!
Google I/O 2024
The day after GPT-4o was announced, Google had their I/O 2024 conference, and in the two-hour keynote presentation they made a huge number of AI-related announcements (120+ mentions!). You can find a good summary here, but below are my highlights:
Project Astra - This is arguably the headline product/feature of all the Gemini announcements at Google I/O. Project Astra is a "research preview" but shows an impressive multimodal AI assistant that can watch, understand and remember what it sees through your device's camera. Google are positioning Project Astra as "a universal AI agent that is helpful in everyday life". The demo videos are very impressive and well worth a watch.
Gemini Live - This is Google's answer to GPT-4o's voice capabilities, but doesn't sound quite as advanced. "In the coming months" a new mobile experience will be rolling out that allows users to have a more intuitive conversation with Gemini, choose from a variety of natural-sounding voices, speak at their own pace and be able to interrupt Gemini.
Gemini 1.5 Flash - This is a smaller Gemini model that's optimised for speed, reducing the model's response time. This is the model that will undoubtedly be powering Gemini Live, and it is currently available in public preview in Google AI Studio.
AI Overviews - Previously known as "Search Generative Experience", AI Overviews will populate search results pages with summarised answers from the web, similar to other AI search tools from Perplexity and Arc Search. AI Overviews will be rolling out to everyone in the US this week.
Search by video - This is a new feature in Lens that allows you to search by shooting a video, so you can now use images, audio, or video to start a search. If you search with video you still get a typical set of Google search results, but the point is to get there faster and to make it easier to tell Google what you're looking for.
Gemini on Chrome - Google are adding Gemini Nano, its smaller Gemini model, to Chrome on desktop. This will help users generate text for things like social posts and product reviews, similar to how Microsoft added Copilot to Edge last year.
Ask Photos - This is essentially enhanced search in Google Photos, allowing you to search for photos using more detailed, natural language questions, similar to features Apple released in iOS 17. Ask Photos will be available in the "coming months".
Google Veo - This is Google's answer to OpenAI's Sora model, and they've brought Donald Glover along for the ride! Veo generates high-quality, 1080p-resolution videos that can go beyond a minute, in a variety of styles. It will be released to select creators over the coming weeks and people can join the waitlist for access.
Gems - These are Google's answer to OpenAI's GPTs. You'll be able to create customised Gems that each have their own interface and use a specific set of instructions that the user has created.
Many of the AI announcements that Google made at I/O are in response to things that OpenAI have announced this year, or that competitors are already offering in the search space. These features are all "coming soon" or "research previews", so it will be some time before people get their hands on all this and can really test them out.
Overall it feels like Google announced a lot of great things at I/O this week, but none of them are really new ideas. The most powerful and progressive AI work that Google are currently doing is in medicine, and it doesn't feel like they've quite got into their consumer product groove yet.
DeepMind is experimenting with a nearly indestructible robot hand
This is some really impressive research - the hand is extremely flexible, fast, strong and durable. It can go from fully open to closed in 500 milliseconds, can exert 10 newtons of force and can withstand repeated punishment without sustaining fatal damage. Each finger has hundreds of sensors on the fingertips and along the fingers themselves, which will provide AI models with huge volumes of new data to learn from.
Replicating human hands is incredibly complex - they need to be flexible but strong, capable of very delicate handling but also of exerting considerable force, and they need a wide range of movement as well as the ability to perform simple pinching actions. It's great to see the progress we're making in this field.
Behind the scenes of making "Deflated" with OpenAI's Sora
This is just a little follow-up to last week's newsletter, when I shared "Deflated", the follow-up to shy kids' short film "air head", made with OpenAI's Sora. It's great to see how shy kids are combining outputs from Sora with filmed footage and editing techniques. This gives a great insight into how something like Sora will be used in the real world.
AI Ethics News
AI may cause job losses and rise in carbon emissions, report finds
AI GPU bottleneck has eased, but now power will constrain AI growth warns Zuckerberg
OpenAI thinks it knows what media is "high quality," and that's a problem
Why it is so dangerous for AI to learn how to lie: "It will deceive us like the rich"
The SF Bay Area Has Become The Undisputed Leader In AI Tech And Funding Dollars
Long Reads
One Useful Thing - What OpenAI did
TED - Fei-Fei Li: With spatial intelligence, AI will understand the real world
Nature - How does ChatGPT "think"? Psychology and neuroscience crack open AI large language models
Quanta Magazine - Game Theory Can Make AI More Correct and Efficient
"The future is already here, it's just not evenly distributed."
William Gibson