A week in Generative AI: Superalignment, GPT-4o & Google I/O
News for the week ending 19th May 2024
Wow, what a week in AI!
On Monday we had the announcement of GPT-4o; on Tuesday it was Google I/O, and Ilya Sutskever resigned from OpenAI; on Wednesday Anthropic hired the co-founder of Instagram as their new Chief Product Officer; on Thursday OpenAI announced improvements to data analysis in ChatGPT and a partnership with Reddit; and on Friday OpenAI reportedly dissolved its existential AI risk team.
There's far too much news for me to comment on everything this week without making the newsletter too long, so I've focused on the news that I think is most significant.
Oh, and robot hands. I had to include robot hands 🫲🤖🫱.
There was also a lot of interesting AI ethics news, so make sure to check out that section too. We have the AI summit in Seoul next week, the follow-up to last year's UK summit. I'll have more news and coverage from that in my next newsletter.
OpenAI loses the co-leads of its superalignment team as Ilya Sutskever and Jan Leike walk away
Almost six months to the day after Ilya Sutskever was at the centre of Sam Altman's removal and reinstatement as OpenAI's CEO, he announced that he was leaving OpenAI. This was quickly followed by a tweet from Ilya's #2, Jan Leike, who said simply "I resigned". He then followed up on Friday with a series of tweets explaining why.
In these tweets, Jan Leike said that OpenAI is putting "shiny products" above safety, which is what was rumoured to be at the centre of Sam Altman's removal as CEO back in November.
With the co-leads of OpenAI's Superalignment team having now left, there are some big question marks over the future of OpenAI's efforts in this space. The team was only announced in July last year, with a promise of 20% of all of OpenAI's compute. That evidently didn't happen, and it was one of the main reasons Jan Leike cited for leaving.
Sam Altman acknowledged on Twitter that there is a lot more for OpenAI to do around alignment research and safety, and he's going to publish a longer post on this in the next couple of days. It will be interesting to see what he says.
GPT-4o
GPT-4o was a big announcement from OpenAI, which I wrote about extensively on Monday. I then followed up with a few more thoughts on Tuesday, and since then we've seen some more impressive features, such as the image above, which was shared by Greg Brockman. The image was generated by GPT-4o, and it seems there are many more capabilities to be explored as they are released over the coming months.
As more of GPT-4o is shared and we get access to the features announced, I'm more and more impressed by it. Many of these features are enabled by the fact that it is a single multimodal model, which makes me wonder why OpenAI didn't call this release GPT-4.5 or even GPT-5. I suspect we have some even more amazing things coming our way when GPT-5 is released later this year!
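If you want a feel for what "multimodal" means in practice, here's a rough, illustrative sketch of sending text and an image in a single GPT-4o request via OpenAI's official Python library. The image URL is just a placeholder, and note that audio in and out hadn't yet been opened up in the API at the time of writing:

```python
# Illustrative sketch only: one GPT-4o chat request mixing text and image input.
# Requires `pip install openai` and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's happening in this image?"},
                # Placeholder URL - point this at a real, publicly accessible image
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The interesting part is that text and images go through the same chat endpoint in the same request, rather than being handled by separate models behind the scenes.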
Google I/O 2024
The day after GPT-4o was announced, Google held their I/O 2024 conference, and in the two-hour keynote presentation they made a huge number of AI-related announcements (AI was mentioned 120+ times!). You can find a good summary here, but below are my highlights:
Project Astra - This is arguably the headline product/feature of all the Gemini announcements at Google I/O. Project Astra is a "research preview", but it shows an impressive multimodal AI assistant that can watch, understand and remember what it sees through your device's camera. Google are positioning Project Astra as "a universal AI agent that is helpful in everyday life". The demo videos are very impressive and well worth a watch.
Gemini Live - This is Google's answer to GPT-4o's voice capabilities, but it doesn't sound quite as advanced. "In the coming months", a new mobile experience will roll out that allows users to have a more intuitive conversation with Gemini, choose from a variety of natural-sounding voices, speak at their own pace and interrupt Gemini.
Gemini 1.5 Flash - This is a smaller Gemini model that's optimised for speed and reduces the model's response time. This is the model that will undoubtedly be powering Gemini Live, and it's currently available in public preview in Google AI Studio (see the code sketch after this list).
AI Overviews - Previously known as "Search Generative Experience", AI Overviews will populate search results pages with summarised answers from the web, similar to other AI search tools from Perplexity and Arc Search. AI Overviews will be rolling out to everyone in the US this week.
Search by video - This is a new feature in Lens that allows you to search by shooting a video, so you can now use images, audio or video to start a search. If you search with video you still get a typical set of Google search results, but the point is to get there faster and to make it easier to tell Google what you're looking for.
Gemini on Chrome - Google are adding Gemini Nano, their smallest Gemini model, to Chrome on desktop. This will help users generate text for things like social posts and product reviews, similar to how Microsoft added Copilot to Edge last year.
Ask Photos - This is essentially enhanced search in Google Photos, allowing you to search for photos using more detailed, natural-language questions, similar to features Apple released in iOS 17. Ask Photos will be available in the "coming months".
Google Veo - This is Google's answer to OpenAI's Sora model, and they've brought Donald Glover along for the ride! Veo generates high-quality, 1080p-resolution videos that can go beyond a minute, in a variety of styles. It will be released to select creators over the coming weeks, and people can join the waitlist for access.
Gems - These are Google's answer to OpenAI's GPTs. You'll be able to create customised Gems that each have their own interface and use a specific set of instructions that the user has created.
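If you want to kick the tyres on Gemini 1.5 Flash yourself, here's a rough sketch using the google-generativeai Python package. Treat the model identifier as indicative - naming may shift while the model is in preview, so check Google AI Studio for the exact id:

```python
# Illustrative sketch only: a simple Gemini 1.5 Flash call.
# Requires `pip install google-generativeai` and an API key from Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder - use your own key

# Model name assumed from the I/O announcements; it may differ in your region.
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Summarise this week's AI news in one sentence.")
print(response.text)
```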
Many of the AI announcements Google made at I/O are responses to things OpenAI have announced this year, or to what competitors are already offering in the search space. These features are all "coming soon" or "research previews", so it will be some time before people get their hands on all of this and can really test it out.
Overall it feels like Google announced a lot of great things at I/O this week, but none of them are really new ideas. The most powerful and progressive AI work that Google are currently doing is in medicine, and it doesn't feel like they've quite got into their consumer product groove yet.
DeepMind is experimenting with a nearly indestructible robot hand
This is some really impressive research - the hand is extremely flexible, fast, strong and durable. It can go from fully open to fully closed in 500 milliseconds, can exert 10 newtons of force and can withstand repeated punishment without sustaining fatal damage. Each finger has hundreds of sensors on the fingertip and along its length, which will provide AI models with huge volumes of new data to learn from.
Replicating human hands is incredibly complex - they need to be flexible but strong, capable of being very delicate but also of exerting real force, and they need a wide range of movement as well as the ability to perform simple pinching actions. It's great to see the progress we're making in this field.
Behind the scenes of making āDeflatedā with OpenAIās Sora
This is just a little addendum to last week's newsletter, in which I shared "Deflated", the follow-up to shy kids' short film "Air Head", made with OpenAI's Sora. It's great to see how shy kids are combining outputs from Sora with filmed footage and editing techniques. This gives a great insight into how something like Sora will be used in the real world.
AI Ethics News
AI may cause job losses and rise in carbon emissions, report finds
AI GPU bottleneck has eased, but now power will constrain AI growth warns Zuckerberg
OpenAI thinks it knows what media is "high quality," and that's a problem
Why it is so dangerous for AI to learn how to lie: "It will deceive us like the rich"
The SF Bay Area Has Become The Undisputed Leader In AI Tech And Funding Dollars
Long Reads
One Useful Thing - What OpenAI did
TED - Fei-Fei Li: With spatial intelligence, AI will understand the real world
Nature - How does ChatGPT "think"? Psychology and neuroscience crack open AI large language models
Quanta Magazine - Game Theory Can Make AI More Correct and Efficient
"The future is already here, it's just not evenly distributed."
William Gibson