A week in Generative AI: Chatbot Arena, Veo & Robotics
News for the week ending 22nd September 2024
After last week’s reveal of OpenAI’s o1, it’s been a quieter news week in generative AI. However, we have seen many more evaluations of how capable o1 is, including it now topping the Chatbot Arena. YouTube have also announced that their Dream Screen feature will now be powered by a new video model, enabling creators to generate six-second clips to post on YouTube Shorts. There’s also a great prediction from Nvidia’s Jim Fan that we’ll see a big breakthrough in robotics in the next three years that will lead to “as many intelligent robots as iPhones”.
On the ethics front we have an open letter to the EU signed by some big names in tech and AI asking for more regulatory certainty, Google have said that the UK risks being left behind in the AI race without more data centres, and OpenAI updated their safety & security practices which included Sam Altman stepping down from their Safety Committee.
There are also some good long reads on the scaling of Large Language Models from Ethan Mollick and EpochAI.
Enjoy!
Chatbot Arena: OpenAI o1-preview and o1-mini beat the competition
I’ve talked about Chatbot Arena before; it’s probably the best way we have of evaluating generative AI models in the real world, as it pits models against each other and users choose which responses they prefer, leading to a chess-style Elo score.
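To make the chess analogy concrete, here is a minimal sketch of a classic Elo update: when a user prefers one model’s answer, the winner gains rating points and the loser gives them up, with upsets moving ratings more than expected results. (The published Arena leaderboard actually fits a Bradley–Terry model over all votes, but the online update below captures the same idea; the numbers here are illustrative, not real Arena ratings.)

```python
# Toy Elo update for head-to-head model comparisons.
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(winner: float, loser: float, k: float = 32) -> tuple[float, float]:
    """Return updated (winner, loser) ratings after one user vote."""
    e_w = expected_score(winner, loser)  # how likely the win already was
    delta = k * (1 - e_w)                # surprising wins move ratings more
    return winner + delta, loser - delta

# Two evenly matched models (both 1300): the winner gains 16 points.
w, l = elo_update(1300, 1300)  # -> (1316.0, 1284.0)
```

Because thousands of votes are aggregated this way, small rating gaps between models near the top of the table can still be statistically meaningful.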
Since o1 previewed last week, it has topped the table, alongside o1-mini which sits in third place, sandwiching ChatGPT-4o. On one hand, I’m not surprised by this as o1 is certainly the most capable model we’ve seen for solving more complex problems and tasks. But as I wrote about last week, o1 is quite raw and brittle, so I’m surprised that users would choose o1 answers over some other models that are more polished.
I’m also surprised that, given it tops the table, it isn’t scoring significantly higher than ChatGPT-4o. I think that’s probably down to the types of questions users ask in the Chatbot Arena, and the fact that the two models are very different in nature.
YouTube Shorts to integrate Veo, Google’s AI video model
On Wednesday, YouTube announced that they will be integrating Google DeepMind’s AI video generation model, Veo, into YouTube Shorts. This will let creators generate high-quality backgrounds as well as six-second video clips to post on Shorts.
Last year, YouTube launched Dream Screen to allow creators to generate backgrounds for their videos, but this will be the first time that creators can use a tool on YouTube to generate whole, standalone clips. Veo, which will power the new feature in Dream Screen, generates four images from a text prompt which users can then choose from to generate a video clip.
This new feature is designed to help creators add filler scenes to their videos, but I wonder how long it will be until we see a flood of AI-generated content on YouTube Shorts.
Anthropic introduces Contextual Retrieval
This is a bit ‘research-y’ and technical, but I wanted to include it in this week’s newsletter as I think it’s a big deal. Whenever you upload a document to a chat model, or use a chat model that has access to a larger knowledge base, it will be using a technique called RAG (retrieval-augmented generation). It’s probably the most widely used method for giving models proprietary knowledge they can use. There are thousands of use cases, from giving a model knowledge about a specific business so that it can help with customer support, all the way through to supporting legal practices by giving a model access to data on previous legal cases.
RAG has always been an imperfect approach to giving a model access to proprietary knowledge, as you lose a lot of context. RAG effectively allows a model to ‘search’ a knowledge base, but it’s tricky to ensure it always retrieves the most important information.
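For readers who haven’t seen RAG up close, here is a deliberately toy illustration of the retrieval step: split the knowledge base into chunks, score each chunk against the user’s question, and paste the best match into the prompt. Real systems use embedding models and vector databases rather than the word-overlap scoring below; the example chunks and prompt wording are all made up.

```python
# Toy RAG retrieval: bag-of-words cosine similarity, standard library only.
import math
from collections import Counter

def bow(text: str) -> Counter:
    """Bag-of-words representation of a text."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str]) -> str:
    """Return the chunk most similar to the question."""
    q = bow(question)
    return max(chunks, key=lambda c: cosine(q, bow(c)))

chunks = [
    "Refunds are processed within 5 business days.",
    "Our office is open Monday to Friday.",
]
best = retrieve("How long do refunds take?", chunks)
prompt = f"Answer using this context:\n{best}\n\nQuestion: How long do refunds take?"
```

The weakness the section describes is visible even here: each chunk is scored in isolation, so a chunk that only makes sense in the context of the surrounding document (“The error rate fell by 10%” — of what?) can easily be missed or misranked.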
Anthropic have tried to address this with Contextual Retrieval, which they claim reduces failed retrievals by 49%, and by 67% when paired with other techniques such as reranking. This will be a big deal for how useful large language models will be in the enterprise, and for a whole variety of other use cases. It should also help models tackle the hallucination problem, where they effectively make up an answer in the absence of knowledge on a topic.
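The core idea is simple: before indexing, ask an LLM to write a short sentence situating each chunk within the whole document, and prepend that sentence to the chunk. The annotated chunk is what gets embedded, so searches can match on context the raw chunk lacks. A rough sketch of the preprocessing step, where `generate` stands in for any LLM call and the prompt is my paraphrase rather than Anthropic’s exact wording:

```python
# Sketch of Contextual Retrieval preprocessing (prompt wording illustrative).
from typing import Callable

CONTEXT_PROMPT = """<document>
{document}
</document>
Here is a chunk from that document:
<chunk>
{chunk}
</chunk>
Write one short sentence situating this chunk within the overall document,
to improve search retrieval of the chunk. Answer with only the sentence."""

def contextualize(document: str, chunk: str,
                  generate: Callable[[str], str]) -> str:
    """Return the chunk with LLM-written situating context prepended."""
    context = generate(CONTEXT_PROMPT.format(document=document, chunk=chunk))
    return f"{context}\n{chunk}"

# The contextualized chunks, rather than the raw ones, are then embedded
# and indexed exactly as in ordinary RAG.
```

The extra LLM call per chunk adds indexing cost, but it only runs once at ingestion time, which is why Anthropic position it as cheap relative to the retrieval-quality gains.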
Nvidia researcher Jim Fan expects "GPT-3 moment" for robotics in the next few years
Jim Fan, who leads embodied research at Nvidia, has been at the forefront of robotics development and the use of large language models to power them for a few years now. He’s been a big proponent of using simulations to train robotic systems and thinks that we’ll see a breakthrough in robotics akin to the emergence of ChatGPT in the next three years.
That’s not to say that we’re going to start seeing humanoid robots in our homes anytime soon; rather, that there will be a fundamental research breakthrough that will then lead to robots being much more ubiquitous.
AI Ethics News
OpenAI classifies o1 AI models as "medium risk" for persuasion and bioweapons
Google says UK risks being ‘left behind’ in AI race without more data centres
OpenAI's huge valuation hinges on upending corporate structure
Google will begin flagging AI-generated images in Search later this year
Lionsgate partners with AI firm to train generative model on film and TV library
Long Reads
One Useful Thing - Scaling: The State of Play in AI
Epoch AI: Can AI Scaling Continue Through 2030?
Wired: Inside Google’s 7-Year Mission to Give AI a Robot Body
Welch Labs: AI can’t cross this line and we don’t know why
“The future is already here, it’s just not evenly distributed.”
William Gibson