A week in Generative AI: Gemini, Sora & NVIDIA
News for the week ending 18th February 2024
It seems Google isn’t anywhere near slowing down the Gemini news cycle just yet. Right on the heels of last week’s Gemini news, we now have Gemini 1.5 Pro announced, just a few weeks after Gemini 1.0 Pro was released to the world. Not to be outdone, just a few hours after the Google announcement, OpenAI announced Sora, their first text-to-video model. But it’s way more than just text-to-video: it’s actually starting to model physics and reality as an emergent behaviour. Another busy and fascinating week in GenAI!
Gemini 1.5 Pro
I kind of get why Google has numbered this update 1.5, but it does the incredible work they’ve done a huge disservice. This is no small update, and it moves the GenAI game on to another level. To summarise the improvements made:
Gemini 1.5 Pro achieves near-perfect retrieval of facts and details from up to 10m tokens (c. 7.5m words) of context. No more need for Retrieval Augmented Generation (RAG) to stop hallucinations and ensure accurate citations in answers.
Gemini 1.5 Pro is better at multimodal tasks (text, vision, audio), and the fact that it was built from the ground up as a multimodal model is really starting to shine through.
Gemini 1.5 Pro is the best model at writing prose, with even state-of-the-art AI content detectors unable to distinguish between its content and human-created content.
The speed at which Google has shipped Gemini 1.5 Pro is incredible. It was probably trained, tested, fine-tuned and released in less than a couple of months, as it builds on research that was released just last month, as per this tweet.
If you’re interested in a more in-depth analysis, please check out this great video from AI Explained.
Sora
Up until this week, arguably the best text-to-video model was from Pika, and it could create pretty good 4-second video clips that were still very much in the uncanny valley. We now have Sora from OpenAI, which creates videos up to 1 minute long that so far seem to have crossed the uncanny valley into full photorealism.
This isn’t as simple as just being able to generate a video that’s 15x longer. To make a video 1 minute long, objects in the video need consistency, shadows and reflections need to be realistic, objects need to react to each other, and this all needs to stay true to the original prompt. For photorealistic videos, physics needs to be modelled.
What OpenAI have achieved here is astounding. This is the GPT-3 moment for video generation. The closest analogy to this technology that we currently have is a gaming engine like Unreal Engine 5 (UE5). Dr. Jim Fan put this perfectly:
The difference is that UE5 is hand-crafted and precise, but Sora is purely learned through data and "intuitive".
What Sora demonstrates is that when a generative AI model is trained on enough video clips, the ability to model reality starts to emerge, in the same way that knowledge starts to emerge when a model is trained on enough human text.
You can see some of the best clips from Sora shared by OpenAI here, and if you’re interested in a more in-depth analysis, please check out this great video from AI Explained.
NVIDIA Chat with RTX released
In any other week, this would have been huge news, but Chat with RTX now looks rather modest in the context of what Google has achieved with Gemini 1.5 Pro. However, this is an important early example of how generative AI models will run on user devices, become personalised and run really fast. I think we’ll see this side of GenAI develop considerably in the next 12 months and beyond.
OpenAI adds long-term memory to ChatGPT
OpenAI has started testing the ability for ChatGPT to remember things that you discuss with it to make future chats more helpful. These memories are all accessible within settings and can be deleted individually or en masse.
This is a great step towards a more personalised experience and will help considerably as ChatGPT morphs into more of a personal assistant in the future.
AI Ethics News
Google pledges 25 million euros to boost AI skills in Europe
OpenAI CEO warns that ‘societal misalignments’ could make artificial intelligence dangerous
Protesters gather outside OpenAI office, opposing military AI and AGI
OpenAI is disrupting malicious uses of AI by state-affiliated threat actors
“The future is already here, it’s just not evenly distributed.”
William Gibson