A week in Generative AI: Gemini, 12 Days & B-ball
News for the week ending 15th December 2024
If you’d asked me last week what I thought the headline of this week’s newsletter would be, I would have said the continuation of OpenAI’s ‘Ship-mas’. Despite a huge number of impressive new announcements and releases from OpenAI, they have been pipped by the announcements from Google this week. There are so many great announcements this week that it’s going to take the industry a while to get its head around them all. Beyond Google and OpenAI, Anthropic announced the release of Claude 3.5 Haiku and Microsoft previewed Phi-4.
There was also a good volume of Ethics news covering a BBC complaint, Harvard making 1m books available for training and a good article on how Japanese contributions to AI are being forgotten. In Long Reads it’s worth checking out the latest entry from Ethan Mollick and Benedict Evans’ presentation at Slush Festival.
Google drops Gemini 2.0 and a host of other capabilities
So, big week for Google with lots of great releases, previews and updates. I’ve tried to summarise all of them below for you:
Gemini 2.0 Flash - the first model released in the Gemini 2.0 family, outperforming 1.5 Pro at twice the speed
Gemini Deep Research - a personal AI research assistant that can explore complex topics on your behalf
Jules - an AI code agent that can assist developers
Gemini 2.0 for games - agents that can help gamers by offering up suggestions of what to do next
NotebookLM improvements - a subscription model and a new feature that allows you to interact with the hosts of a generated podcast
Project Astra - a research prototype of a universal AI assistant that can see and understand the world
Project Mariner - a research prototype of agents that can help people accomplish complex tasks via a web browser
Most impressive of all of these announcements have to be Gemini 2.0 Flash and Gemini Deep Research. There are some other nice previews and updates, some of which come with very impressive demos like Projects Astra and Mariner, but in terms of real-world impact right now, Flash and Deep Research are where it’s at.
When Google first announced Gemini 1.0 just over a year ago, they announced three model sizes - Nano, Pro, and Ultra. They then introduced a Flash variant back in May, which is a smaller version of the Pro model, but not small enough to run on a mobile device like the Nano models. Flash models are therefore the smallest, cheapest Gemini models that run in the cloud, and so are still capable of delivering high performance. Very much like OpenAI’s mini models and Anthropic’s Haiku models.
So, Gemini 2.0 Flash is the first model released in the Gemini 2.0 family, with no news yet on other variants. However, despite being a ‘Flash-sized’ model it is more capable than Gemini 1.5 Pro across a whole range of benchmarks and currently sits at #4 on the Chatbot Arena leaderboard. Interestingly, there’s an experimental version of Gemini called Gemini-Exp-1206 currently at the top of the leaderboard, which is very likely Gemini 2.0 Pro and will probably be released next month.
Gemini Deep Research combines an o1-style approach to reasoning with the power of Google’s search capabilities, and seems like a response to both o1 and ChatGPT search.
It’s not a separate model (it uses Gemini 1.5 Pro) but is optimised to help users who are doing online research. It uses Google search multiple times to find the best online sources for the research you’re trying to do and then generates a comprehensive report of the key findings that’s well structured, has links to original sources, and can be exported into a Google Doc.
Deep Research is available to all Gemini Advanced subscribers through a web browser now, and will be available in the Gemini mobile app early next year.
12 Days of OpenAI continued…
We’ve seen the continuation of OpenAI’s ‘Ship-mas’ this week and still have 5 more days to go! Below is a summary of what’s been announced/released so far:
Day One: o1 and ChatGPT Pro
Day Two: Reinforcement Fine-Tuning
Day Three: Sora
Day Four: Canvas for all
Day Five: ChatGPT in Apple Intelligence
Day Six: Advanced Voice with Video
Day Seven: Projects in ChatGPT
Sora was obviously the big announcement of the week, and it’s great to see the model now publicly released. It can be accessed at sora.com, though it’s not yet available in the EU/UK. The quality of videos is incredibly impressive, the user interface for generating videos really moves the game on, and it’s well worth checking out the website just to see all the amazing examples on show!
Of all the other announcements this week, it’s great to see Canvas now available for free users and adding the ability to collaborate on code, and not just text. Advanced Voice with Video has also been eagerly anticipated since it was previewed back in May and brings some exciting new capabilities to ChatGPT such as being able to detect the emotions of users. ChatGPT was added to Apple’s iOS 18.2 release on Wednesday, and I’m excited to see Projects now added to ChatGPT which brings it on a par with how Anthropic’s Claude allows you to organise your chats and interactions.
I suspect there are at least another one or two big announcements from OpenAI next week. One will most likely be ChatGPT-4.5, and I would also put money on them announcing something around agents. We’ll have to wait and see!
Anthropic’s 3.5 Haiku model comes to Claude users
Claude 3.5 Haiku was originally announced last month, but is now available to users. Haiku is the smallest, cheapest Claude model, and 3.5 Haiku outperforms Claude 3 Opus, the flagship model before Claude 3.5 Sonnet was released.
3.5 Haiku has been specifically fine-tuned for coding, data extraction and labelling, and content moderation. It’s text-only to start with, but will be multimodal at some point in the near future.
Microsoft launches Phi-4, a new generative AI model, in research preview
Phi is Microsoft’s family of small language models and aims to compete with the small models from OpenAI (Mini), Google (Flash) and Anthropic (Haiku). Phi-3 was released back in April, and Phi-4 has improved on the previous model by training on ‘high-quality synthetic datasets’ as well as high-quality human-generated content.
Phi-4 is only available in research preview at the moment, but it will be interesting to see how it stacks up against all the other small models and if the focus on high-quality data gives it a significant bump in capabilities.
Watch Toyota’s robot set a new world record sinking an 80-foot basketball shot like it’s nothing
This is undeniably an amazing record to break, and sinking a shot from 80 feet is an impressive feat for a humanoid robot. I’m even more impressed by the robot’s general basketball play and ability to dribble, though! 🏀🗑️
AI Ethics News
BBC says it has complained to Apple over AI-generated fake news attributed to broadcaster
The Guardian view on AI’s power, limits, and risks: it may require rethinking the technology
Chips linked with light could train AI faster while using less energy
Japanese scientists were pioneers of AI, yet they’re being written out of its history
ElevenLabs’ AI voice generation ‘very likely’ used in a Russian influence operation
AI Firm’s ‘Stop Hiring Humans’ Billboard Campaign Sparks Outrage
A test for AGI is closer to being solved — but it may be flawed
Long Reads
One Useful Thing - 15 Times to use AI, and 5 Not to
Benedict Evans - AI Eats The World
Anthropic - What do people use AI models for?
The Atlantic - The GPT era is already ending
“The future is already here, it’s just not evenly distributed.”
William Gibson
Bit of a generic question, but do you think that the advancement of AI is a bad thing for humans? I think people seem to be becoming lazier and more impatient than ever; for example, people are getting AI to do basic tasks such as researching for school projects.