A week in Generative AI: Gemini 2.0, Deep Research & Robots
News for the week ending 9th February 2025
This week we've seen a huge amount of news, most of it driven by the continuing fallout from the release of DeepSeek's models a couple of weeks ago. There was the release of Google DeepMind's whole family of Gemini 2.0 models, as well as the release of OpenAI's Deep Research. Both will have big implications going forwards.
OpenAI also announced that they were making o3-mini's thought process more transparent, with a fantastic example of this thinking shared by Sam Altman. There was also some excellent analysis by latent.space on the cost-intelligence frontier of generative AI models: the cost of intelligence is falling rapidly and trending towards zero. There are also some fun robot videos from NVIDIA and Apple, because why not?!
In Ethics News there is a report on why employees smuggle AI into work, some examples from TikTok of how deepfakes are getting shockingly good, and an update from Google DeepMind to their frontier safety framework.
In Long Reads there's a great article from Ethan Mollick on The End of Search, The Beginning of Research, and if you have a spare 3 hours, a fantastic video tutorial from Andrej Karpathy on how large language models are built and work.
I also mention Ethan Mollick 5 times in this week's newsletter and link to a lot of the commentary and content he's been producing this week. If you're not subscribed to his One Useful Thing blog and following him on his various socials, you're really missing out. This week it feels like I'm getting all my ideas from him!
Gemini 2.0 is now available to everyone
The big news of the week was the release of Google DeepMind's Gemini 2.0 to everyone. This follows the release of an experimental version of Gemini 2.0 Flash in December, but we now have the full family of Gemini 2.0 models. The table above is a useful reference for the differences between the models. There are lots of great features and quality-of-life improvements across the whole family.
Gemini 2.0 Flash is available to all users, but 2.0 Pro Experimental is only available to Advanced subscribers. There's also a Flash-Lite and a Flash Thinking Experimental model. Don't worry, it's not you - I find all these model naming conventions incredibly confusing and hard to keep track of too!
2.0 Pro Experimental is a very impressive model. Ethan Mollick claims it's the first GPT-5 class model with a wide public release. He shared a map of western Europe with all castles marked, and the model correctly identified what the map showed. 2.0 Pro Experimental is the first model to get this right - I tested GPT-4o, o1 (o3-mini can't analyse images) and Claude 3.5 Sonnet, and none of them got it right; they all thought it was a map of population or infrastructure.
One test alone doesn't show that 2.0 Pro Experimental is a next-generation, GPT-5 class model, and only time will tell if this is really a generational jump in capability.
Potentially the bigger news is 2.0 Flash-Lite, which is Google DeepMind's most cost-efficient model yet, no doubt driven by the cost efficiencies demonstrated by DeepSeek. It's currently the leader in price-performance across all available generative AI models, something I get into more below.
OpenAI unveils a new ChatGPT agent for 'deep research'
It's rare for OpenAI to release something on a Sunday night, but that's exactly what they did, announcing Deep Research just a few hours after last week's newsletter went out! I suspect this was one of the releases that Sam Altman said they'd 'move up' in response to the waves DeepSeek was making a couple of weeks ago.
OpenAI's Deep Research is their version of Google DeepMind's… Deep Research, which was released back in December. It's a specialised digital AI agent that works independently to research a specific topic for you. It will find and analyse hundreds of online sources related to the topic you want it to research and write a comprehensive report, which OpenAI claims is at the level of a research analyst (something confirmed by people who've been able to test it - see below).
Deep Research is powered by OpenAI's o3 model (not o3-mini; the full o3 model hasn't been released on its own yet) and uses the model's reasoning capabilities to plan, search, interpret, and analyse text, PDFs and images it finds on the internet. It's currently only available to Pro users (the $200 per month tier) and isn't available to users in the UK, Switzerland or the European Economic Area. It will be coming to Plus and Team users soon, with Enterprise users to follow. Because of this, I haven't been able to test it myself yet, but the feedback from those who have has been incredibly impressive. Below are a few interesting things that surfaced this week from those who have been able to test Deep Research:
Ethan Mollick shared a post showing how Deep Research went through essentially the same learning journey he did in his first year of studying for his PhD, but did it in just a few seconds.
Tyler Cowen, an Economics Professor, thinks that Deep Research is like having a good PhD-level research assistant, but one that can compress 1-2 weeks of their work into minutes.
AI Explained found that Deep Research performed much better than Google's version and DeepSeek R1, although it tended to ask lots of clarifying questions.
Ethan Mollick again says Deep Research is good at doing research that gives you a new perspective on a topic and can help inform your own thoughts.
In summary, OpenAI's Deep Research is very capable and impressive. It's what I would classify as a 'Level 3' semi-autonomous specialised digital AI agent (more on this in my opinion piece on Web 4.0 and the rise of the Agentic Web, coming later this week). Deep Research might be a specialised, narrow AI agent, but it's already creating economic value by significantly reducing research time across a broad range of topics. I'd definitely say this is an important step towards Artificial General Intelligence that would have been hard to imagine 2-3 years ago.
OpenAI now reveals more of its o3-mini model's thought process
In and of itself, OpenAI updating how they display o3-mini's thought process isn't much to write home about. However, the example that Sam Altman used in his post about this update was too good not to share and comment on.
In the above example, o3-mini is given nothing but a short emoji riddle as its prompt, and the image above shows its thought process. I think this is some incredibly impressive reasoning! I wouldn't have been able to work out this riddle without a lot of thinking time, and even then I'm not sure I could have got it.
I was concerned that o3-mini solving this emoji riddle might be too good to be true, so I decided to test it out myself. It took 4 attempts in 4 fresh chats for it to figure it out, but when it did, it gave a great answer. It also took just 31 seconds of reasoning - far quicker than I would have managed. You can see my results here.
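If you want to try reproducing this yourself, here's a minimal sketch using OpenAI's Python SDK. It assumes your account has API access to o3-mini, and the EMOJI_RIDDLE placeholder is hypothetical - you'd paste in the riddle from Sam Altman's post. Each call starts a brand-new chat, mirroring my 4 separate attempts:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from your environment

# Placeholder, not the real prompt - paste the emoji riddle from Sam Altman's post here.
EMOJI_RIDDLE = "<emoji riddle goes here>"

for attempt in range(1, 5):
    # Each call is a fresh, single-turn chat: no history carries over between attempts.
    response = client.chat.completions.create(
        model="o3-mini",
        messages=[{"role": "user", "content": EMOJI_RIDDLE}],
    )
    print(f"--- Attempt {attempt} ---")
    print(response.choices[0].message.content)
```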
It's taking a bit of time for people to get their heads around these new reasoning models, work out what they're for, and find good ways to use them. I hope this example helps the penny drop for many of you - they're incredibly intelligent models that most people just haven't found the killer use case for yet.
Let's see if you can solve the second emoji riddle in my results above - o3-mini got it first go with 12 seconds of reasoning…
The costs of intelligence are dropping quickly
The chart above is very busy, but it summarises some fantastic work by the team at latent.space. It attempts to answer a question I've long wanted answered: 'What is the trade-off between cost and intelligence?' Just focus on the coloured lines, which represent the cost-intelligence frontier for three AI companies - there's one for OpenAI, one for DeepSeek and one for Google DeepMind.
There are a few important things to take away from this analysis:
Gemini 2.0 Flash Thinking is currently the leader in cost-intelligence - it delivers the lowest cost-to-intelligence ratio, even cheaper than DeepSeek R1
We're (arguably) still in the GPT-4 era of models, and the analysis shows that the cost of that level of intelligence has dropped 1,000x in the last 18 months.
For reference, Moore's Law predicted a 2x increase in transistor density every 2 years… the pace of progress we're seeing on intelligence right now is astounding (see the quick sketch below).
The intelligence market has obviously adjusted very quickly following the reveal of DeepSeek. In the last couple of weeks we've seen significant price drops from both OpenAI and Google to remain competitive.
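To make that rate concrete, here's a quick back-of-envelope calculation in Python. The only input is the 1,000x-in-18-months figure from the analysis above; everything else follows from the arithmetic:

```python
import math

# The analysis above: GPT-4-level intelligence got ~1,000x cheaper in ~18 months.
DROP_FACTOR = 1_000
MONTHS = 18

halvings = math.log2(DROP_FACTOR)   # ~10 halvings in 18 months
halving_time = MONTHS / halvings    # ~1.8 months per halving

print(f"Cost of GPT-4-level intelligence halves every {halving_time:.1f} months")

# Moore's Law: transistor density doubles roughly every 24 months.
print(f"That's roughly {24 / halving_time:.0f}x faster than Moore's Law's cadence")
```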
To put this in context, Ethan Mollick did some analysis on the ARC-AGI challenge, which o3 made huge waves on when it scored 88% late last year. In the last 3 months we've seen OpenAI's o1 score 32% on the benchmark at a cost of $3.80 per task, DeepSeek's R1 score 15.8% at $0.06 per task, and most recently OpenAI's o3-mini score 35% at $0.04 per task. We don't know what o3 will cost yet, so it will be interesting to see how it compares when it's released.
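Those numbers are easier to compare as a single ratio. Here's a minimal sketch that turns the figures quoted above into a crude 'points of ARC-AGI score per dollar' metric - my own illustrative measure, not anything official:

```python
# ARC-AGI figures quoted above: (score in %, cost per task in USD)
results = {
    "OpenAI o1":      (32.0, 3.80),
    "DeepSeek R1":    (15.8, 0.06),
    "OpenAI o3-mini": (35.0, 0.04),
}

for model, (score, cost) in results.items():
    # Crude cost-effectiveness: percentage points of score per dollar spent on a task.
    print(f"{model:15s} {score / cost:7.0f} points per dollar")
```

On this rough measure, o3-mini is around a hundred times more cost-effective than o1 was just a few months earlier.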
For me, there are two really exciting things happening in the AI industry right now. The first is that we're really pushing the frontier of the 'I' in AI - we're building models that are getting much more intelligent, and that's happening really quickly.
The second, and probably more important, is that the cost of this intelligence is dropping rapidly and trending towards zero. Today you can generate a relevant one-line caption for around 40,000 unique photos with Gemini 2.0 Flash-Lite for less than a dollar. With costs dropping at a rate of 1,000x in 18 months, imagine what we'll be able to do for a dollar in a year's time. Imagine what we'll be able to do in 2 years' time, or even 5.
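For the curious, here's a rough version of that maths. The prices and token counts below are my assumptions, not official figures (I've used roughly $0.075 per million input tokens and $0.30 per million output tokens for Flash-Lite, and Gemini's historical convention of billing a small image as a flat 258 tokens), so treat this as an order-of-magnitude sketch that lands at around a dollar:

```python
# All numbers below are assumptions - check Google's current price list before relying on them.
INPUT_PRICE_PER_M = 0.075   # USD per 1M input tokens (assumed Flash-Lite pricing)
OUTPUT_PRICE_PER_M = 0.30   # USD per 1M output tokens (assumed)
TOKENS_PER_IMAGE = 258      # Gemini has historically billed a small image as 258 tokens
TOKENS_PER_CAPTION = 20     # rough length of a one-line caption

photos = 40_000
input_cost = photos * TOKENS_PER_IMAGE / 1e6 * INPUT_PRICE_PER_M      # ~$0.77
output_cost = photos * TOKENS_PER_CAPTION / 1e6 * OUTPUT_PRICE_PER_M  # ~$0.24

print(f"Estimated total: ${input_cost + output_cost:.2f} for {photos:,} captions")
```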
We're rapidly approaching a point where the cost of human-level intelligence is essentially zero. This will have profound implications for how we view intelligence, how we use it, and what that will mean for society. I don't have any insights or answers to share with you, but it is going to be a fun ride over the next few years!
Robots mimicking real world athletes
We've been used to seeing real-world robotics platforms (not the movie versions!) moving in unnatural ways or at a fraction of human speed, so it's great to see NVIDIA make progress on robots that can move at human speeds and in human-like ways.
The video above shows a robot shooting a hoop like Kobe and celebrating like Cristiano. The robots are first trained in simulation and then the controlling AI model is 'patched up' in the real world to make it more reliable.
Apple built Pixar's robot lamp
This research project was obviously inspired by Pixar's famous lamp and explores how robots can be expressive, which in turn makes them more engaging to humans than a standard robot.
Rumours are that we'll first see this 'robotic' tech show up in a HomePad product for the home, which is essentially an iPad on a robotic stand. It's great to see Apple working on robotics, as I think their design heritage can bring a lot to how robots fit into the home and everyday life, exactly like this research they're doing!
AI Ethics News
Google drops pledge not to use AI for weapons or surveillance
Keir Starmer unveils plan for large nuclear expansion across England and Wales
Long Reads
One Useful Thing - The End of Search, The Beginning of Research
Andreessen Horowitz - Setting the Agenda for Global AI Leadership: Assessing the Roles of Congress and the States
Andrej Karpathy - Deep Dive into LLMs like ChatGPT
TechCrunch - Tesla's Dojo, a timeline
"The future is already here, it's just not evenly distributed."
William Gibson