A week in Generative AI: Operator, Stargate & DeepSeek
News for the week ending 26th January 2025
Big week! We had the anticipated launch of Operator from OpenAI, their first AI agent; the announcement of the $500bn Project Stargate on day two of the new presidential term; and a hugely significant open-source release from the Chinese company DeepSeek, whose R1 model rivals OpenAI's o1 at a fraction of the cost.
In Ethics News, President Biden's executive order on AI was repealed by the incoming administration, and the Pentagon said that AI was "speeding up its kill chain", which sounds horrendous.
In Long Reads I highly recommend taking a look at Wired's article on DeepSeek and also John Gruber's article on how dumb Siri currently is.
OpenAI launches Operator, an agent that performs tasks autonomously
As had been rumoured for a while, OpenAI took the wraps off their first foray into AI agents this week by announcing Operator.
Operator is what OpenAI calls a "Computer-Using Agent" (CUA). It combines GPT-4o's vision capabilities with the advanced reasoning capabilities seen in models like o1 and o3 to interact with the graphical user interfaces (GUIs) people see on a screen, just like humans do. You can see a demo of Operator in action here.
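The perceive-reason-act loop behind a CUA can be sketched in a few lines. This is a toy illustration, not OpenAI's actual implementation: the screenshot capture and model call below are stand-ins (scripted canned actions) for a real screen grab and a vision-capable model.

```python
# Toy sketch of a Computer-Using Agent (CUA) loop: observe the screen,
# ask a model for one GUI action, execute it, repeat until done.
# All functions here are hypothetical stand-ins, not OpenAI's API.

def screenshot() -> str:
    # Stand-in for a real screen capture (a base64 PNG in practice).
    return "<fake-screenshot>"

def ask_model(image: str, goal: str, step: int) -> dict:
    # Stand-in for a vision+reasoning model returning one structured action.
    scripted = [
        {"type": "click", "x": 120, "y": 340},
        {"type": "type", "text": goal},
        {"type": "done"},
    ]
    return scripted[min(step, len(scripted) - 1)]

def run_agent(goal: str, max_steps: int = 20) -> list:
    actions = []
    for step in range(max_steps):
        action = ask_model(screenshot(), goal, step)
        actions.append(action)
        if action["type"] == "done":
            break
        # A real agent would execute the click/keystroke here.
    return actions

trace = run_agent("search for flights")
print([a["type"] for a in trace])  # ['click', 'type', 'done']
```

The hard part, of course, is the model call: reliability comes down to how often the model picks the right action from a raw screenshot.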
Operator is currently a research preview and only available to Pro users ($200 per month) in the US, so I haven't had a chance to test it myself yet. However, the broad consensus I've seen is that this is a much more polished version of Anthropic's Computer Use that they announced back in October.
Conceptually this type of approach is a great idea, and whilst these general-use agents can perform a wide variety of tasks (e.g. download files, combine PDFs, process email attachments etc.), they are not yet reliable enough. Based on some of the benchmarks shared, Operator is around 40-60% reliable vs. humans at around 80%. I suspect agents will need to get to 90%+ reliability (i.e. better than human) before people will begin trusting them to perform tasks on their behalf. Having said that, given the progress we've seen OpenAI make between the o1 and o3 models, I suspect we will see 90%+ reliability in the next 12-18 months.
One other thing worth mentioning is that in the EU, as the laws stand right now, AI agents like Operator will not lawfully be able to help consumers with anything relating to their bank accounts. I've written previously about how companies like Stripe are building out the technology to allow AI agents to manage payments, which I think will be a big use case for the technology once it becomes reliable enough.
However, Sam Altman helpfully shared this tweet, which details a potted history of how Open Banking came about in the EU and some of the sacrifices made along the way. TL;DR: digital assistants (like Operator) will only be allowed to interact with banking systems in the EU via Open Banking APIs (which is not how they're currently designed to work).
Project Stargate: a $500bn AI data center company
Rumours about Project Stargate were floating around for most of last year, and I first wrote about it back in April when it was described as a $100bn project led by OpenAI and Microsoft. It seems things have moved on a lot since then, with more investors coming on board, meaning that Microsoft will no longer be OpenAI's exclusive cloud provider.
However, the basic premise of the project hasn't changed - invest a huge amount of capital in building out AI infrastructure (data centres, power generation etc.) so that the US can take a strategic lead in developing artificial general intelligence.
The Trump administration sees this project as a big proof point for how they will bring more investment to the US and create more jobs, which is why such a splashy announcement was made on just day two of their new presidential term.
Thankfully, reports say that Stargate will use solar and batteries for power, which is surprising given the Trump administration's involvement in the project. We'll just have to see how the project progresses over the next 4 years and how much of an environmental impact it has.
Cutting-edge Chinese "reasoning" model rivals OpenAI o1 - and it's free to download
DeepSeek is an interesting Chinese AI company that was spun out of a hedge fund and has been building some of the most cutting-edge open-source AI capabilities over the last couple of years.
There have been lots of sensationalist news stories this week about the launch of their DeepSeek R1 model, which is open source, matches OpenAI's o1 in capabilities, and was trained at a much lower cost. There's a good long read from Wired about them here.
Without veering towards the sensationalist, what DeepSeek have achieved is incredibly impressive and challenges a few assumptions that the predominantly US-based AI industry has made:
Firstly, you don't need access to cutting-edge AI hardware to build cutting-edge AI models. 18 months ago, the US ordered the immediate halt to the export of the most powerful AI chips to China in an attempt to protect its lead in AI. However, this has just forced Chinese AI companies like DeepSeek to innovate and find ways to get more from older AI hardware.
Secondly, there is "no moat" in AI. Early last year it looked like the Open Source community was about 12-18 months behind the frontier AI companies. With the release of Meta's Llama 3 in the summer, it looked like the Open Source community was about 6 months behind the frontier AI companies. Now, with the release of DeepSeek R1, it looks like the Open Source community is about 3 months behind the frontier AI companies. You can probably see where I'm going with this… If Open Source AI models have similar capabilities to the state-of-the-art frontier models then the technology will quickly become commoditised, and it could be difficult to recoup the trillions of dollars that have been invested in the frontier AI companies.
Lastly, the most capable AI models don't need to be expensive. With OpenAI launching a $200 Pro subscription tier late last year, it was starting to look like cutting-edge AI models would only be available to those who could afford them, which could set a very dangerous precedent. DeepSeek's R1 is 98% cheaper to use than OpenAI's o1, which shows that it is possible to build frontier models that are not only cheaper (and more environmentally friendly) to train, but also to use. This is great for everyone.
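To put that cost difference in perspective, here's a back-of-the-envelope comparison. The per-million-token prices below are approximate launch list prices I'm assuming for illustration - check the providers' pricing pages for current figures.

```python
# Rough sanity check on the "much cheaper to use" claim, using
# assumed per-million-token output prices (illustrative, not official).

O1_OUTPUT = 60.00   # $/M output tokens, OpenAI o1 (assumed)
R1_OUTPUT = 2.19    # $/M output tokens, DeepSeek R1 (assumed)

saving = 1 - R1_OUTPUT / O1_OUTPUT
print(f"R1 output tokens are ~{saving:.0%} cheaper than o1")
```

With these assumed prices the saving comes out around 96%; with DeepSeek's discounted cache-hit rates it gets closer to the 98% figure that's been widely quoted.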
Inside Anthropic's Race to Build a Smarter Claude and Human-Level AI
Dario Amodei sat down with the Wall Street Journal's Joanna Stern at Davos this week to discuss the $2bn of additional funding they're raising to help address the surge in demand they've seen for their services over the last year, and specifically the last 3 months.
As a heavy Claude user (my go-to model for most things) this is music to my ears, as I've been consistently coming up against usage limits recently. It was also great to hear what's coming from Anthropic this year - smarter models, web integration, two-way voice mode, and memory. To be honest, this is everything that's been on my wish list for a while now, so very exciting!
I'm a great admirer of Anthropic and the way they go about business - they're thoughtful and deliberate in how they release new capabilities/features, and this usually results in much higher quality models that I find much nicer to use.
Perplexity has a busy week
It's been a busy week for Perplexity, who launched an API for AI search, launched an assistant for Android, and made a bid to merge with TikTok in the US (although, who didn't?!).
Perplexity took on new funding at the end of last year, valuing them at $9bn, and also started taking advertising on their platform in the US. It's interesting to see where they are headed with these new announcements, especially the API for AI search, which looks to address the hallucination issues common with other large language models.
Perplexity's API, Sonar, allows developers to customise which sources (websites) are used when answering user queries, and they claim that their Pro tier outperforms leading models from Google, OpenAI, and Anthropic on SimpleQA, a benchmark that measures factual correctness in AI chatbot answers.
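As a rough illustration of what that source customisation looks like, here's a sketch of a request payload for Sonar's OpenAI-style chat completions endpoint. The model name and the `search_domain_filter` field are my reading of Perplexity's public docs - treat them as assumptions and verify against the current API reference before use.

```python
import json

def build_sonar_request(query: str, allowed_domains: list) -> dict:
    # Builds a chat-completions payload for Perplexity's Sonar API.
    # "search_domain_filter" restricts which websites answers may draw on;
    # field names are assumptions based on public docs, not guaranteed.
    return {
        "model": "sonar-pro",
        "messages": [{"role": "user", "content": query}],
        "search_domain_filter": allowed_domains,
    }

payload = build_sonar_request(
    "What did the UK government announce about AI this week?",
    ["gov.uk", "reuters.com"],
)
print(json.dumps(payload, indent=2))
# POST this to the Perplexity API endpoint with your Bearer token.
```

Restricting answers to a whitelist of trusted domains is a simple but effective way to cut down on hallucinated sources.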
Factual correctness and reliability are two sides of the same coin, and I think this is currently the biggest barrier to frontier AI models having a bigger impact in the real world, so it's great to see more progress being made on this front.
AI Ethics News
AI benchmarking organization criticized for waiting to disclose funding from OpenAI
Paul McCartney calls on UK government to protect artists from AI
Samsungās Galaxy S25 will support Content Credentials to identify AI-generated images
UK to unveil "Humphrey" assistant for civil servants with other AI plans to cut bureaucracy
Long Reads
World Economic Forum at Davos - Debating Technology
Wired - How Chinese AI Startup DeepSeek Made a Model that Rivals OpenAI
Daring Fireball - Siri is super dumb and getting dumber
Demis Hassabis - Path to AGI
"The future is already here, it's just not evenly distributed."
William Gibson
Do you think the US will ban the new Chinese AI?