The big news this week includes a new ‘pocketable’ model from Google, Gemma 3, the launch of Agent Tools for developers from OpenAI, and a lot of hype building around a new AI agent called Manus.
The Gemma 3 launch from Google is very impressive: essentially a small, open model that’s on a par with the frontier models from 6 months ago in terms of capabilities and features. The new Agent Tools from OpenAI are technical, but they point towards where AI agents are going, and I suspect it won’t be long before we see some incredibly capable AI systems that can perform meaningful tasks in the real (digital) world. Manus is an early proof-of-concept for that - still rough around the edges, but already very capable and getting a lot of hype (both good and bad).
There’s lots of ethics news this week too, from OpenAI and Google asking to be allowed to train on data they don’t own, to hallucinations from AI search engines, through to Sam Altman sharing a short story written by a new creative writing model.
Some good long reads from John Gruber and Ethan Mollick too.
Oh, and Terminators building Terminators… 🤖🛠️🤖… you’ll see what I mean…
Enjoy!
Google calls Gemma 3 the most powerful AI model you can run on one GPU
Google have done a great job with Gemma 3, and it’s now probably the most capable model that you can run locally on your own computer. As before, it’s a family of models coming in a range of sizes - 1B, 4B, 12B, and 27B parameters.
Gemma 3 is built using the same architecture as Gemini 2.0 and has lots of great capabilities: it’s multilingual (140+ languages supported), multimodal (text, images, and video), has a long context window (128,000 tokens), and supports function calling and structured outputs, making it great for building more agent-like applications.
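If you want a feel for how easy ‘run it locally’ actually is, here’s a minimal sketch using Hugging Face transformers and the smallest instruction-tuned variant (this assumes a recent transformers release with Gemma 3 support, the accelerate package, and that you’ve accepted the Gemma licence on the Hugging Face Hub):

```python
# A minimal sketch: chatting with the 1B instruction-tuned Gemma 3 locally.
# Assumes: a recent transformers release with Gemma 3 support, accelerate,
# and that you've accepted the Gemma licence on the Hugging Face Hub.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="google/gemma-3-1b-it",  # the 4B/12B/27B variants follow the same naming
    device_map="auto",             # uses a GPU if available, otherwise CPU
)

messages = [
    {"role": "user", "content": "In two sentences, why do small local models matter?"}
]
result = pipe(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])  # the model's reply
```

The 1B model is text-only but small enough to run comfortably on a laptop; the larger variants add the image and video understanding.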
Looking at the Chatbot Arena leaderboard, Gemma 3 27B gets an Elo score of 1338, putting it on a par with DeepSeek’s chat model. Gemma 3 12B isn’t ranked yet, but it will be interesting to see how well it does. I expect it to come in around an Elo score of 1300, which would mean we now have a ‘pocketable’ open model with similar capabilities to the frontier models we had just 6 months ago.
It’s not just progress at the frontier that’s moving fast, but progress with smaller, cheaper, more environmentally friendly models as well.
OpenAI launches new tools to help businesses build AI agents
In some ways this is a very technical release from OpenAI, squarely aimed at developers, but it’s also a very good indicator of where things are going and what generative AI models will be able to do for people in the not-too-distant future.
OpenAI announced three new tools for developers who want to build ‘agents’, which they define as “a system that can act independently to do tasks on your behalf.” The tools they announced are:
Web Search - allows a generative AI model to access up-to-date information from the web, and is the same tool that powers ChatGPT Search. This makes it useful for finding information online and can help reduce hallucinations.
File Search - allows a generative AI model to search through your own documents. This makes it useful for accessing specific, private knowledge that isn’t on the public web.
Computer Use - allows a generative AI model to control a computer. This makes it useful for automating tasks in applications that don’t have accessible APIs.
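To make that concrete, here’s roughly what using one of these tools looks like for a developer, via OpenAI’s new Responses API - a minimal sketch based on the examples OpenAI published at launch (the web search tool is in preview, so names and defaults may change):

```python
# A minimal sketch of the Web Search tool via OpenAI's new Responses API,
# based on the launch examples (the tool is in preview, hence the name).
# Assumes the openai Python package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],  # opt the model into web search
    input="What was a positive news story from today?",
)

print(response.output_text)  # an answer grounded in (and citing) web results
```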
OpenAI also announced an Agents SDK, which allows multiple generative AI agents to work together, so that agents specialised in individual tasks can hand off to one another to complete a larger workflow.
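And here’s a minimal sketch of the Agents SDK itself (`pip install openai-agents`), again based on the launch examples - a triage agent that hands off to a specialist; the exact API may well evolve:

```python
# A minimal sketch of the Agents SDK: a triage agent hands the conversation
# off to a specialist agent. Based on the launch examples; APIs may evolve.
from agents import Agent, Runner

billing_agent = Agent(
    name="Billing agent",
    instructions="You handle billing questions.",
)

triage_agent = Agent(
    name="Triage agent",
    instructions="Route the user to the appropriate specialist agent.",
    handoffs=[billing_agent],  # handoffs are exposed to the model like tools
)

result = Runner.run_sync(triage_agent, "I think I was charged twice this month.")
print(result.final_output)
```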
So we now have generative AI agents that can work together, operate computers, access the internet, and read files. Put those together and you cover off quite a lot of tasks that fall under the general umbrella of ‘knowledge work’!
Now, obviously this isn’t all currently working together in one system, and it hasn’t been proven how useful it all is in the real (digital) world. But the tools now exist, a huge number of developers will be working with them, and it’s only a matter of time before we see genuinely useful AI agents that put all of these tools to use.
Speaking of which… see below!
Manus Hype
A new agentic AI platform called Manus has been getting a lot of hype over the last week or two. It’s interesting because it’s attracted both a lot of praise and a lot of criticism. The Head of Product at Hugging Face called it the most impressive AI tool he’s ever tried, whilst MIT Technology Review put it to the test and found that it suffers from frequent crashes and system instability, and struggles with large chunks of text. However, the platform is in early access, so these sorts of teething issues are to be expected, especially as it has gained a lot of attention, and a lot of users, in a very short period of time.
The other interesting thing is that Manus doesn’t really use any specialised agentic models to power its capabilities; it builds on models such as Anthropic’s Claude and Alibaba’s Qwen. This shows what’s possible with the current generation of models without much fine-tuning or specialisation.
I haven’t been able to get access and try the platform myself yet, but one thing I really like is the Use Case Gallery on their website. This lists lots of different use cases that you can see replayed - essentially a screen recording of the platform in action - so you can get a good idea of how it works and what it’s capable of.
I’m sure this won’t be the last new agentic AI platform to be launched by a start-up this year, but as the large frontier models become more agent-like, I’m not sure how valuable these other platforms will be.
Google show off Gemini Robotics
This is an impressive demo video, showcasing both Google’s Gemini Robotics model, based on Gemini 2.0, and the dexterity of the arms the model is controlling.
Google describe the model as a vision-language-action model: it can see, understand natural language instructions, and take action in the real world. Google say that AI models for robotics need to be general (i.e. able to adapt to different situations), interactive, and dextrous, and their demo video certainly shows all three. Great work!
Figure introduces BotQ
So, this is what the Terminator looks like with a high-volume production line run by Skynet?!
Figure just introduced BotQ - its high-volume manufacturing facility for humanoid robots, which will be capable of manufacturing 12,000 humanoids per year, scaling to 100,000 per year in four years’ time. According to the company, Figure’s humanoid robots will be used in the manufacturing process to build other humanoid robots this year. The facility isn’t fully automated yet, but Figure anticipates using more humanoid robots over time to increase automation.
They also announced that they have completed the design of their next-generation robot, Figure 03, which will be their production robot that is built for affordability and high-volume manufacturing. I’m looking forward to seeing it when it’s revealed!
AI Ethics News
OpenAI and Google ask the government to let them train AI on content they don’t own
AI Search Engines Invent Sources for ~60% of Queries, Study Finds
OpenAI’s metafictional short story about grief is beautiful and moving
Moonvalley introduces a ‘clean’ generative video AI model for cinema and advertising
Anthropic CEO says spies are after $100M AI secrets in a ‘few lines of code’
Sesame, the startup behind the viral virtual assistant Maya, releases its base AI model
Long Reads
Daring Fireball - Something is Rotten in the State of Cupertino
One Useful Thing - Speaking things into existence
“The future is already here, it’s just not evenly distributed.”
William Gibson