A week in Generative AI: Llama, Mistral & SearchGPT
News for the week ending 28th July 2024
After a quiet couple of weeks, we’ve had a really big week for the GenAI Timeline!
Meta launched their Llama 3.1 family of models, including the new largest 405B model, which is the first time we’ve had a ‘frontier-level’ model available for anyone to download. Hot on their heels, Mistral launched their Large 2 model, the first ‘GPT-4 class’ model trained in Europe. The other big news of the week was OpenAI announcing that they are starting to test SearchGPT, a prototype of new search features that they will be integrating into ChatGPT. SearchGPT will change how publishers and brands control how their content appears in ChatGPT, with significant implications for marketing.
There are lots of updates on the ethics front as well, with articles on energy usage, strikes in the gaming industry, deepfake detection, and AI agents.
There were also some very interesting articles written by Yoshua Bengio, Mark Zuckerberg and Sam Altman this week, which are in the Long Reads section and are well worth reading.
Meta Launches Llama 3.1 models
Following last week’s launch of GPT-4o Mini from OpenAI, Meta this week launched their Llama-3.1 family of models. The big headline was the debut of their largest model, Llama-3.1 405B, which is an incredibly impressive achievement by the team at Meta AI. It’s the first time we’ve had a frontier model (one that is at the cutting edge of capabilities) that is open and available for anyone to download, use, and develop with.
Llama-3.1 405B is a ‘GPT-4 class’ model and until we see the release of the next generation of models (probably later this year/early next) it’s as good as the best of the rest.
I ran all three of the Llama-3.1 models through my marketing benchmarks (2,800+ multiple-choice marketing questions) and the results were surprising. The 405B model comfortably beats GPT-4o (79.8% vs. 78.1%), but so does the medium-sized 70B model (79.3% vs. 78.1%). Ironically, Llama-3.1 405B doesn’t have the best knowledge of social media: it comes in at 75.7%, behind the leading model, Claude-3.5 Sonnet, on 79.6%!
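Those percentages are, in essence, accuracy over the multiple-choice questions. Here’s a minimal sketch of how such a score could be computed; the question format and field names are illustrative assumptions, not the actual benchmark data:

```python
# Minimal sketch of scoring a multiple-choice benchmark.
# The question format and grading logic are illustrative assumptions,
# not the actual marketing benchmark described above.

questions = [
    {
        "prompt": "Which metric measures the share of recipients who click a link in an email?",
        "options": ["A) Open rate", "B) Click-through rate", "C) Bounce rate", "D) Churn rate"],
        "answer": "B",
    },
    # ... the real benchmark has 2,800+ questions
]

def score(model_answers: list[str], questions: list[dict]) -> float:
    """Return accuracy as a percentage, e.g. 79.8."""
    correct = sum(
        answer.strip().upper().startswith(q["answer"])
        for answer, q in zip(model_answers, questions)
    )
    return 100 * correct / len(questions)

print(f"{score(['B'], questions):.1f}%")  # 100.0% on this one-question toy set
```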
To give you a flavour of what Llama-3.1 405B is capable of, there’s an incredibly impressive demo video (below) of the model paired with Groq’s fast inference engine, giving near-instantaneous responses. It’s really something to behold and shows the power and speed of generative AI models that are likely to be commonplace in the near future:
This release from Meta is a big deal, not just because the performance is so impressive, but because the Llama family of models are open. For the first time we have an open model that rivals the performance of the top, frontier closed models, which means it is available to everyone to work with and build on.
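If you want to try the open weights yourself, the smaller Llama-3.1 models will run on a single GPU via Hugging Face transformers. Here’s a minimal sketch, assuming you’ve accepted Meta’s licence on the Hugging Face Hub and that the repo id below is correct (it may differ slightly, and the 405B model needs far more serious hardware):

```python
# Minimal sketch: run a smaller open Llama-3.1 model locally with
# Hugging Face transformers. Assumes you have accepted Meta's licence
# on the Hugging Face Hub and have a GPU with enough memory.
import torch
from transformers import pipeline

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed repo id

chat = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise marketing assistant."},
    {"role": "user", "content": "Give me three taglines for a new oat-milk brand."},
]

output = chat(messages, max_new_tokens=200)
# The pipeline returns the full conversation; the last message is the model's reply.
print(output[0]["generated_text"][-1]["content"])
```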
There’s also some good commentary on the release from The Guardian here.
Mistral Launches Large 2 model
The day after Meta’s Llama-3.1 launch we had the launch of Mistral Large 2, the French startup’s largest, most capable open model. Just like Llama-3.1, Mistral Large 2 is open and available for anyone to download, use, and develop with.
Mistral Large 2 is another ‘GPT-4 class’ model, and again I ran it through my marketing benchmarks with impressive results. Mistral Large 2 scored 78.0% vs. GPT-4o’s 78.1%, so it is on a par with OpenAI’s most capable model when it comes to marketing knowledge. It's great to see a European AI company competing with the large US frontier models!
We've now got two large open models that are 'GPT-4 class', one from the US and one from the EU. That means the open AI community is only lagging behind the frontier by about a year at this point.
OpenAI Introduces SearchGPT
Another launch this week was OpenAI’s SearchGPT prototype - their long-rumoured ‘search engine’. SearchGPT is currently in limited testing (I haven’t been able to get access yet) and you can see how it works in the video above.
The idea with SearchGPT is threefold:
Once it has been integrated into ChatGPT, it will give the platform access to up-to-date knowledge and help solve the knowledge-lag problem that we’ve had with many GenAI models (GPT-4o is currently trained on knowledge up to October 2023).
It will help to solve the hallucination problem with more grounded answers to questions and direct sources attributed and linked so users can more easily fact-check.
It presents ChatGPT as a more realistic alternative to traditional search platforms (and Google’s AI search summaries), giving users more reasons to use the platform and a better search experience that presents answers instead of a list of links.
SearchGPT is the result of OpenAI’s partnerships with different publishers, and the web crawlers that they’ve been building out over the last 12 months. It also includes tools for publishers to manage how they appear in answers. So for example, publishers can allow their content to be included in search results but not in OpenAI’s training data.
This is really important as it gives a clear path for how publishers, and brands, can control how their content is surfaced in OpenAI’s models. To appear in SearchGPT’s results you need to allow their OAI-SearchBot crawler in your site’s robots.txt file, and it can take around 24 hours for OpenAI’s systems to update.
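If you want a quick way to check where a site currently stands, Python’s standard library can test a robots.txt file against the OAI-SearchBot user agent. A minimal sketch, assuming the example domain and robots.txt directives below are placeholders you’d swap for your own (GPTBot, OpenAI’s training-data crawler, is governed by separate rules - check OpenAI’s publisher documentation for the authoritative syntax):

```python
# Minimal sketch: check whether a site's robots.txt currently allows
# OpenAI's OAI-SearchBot crawler. The example domain is a placeholder.
#
# Example robots.txt lines a publisher might add to opt in to SearchGPT,
# while handling training-data opt-out separately via GPTBot rules:
#
#   User-agent: OAI-SearchBot
#   Allow: /
#
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://www.example.com/robots.txt")  # swap in your own domain
parser.read()

if parser.can_fetch("OAI-SearchBot", "https://www.example.com/"):
    print("OAI-SearchBot is allowed - content can appear in SearchGPT results.")
else:
    print("OAI-SearchBot is blocked - content won't be surfaced in SearchGPT.")
```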
If you’re interested in testing out SearchGPT you can sign up for the waitlist here.
Google makes its Gemini chatbot faster and more widely available
Not to be left out of this week’s GenAI news, Google also announced updates to the free tier of Gemini. Free users of Gemini now have access to Gemini-1.5 Flash, which is available in 40 languages and around 230 countries. Gemini-1.5 Flash delivers higher-quality answers and is faster to use. It’s also cheaper for Google to run, which is a bonus for them!
Cadbury launches MyCadburyEra to celebrate 200 years
This is a really nice campaign from Cadbury to celebrate 200 years. The microsite lets you choose which era of Cadbury advertising you’d like to insert yourself into; you then upload a picture of yourself and let GenAI do the rest!
DeepMind hits milestone in solving maths problems — AI’s next grand challenge
A new model from Google DeepMind, AlphaProof, is on the verge of besting the world’s top students at solving maths problems. AlphaProof solved four of the six problems given to school students at the 2024 International Mathematical Olympiad (IMO) in the UK and gave rigorous step-by-step proofs that earned a score of 28/42 (67%).
This is the first time any AI system has achieved medal-level performance at the IMO, and it is an important milestone in developing systems that can plan, reason, and help researchers prove maths theorems. I expect we will see some of the techniques learned with AlphaProof start to show up in Gemini models next year, making them behave more like ‘agents’ that are better able to plan, reason, problem-solve, and perform tasks for users.
AI Ethics News
Ireland’s datacentres overtake electricity use of all homes combined, figures show
Google AI slashes computer power needed for weather forecasts
Video game performers will go on strike over artificial intelligence concerns
'Model collapse': Scientists warn against letting AI eat its own tail
Autonomous AI workers that talk to each other will arrive in 2025, Capgemini predicts
OpenAI’s latest model will block the ‘ignore all previous instructions’ loophole
Long Reads
Mark Zuckerberg - Open Source AI is the Path Forward
Sam Altman - Who will control the future of AI?
Yoshua Bengio - Reasoning through arguments against taking AI safety seriously
Wired - AI is Already Taking Jobs in the Video Game Industry
One Useful Thing - Confronting Impossible Futures
“The future is already here, it’s just not evenly distributed.”
William Gibson