A week in Generative AI: Canvas, Movie Gen & AI Overviews
News for the week ending 6th October 2024
We’ve had another big week of news from OpenAI with the launch of Canvas, their Dev Day and the confirmation of their latest funding round, valuing the company at $157bn. Meta announced their state-of-the-art text-to-video model Movie Gen, which looks very impressive, Microsoft announced more features for Copilot on Windows PCs, and Google announced that it will start testing adverts in their search AI Overviews.
In ethics news, it was announced that the upcoming French AI Summit will focus on the environmental impact of generative AI, there are reports that OpenAI rushed the release of their latest o1 model, and the US Department of Justice has warned that if your AI commits a crime, you’ll do the time.
Another big week for OpenAI
After all of last week’s news around OpenAI, it’s been another eventful week. The biggest news was the launch of Canvas, OpenAI’s take on Claude’s Artifacts. Canvas is similar in many ways to Artifacts and is a big improvement to the user interface of ChatGPT, allowing the model to present long-form content (either text or code) without interrupting the flow of the conversation.
Where Canvas improves on Artifacts is that the model can edit the existing Canvas in place, rather than re-writing it from scratch every time you ask for a change, which is what Claude does with Artifacts. There are also some other great features: users can highlight specific sections for ChatGPT to focus on when asking for edits, they can edit the content themselves directly, and there’s a menu of shortcuts to quickly ask ChatGPT to adjust the length of the content, debug code and perform other common actions. Users can also restore previous versions of their content using the back button.
These are big improvements over how Anthropic’s Artifacts work, and they make the ChatGPT user experience much better. One niggle from my own testing: once ChatGPT has created a Canvas it won’t create a new one, so instead of having multiple canvases for different pieces of content in a single chat you have to start a new chat every time, which is a bit annoying. Canvas is currently in beta for Plus and Teams users and will be available to all users for free once it exits beta.
OpenAI also had their (now) annual Dev Day this week. They didn’t announce any big new models, but they did announce lots of great things for developers, including the Realtime API, which allows developers to take advantage of GPT-4o’s Advanced Voice mode and build much better interactive voice experiences into their apps. There was also confirmation of the funding round that has been rumoured for a while: OpenAI have just raised $6.6bn at a valuation of $157bn, which is a huge amount of money!
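For a sense of what building on the Realtime API looks like, here’s a minimal sketch of streaming a text response over its WebSocket interface. The endpoint, beta header and event names match OpenAI’s documentation at launch, but the model name and event schema are beta details that may change, so treat this as illustrative rather than definitive:

```python
# Minimal sketch of a Realtime API session (text-only for brevity;
# audio uses the same event stream with audio deltas).
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def main():
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",  # beta header required at launch
    }
    # Note: newer versions of the websockets library rename
    # extra_headers to additional_headers.
    async with websockets.connect(URL, extra_headers=headers) as ws:
        # Ask the model for a response; modalities could include "audio".
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text"],
                "instructions": "Greet the user in one sentence.",
            },
        }))
        # Print streamed text deltas until the response completes.
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```

The interesting design choice is that everything (audio in, audio out, interruptions, function calls) flows over a single persistent WebSocket as JSON events, which is what makes low-latency, interruptible voice experiences possible compared with the old chain of separate speech-to-text, chat completion and text-to-speech calls.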
Meta launches Movie Gen
We’ve seen lots of different text-to-video models showcased and released since OpenAI announced their video model Sora back in February. There’s Runway’s Gen-3, Dream Machine from Luma Labs and Adobe’s Firefly Video model, amongst many others. Meta are now getting in on the text-to-video game with their Movie Gen model, which was announced this week.
At this stage, this is just a research release, meaning the models are not currently available to users or developers. However, from the examples they’ve shared, this looks (at least to my eyes) like the most polished text-to-video model I’ve seen. It also comes with advanced features we haven’t really seen before, such as the ability to edit a video with text instructions, produce personalised videos featuring your own face, and add sound effects and soundtracks to the videos.
Movie Gen can generate HD quality videos up to 16 seconds long, which is about as good as you can get with these models right now. Meta says that they are working towards a potential future release, and are collaborating with filmmakers and creators on the model’s development. You can read a lot more about Movie Gen here.
Microsoft Copilot can now read your screen, think deeply, and speak aloud to you
Microsoft is launching new Copilot capabilities for all users on Windows, including a tool that can understand and respond to questions about what’s on your screen.
Copilot Vision has an understanding of what’s in your web browser at any given time and can analyse text and images and answer questions about the content. Data is deleted immediately after a conversation has finished to ensure user privacy, and none of the content is kept or used to train future models. Microsoft is blocking the feature from working on paywalled and “sensitive” content, limiting Copilot Vision to a pre-approved list of websites.
Copilot’s new Think Deeper feature is an attempt to make Microsoft’s assistant more versatile by giving Copilot the ability to reason through more complex problems, which presumably (though it’s unconfirmed) means it’s powered by a version of OpenAI’s o1. There’s also a new Copilot Voice feature that lets users talk to Copilot and have its responses spoken aloud. Again, this is likely powered by OpenAI’s GPT-4o Advanced Voice capabilities.
I’m also pleased to see that Microsoft is looking to implement some personalisation features similar to the ones I’ve written about recently. When personalisation is turned on, Copilot will use your past interactions with it, as well as your interactions with other Microsoft apps and services, to recommend ways to use Copilot. It’s a very small step towards personalisation, but great to see Microsoft moving in that direction.
Google brings ads to AI Overviews as it expands AI's role in search
It’s not surprising to see Google start experimenting with how they include adverts in their AI Overviews. As their search business comes under increasing pressure from GenAI chatbots and new GenAI-powered search experiences like Perplexity and OpenAI’s SearchGPT, Google needs to find new ways to shore up their main revenue stream.
Ads in AI Overviews will start rolling out in the US this week alongside new AI-organised search pages. The ads will sit in a ‘sponsored’ section and will appear alongside other, non-sponsored content in the AI summaries. Ads will be drawn from advertisers’ existing Google Shopping and Search campaigns, so no adjustment is needed for advertisers to take advantage of the new formats.
According to research from over the summer, AI summaries only appear for around 7% of search queries, and it’s unclear whether advertisers can opt out of their ads appearing in the summaries. Given how controversial some AI Overviews have been, it would be wise for advertisers to approach this new feature cautiously.
AI Ethics News
French AI summit to focus on environmental impact of energy-hungry tech
Microsoft starts paying publishers for content surfaced by Copilot
Long Reads
The New York Times: Behind OpenAI’s Audacious Plan to Make A.I. Flow Like Electricity
The Wall Street Journal: OpenAI’s Complex Path to Becoming a For-Profit Company
“The future is already here, it’s just not evenly distributed.”
William Gibson