A week in Generative AI: Gemini Live, Runway Turbo & Flux
News for the week ending 18th August 2024
The big news this week was the launch of Gemini Live at Google's #MadeByGoogle '24 event. Gemini Live is Google's response to GPT-4o's Advanced Voice Mode and is another example of the capabilities the next generation of voice assistants will have. We also saw the launch of Runway's Gen-3 Turbo model, which brings us much closer to real-time video generation, and Flux, a text-to-image model that is getting a lot of praise for its high-quality images and the ability for users to run a version of it locally on a well-equipped computer.
In ethics news, there's an article about how X's new image-generating model has no guardrails or safeguards, allowing it to create images of almost anything. OpenAI also announced that it has shut down accounts linked to an Iranian influence operation that was generating content about the US presidential election.
Gemini Live, Google's answer to ChatGPT's Advanced Voice Mode, launches
The big news of the week was the set of announcements Google made at their #MadeByGoogle '24 event, where they launched the new Pixel 9. As The Verge wrote, AI overshadowed the Pixel at the Pixel event, and loads of new AI features were shared:
Pixel Screenshots lets users capture their screens so the captured information is searchable later
The on-device Gemini assistant will be much faster thanks to Gemini 1.5 Flash
You can ask Gemini about what's on your screen at any given time
There are also lots of smaller features like call transcription, AI-generated weather summaries and a range of AI photo features
However, the bigger announcement was Google's Gemini Live, their answer to OpenAI's GPT-4o Advanced Voice Mode. There is some great coverage of Gemini Live from Joanna Stern at the Wall Street Journal and Maxwell Zeff at TechCrunch. They both describe Gemini Live as allowing them to talk WITH their phone, not TO it, which I think is a great summary of how we should think about the next generation of generative-AI-powered voice assistants.
The next generation of generative-AI-powered voice assistants will allow us to talk WITH our phones, not just TO them.
The other interesting thing about Gemini Live is that it runs entirely in the cloud, which is very different from Apple's approach with Apple Intelligence, which runs first and foremost on device and only uses the cloud for more complex use cases. This means that (currently) Gemini Live can't really perform tasks on the Pixel phone like setting alarms or timers. Google say they're working on ways Gemini Live can control phone functions, but there's no news on when that might be coming.
Runwayās Gen-3 Alpha Turbo is here and can make AI videos faster than you can type
Runway debuted its third-generation video generation model last month, but this week showed off Gen-3's Turbo model, which it claims is seven times faster at half the cost. It's the speed of the model that is the really big thing here, with Runway's CEO claiming "it now takes me longer to type a sentence than to generate a video."
Lag has been a big issue when generating video, and Runway seems to have solved that. We're moving very close to the point where we have real-time video generation, similar to the real-time image generation we started seeing last year with Stability.ai's SDXL Turbo and Leonardo.ai's Realtime Canvas.
Forget Midjourney: Flux is the new king of AI image generation
Flux, a text-to-image model created by a new startup called Black Forest Labs, has been getting a lot of attention since its launch a few weeks ago. Black Forest Labs was founded just a few months ago by engineers from Stability AI, and their first Flux model is available in three versions.
The Pro version (the largest, most capable model) is available via API
The Dev version (a medium-sized model) is an open-weights model that can be used for non-commercial applications
The Schnell version (the smallest, fastest model) is small enough to be downloaded to a well-equipped local computer for personal use
Lots of commentators have been testing the Pro version of Flux and have been very impressed with the quality, in some cases finding it exceeds that of Midjourney v6.1, which landed in July. The fact that versions of the model are open-weights and can run locally is also a big selling point. Black Forest Labs say they're now working on a text-to-video model that will be open-source, branding it "State-of-the-Art Text to Video for all."
Purdue's UniT gives robots a more human-like sense of touch
This is a little bit left-field and a little bit technical, but I've always been fascinated by how we will give robots a sense of touch. I think this will be a really important feature for robots to gain mass adoption in the real world and to regularly act in and amongst the human population. It will also give GenAI models a HUGE new dataset to be trained on, which is one of the reasons I don't subscribe to the idea that we're running out of data to train bigger and more sophisticated GenAI models.
AI Ethics News
ChatGPT unexpectedly began speaking in a user's cloned voice during testing
OpenAI shuts down election influence operation using ChatGPT
Research AI model unexpectedly modified its own code to extend runtime
Long Reads
One Useful Thing - Change blindness
Stratechery - Integration and Android
MIT News - LLMs develop their own understanding of reality as their language abilities improve
NYMag - The Future Will Be Brief
"The future is already here, it's just not evenly distributed."
William Gibson