Google’s Live Caption is becoming richer with new AI-driven “Expressive Captions” that convey more than the words being spoken, including sounds and actions. Google is also bringing Gemini 1.5 to Image Q&A in the Lookout app.
Live Caption has been a staple of Google’s Pixel lineup since 2019. The feature adds captions where there are normally none, using the phone’s Tensor SoC and onboard processing. When speech is detected in a video or other media playing audio, the Pixel phone transcribes it and displays it in real time. It’s useful for a variety of users, especially those who are deaf or hard of hearing.
Live Caption is getting an overhauled mode that processes audio more dynamically. Google announced that Expressive Captions will use on-device AI to show the nuances of speech and actions in media. That includes decoding tone, volume, and environmental cues, so captions reflect how something is said and not just what is said.
Google gives a couple of examples of how this will work. When someone yells, that intensity is translated into captions in all caps, so the caption reflects the volume. Expressive Captions can also decode vocal bursts, such as sighs and groans, detailing the little sounds in between words. Even ambient sounds are represented to fill in the blanks around speech.
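To make those examples concrete, here is a minimal Python sketch of how such rendering rules might look. It is not Google's implementation: the `AudioEvent` type, the `render_caption` function, the 75 dB loudness threshold, and the bracketed style for non-speech sounds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold: the article gives no figure for what counts as yelling.
LOUD_DB = 75.0


@dataclass
class AudioEvent:
    kind: str                 # "speech", "vocal_burst", or "ambient"
    text: str                 # transcript for speech, a label such as "sighs" otherwise
    volume_db: float = 60.0   # assumed loudness estimate for the event


def render_caption(events):
    """Join a list of AudioEvent items into one caption line."""
    parts = []
    for event in events:
        if event.kind == "speech":
            # Loud speech is shown in all caps to convey intensity.
            loud = event.volume_db >= LOUD_DB
            parts.append(event.text.upper() if loud else event.text)
        else:
            # Vocal bursts and ambient sounds become bracketed labels
            # (an assumed presentation, not confirmed by the article).
            parts.append(f"[{event.text}]")
    return " ".join(parts)


events = [
    AudioEvent("vocal_burst", "sighs"),
    AudioEvent("speech", "I told you already.", 58.0),
    AudioEvent("speech", "Watch out!", 82.0),
    AudioEvent("ambient", "applause"),
]
print(render_caption(events))
# prints: [sighs] I told you already. WATCH OUT! [applause]
```

In the real feature these decisions are made by an on-device AI model rather than a fixed threshold; the sketch only shows the mapping from audio cues to caption styling that the examples describe.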
In addition, Google announced that image descriptions can now be read aloud. With that, the company is bringing Gemini 1.5 Pro to the Lookout app, an app that aids the vision-impaired. The Q&A feature, which allows users to ask questions about an image, will now be a little more capable. With the Gemini model, an image can be described in a more natural voice, and the description can include more surrounding information beyond a simple summary.
Google notes that Expressive Captions are a part of Live Caption, so there is no restriction on which Pixel devices can use them. If Live Caption is available, this upgrade will be reflected. Google does note that the feature will not be compatible with phone calls, though that might change over time.
2024-12-05 17:15:00