Tools & releases

Google Introduces Gemini 3.5 Live Translate for Real-Time Multimodal Voice Applications

June 10, 2026 · 5 min read

Curated by Oleksandr Kuzmenko, AI Product Engineer · Updated June 10, 2026

AI-assisted · editor-reviewed

Google has launched Gemini 3.5 Live Translate, a model focused on low-latency, end-to-end voice translation. Developers can call it directly through the Gemini API to build speech-to-speech agents.

Impact: High

Why it matters

You can now skip separate speech-to-text, translation, and text-to-speech stages by using Gemini's native multimodal voice capabilities.

TL;DR

01 End-to-end audio modeling cuts latency, keeping translated speech only a few seconds behind the speaker.
02 The system retains emotional prosody and natural pauses during real-time speech translation.
03 Developers can integrate the model through the Gemini Live API for real-time streaming.

Key facts

Supported Languages: 70+
Google Meet Combinations: 2,000+
Grab Tested Volume: 10M+ monthly calls
Watermarking Standard: SynthID

Continuous Multimodal Voice Translation

Gemini 3.5 Live Translate provides near real-time speech-to-speech translation across more than 70 languages. Unlike traditional turn-by-turn voice systems, it streams audio continuously, preserving the original speaker's intonation, pitch, and pacing while staying only a few seconds behind the speaker.
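
To make the contrast with turn-by-turn systems concrete, here is a minimal Python sketch of continuous streaming: audio is cut into short fixed-size frames and forwarded as it arrives, rather than buffered until the speaker stops. The input format (16 kHz, 16-bit mono PCM), the 20 ms frame length, and the `echo_translator` stand-in are assumptions made for illustration; none of this reflects the actual Gemini Live API surface.

```python
import asyncio
from typing import AsyncIterator, Iterator

SAMPLE_RATE_HZ = 16_000   # assumed input format: 16 kHz, 16-bit mono PCM
BYTES_PER_SAMPLE = 2
FRAME_MS = 20             # assumed frame length


def frame_pcm(pcm: bytes, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Split raw PCM into fixed-duration frames; the last frame may be shorter."""
    frame_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


async def microphone(pcm: bytes) -> AsyncIterator[bytes]:
    """Simulate a live source that emits frames one at a time."""
    for frame in frame_pcm(pcm):
        yield frame


async def echo_translator(frames: AsyncIterator[bytes]) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a streaming translation session.

    It yields one output chunk per input frame so the pipeline shape is
    visible; a real session would return translated audio instead.
    """
    async for frame in frames:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield frame


async def stream(pcm: bytes) -> list[bytes]:
    """Forward frames as they arrive and collect the output chunks."""
    return [chunk async for chunk in echo_translator(microphone(pcm))]


if __name__ == "__main__":
    one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    chunks = asyncio.run(stream(one_second))
    print(f"{len(chunks)} chunks forwarded")
```

The point of the shape is that output chunks become available while input is still being produced, so the listener never waits for the end of an utterance. In a real integration the stand-in would be replaced by whatever session object the SDK provides.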

Broad Integration and Partners

The model is available in public preview via the Gemini Live API and Google AI Studio, as well as in private preview for enterprise customers in Google Meet. Key real-time streaming partners include Agora, Fishjam, LiveKit, Pipecat, and Vision Agents. Ride-hailing giant Grab is currently testing the technology to facilitate communications for over 10 million monthly voice calls between drivers and passengers.

Security and New Mobile Features

All audio generated by the model carries Google’s SynthID watermark, which is intended to make the output identifiable as AI-generated and to help limit misinformation. On Android, a new "listening mode" lets users hold the phone to their ear, as on a standard call, to hear the incoming translation privately.

✓ When to use

When building conversational translation apps that need low perceived response lag.
When you need continuous background audio translation for multi-party meetings.
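
Whether the lag is acceptable for a given app is best checked by measurement. The sketch below, in Python, summarizes lag from timestamp pairs; the `(spoken_at, heard_at)` pairing is an assumption, and how you align a source phrase with its translated output is left to your own instrumentation.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple


def lag_stats(events: Sequence[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize lag from (spoken_at, heard_at) timestamp pairs, in seconds."""
    if not events:
        raise ValueError("need at least one timestamp pair")
    lags = sorted(heard - spoken for spoken, heard in events)
    # Nearest-rank 95th percentile: the smallest lag with at least
    # 95% of samples at or below it.
    p95_index = math.ceil(0.95 * len(lags)) - 1
    return {
        "median_s": statistics.median(lags),
        "p95_s": lags[p95_index],
        "max_s": lags[-1],
    }
```

Tracking the 95th percentile alongside the median matters for conversation quality, because occasional long stalls are more noticeable to speakers than a slightly higher typical delay.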

✕ When NOT to use

When the application must run offline or entirely on-device, without internet access.
When your application's requirements prohibit watermarked audio output.

What to do today

Explore the Gemini API documentation for the new live audio streaming endpoints.
Test the model's performance on industry-specific domain terminology to verify translation accuracy.
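
One lightweight way to run the terminology check is a glossary test over transcripts of the translated audio. The Python sketch below assumes you already have those transcripts as text and a list of target-language terms that must appear; the case structure and field names (`id`, `translation`, `required_terms`) are hypothetical.

```python
from typing import Dict, List


def missing_terms(translation: str, required: List[str]) -> List[str]:
    """Return required target-language terms absent from the translation.

    Matching is a case-insensitive substring check, so inflected forms
    of a term will not be recognized.
    """
    haystack = translation.casefold()
    return [term for term in required if term.casefold() not in haystack]


def glossary_report(cases: List[dict]) -> Dict[str, List[str]]:
    """Map each failing case id to the terms it missed."""
    report: Dict[str, List[str]] = {}
    for case in cases:
        missed = missing_terms(case["translation"], case["required_terms"])
        if missed:
            report[case["id"]] = missed
    return report
```

An empty report means every required term appeared. Because the check is a plain substring match, it works as a coarse screen for dropped or mistranslated terms and does not replace review by a fluent speaker.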

