Google has released Gemini 3.5 Live Translate, a streaming speech-to-speech audio model that covers more than 70 languages in Meet, Translate, and the Live API.

by ai-intensify

Google has announced Gemini 3.5 Live Translate, its latest audio model for live speech-to-speech translation: spoken audio goes in, and translated spoken audio comes out. The model automatically detects more than 70 languages and generates translated speech that preserves the speaker’s intonation, pace, and pitch. Turn-by-turn systems wait for the speaker to finish before responding. Gemini 3.5 Live Translate instead generates speech continuously, balancing two pressures: waiting for more context improves quality, while emitting output quickly keeps the translation in sync with the speaker. As a result, the output trails the speaker by a few seconds throughout the session.

Gemini 3.5 Live Translate

Gemini 3.5 Live Translate is a dedicated audio model (gemini-3.5-live-translate-preview), not a chat assistant. It processes speech as a continuous audio stream rather than as complete sentences. It handles multilingual input without manual language configuration. Its noise robustness lets applications run in fast-moving, unpredictable environments.

The model operates on three surfaces. Developers get it in public preview through the Gemini Live API and Google AI Studio. Enterprises will get a private preview in Google Meet starting this month. Everyone else gets it through the Google Translate app on Android and iOS.

How does continuous streaming work?

The design difference matters for building real-time features. A conversational live agent uses turn-based interactions. It depends on pause detection, intent detection, and interruption handling. Live Translate uses continuous stream processing instead. It translates while the speaker is still talking, without waiting for the turn to finish.
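The contrast with turn-based interaction can be illustrated with a toy simulation. This is only a sketch: `toy_translator` is a hypothetical stand-in that emits one output chunk per input chunk after a fixed delay, and `LAG_CHUNKS` is an assumed value chosen to mimic "a few seconds" of lag. It does not model how the actual system decides when to speak.

```python
import asyncio

LAG_CHUNKS = 20  # hypothetical: 20 chunks of 100 ms = 2 s of lag


async def speaker(mic: asyncio.Queue, n_chunks: int) -> None:
    """Push chunk indices as if a microphone were producing audio."""
    for i in range(n_chunks):
        await mic.put(i)
        await asyncio.sleep(0)  # yield control; a real client would pace by time
    await mic.put(None)  # end-of-stream marker


async def toy_translator(mic: asyncio.Queue, log: list, lag: int) -> None:
    """Stand-in for the model: start emitting once `lag` chunks are buffered."""
    received = 0
    emitted = 0
    while True:
        chunk = await mic.get()
        if chunk is None:
            break
        received += 1
        if received - emitted > lag:
            # Record (output chunk index, newest input chunk seen so far).
            log.append((emitted, chunk))
            emitted += 1
    # The speaker has stopped: flush whatever is still pending.
    while emitted < received:
        log.append((emitted, received - 1))
        emitted += 1


async def run_session(n_chunks: int = 50, lag: int = LAG_CHUNKS) -> list:
    mic: asyncio.Queue = asyncio.Queue()
    log: list = []
    await asyncio.gather(speaker(mic, n_chunks), toy_translator(mic, log, lag))
    return log


log = asyncio.run(run_session())
print(log[0])   # (0, 20): output starts while the speaker is still talking
print(log[-1])  # (49, 49): the last output arrives after input has ended
```

A turn-based system would correspond to setting the lag equal to the full utterance length, so that no output appears until the input ends.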

To maintain strict real-time latency limits, the translation path only accepts audio input. Text input is not supported in translate mode. The model also skips tool usage and system instructions in this mode. This gives it a focused translation pipeline rather than a general agent.

Build with Live API

Developers configure translation inside the Live API session setup. You set a translationConfig block within generationConfig. The targetLanguageCode field takes a BCP-47 code, like "pl" or "es". BCP-47 is the standard format for language tags such as en or pt-BR. The default is "en". The echoTargetLanguage Boolean controls what happens to input that is already in the target language. When true, the model echoes that speech. When false, it remains silent. You can also enable inputAudioTranscription and outputAudioTranscription for text transcripts.
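As a rough illustration, the setup payload described above might be assembled like this. The field names translationConfig, generationConfig, targetLanguageCode, echoTargetLanguage, inputAudioTranscription, and outputAudioTranscription come from the article. Everything else is an assumption made for the sketch: the outer "setup" wrapper, the "models/" prefix, the placement and empty-object value of the transcription keys, and the helper name `build_setup_message`. Check the Live API reference before relying on any of it.

```python
import json

MODEL = "gemini-3.5-live-translate-preview"  # model ID quoted in the article


def build_setup_message(
    target_language_code: str = "en",    # the article says "en" is the default
    echo_target_language: bool = False,  # default chosen for this sketch only
    transcripts: bool = False,
) -> dict:
    """Build a plain dict for the session setup (hypothetical helper)."""
    setup = {
        "model": f"models/{MODEL}",  # assumption: "models/" prefix
        "generationConfig": {
            "translationConfig": {
                "targetLanguageCode": target_language_code,
                "echoTargetLanguage": echo_target_language,
            }
        },
    }
    if transcripts:
        # Assumption: transcription is enabled with empty config objects
        # placed next to generationConfig.
        setup["inputAudioTranscription"] = {}
        setup["outputAudioTranscription"] = {}
    return {"setup": setup}  # assumption: top-level "setup" wrapper


print(json.dumps(build_setup_message("pl", transcripts=True), indent=2))
```

Keeping the payload as a plain dictionary makes it easy to compare against the official schema and to send over whichever transport the client uses.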

Audio formats are fixed. The input is raw 16-bit PCM at 16kHz, mono, little-endian. The output is raw 16-bit PCM at 24kHz, mono, little-endian. PCM is uncompressed raw audio. You send the audio in chunks of 100 ms. For client-side apps, short-lived tokens on the v1alpha endpoint avoid exposing your API key.
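The format numbers above translate directly into byte counts: 100 ms of 16kHz, 16-bit mono audio is 1,600 samples, or 3,200 bytes. The sketch below splits raw input PCM into 100 ms chunks and wraps raw 24kHz output in a WAV container for playback, using only the Python standard library. The helper names and the generated test tone are illustrative, and no network call is made.

```python
import io
import math
import struct
import wave

INPUT_RATE_HZ = 16_000   # input sample rate from the article
OUTPUT_RATE_HZ = 24_000  # output sample rate from the article
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHUNK_MS = 100           # chunk duration from the article


def chunk_size_bytes(rate_hz: int, chunk_ms: int = CHUNK_MS) -> int:
    """Bytes in one mono 16-bit chunk of the given duration."""
    return rate_hz * chunk_ms // 1000 * BYTES_PER_SAMPLE


def make_tone(seconds: float, rate_hz: int = INPUT_RATE_HZ,
              freq_hz: float = 440.0) -> bytes:
    """Generate a sine tone as little-endian 16-bit mono PCM (test data)."""
    n = int(seconds * rate_hz)
    return b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq_hz * i / rate_hz)))
        for i in range(n)
    )


def iter_chunks(pcm: bytes, rate_hz: int = INPUT_RATE_HZ,
                chunk_ms: int = CHUNK_MS):
    """Yield consecutive chunks; the final one may be shorter."""
    size = chunk_size_bytes(rate_hz, chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def pcm_to_wav(pcm: bytes, rate_hz: int = OUTPUT_RATE_HZ) -> bytes:
    """Wrap raw mono 16-bit PCM in a WAV header so players can open it."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(BYTES_PER_SAMPLE)
        wav.setframerate(rate_hz)
        wav.writeframes(pcm)
    return buf.getvalue()


chunks = list(iter_chunks(make_tone(1.0)))
print(len(chunks), len(chunks[0]))       # 10 3200
print(chunk_size_bytes(OUTPUT_RATE_HZ))  # 4800
```

Because the output rate differs from the input rate, playback code has to be configured for 24kHz rather than reusing the capture settings.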

| Dimension | Live agent | Live translation |
|---|---|---|
| Role | Assistant that listens, reasons, and acts | Interpreter / real-time translator pipeline |
| Interaction | Turn-based, with interruption handling | Continuous stream processing, no turn-taking |
| Tools | Function calling, Google Search, instructions | Translation only, no tools or instructions |
| Input | Text, audio, video, and image | Audio only, for tight latency |
| Configuration | Generation, speech, tools, instructions | targetLanguageCode and echoTargetLanguage |

Use cases

The model targets live interpretation in multiple settings. Google lists multilingual calls, meetings, texts, and broadcasts. Developer platforms ease the integration work for real-time media. Agora, FishJam, LiveKit, Pipecat, and Vision Agent already use the Live API. These platforms handle complex real-time media streaming infrastructure. This allows developers to focus on user experience.

Google’s example app demonstrates dubbing and simultaneous multilingual translation. Grab is testing the model for driver-and-passenger communication on pickups. Grab users make more than 10 million voice calls per month. CJ ENM, LiveKit and others gave positive feedback on quality, accuracy and low latency.

How it changes Google Meet and Translate

According to Google’s official release, Google Meet will soon use Gemini 3.5 Live Translate for speech translation. The table below compares Meet before and after the update.

| Capability | Previous Meet | With 3.5 Live Translate |
|---|---|---|
| Languages | 5 | 70+ |
| Combinations per meeting | From and to English only | 2000+ combinations |
| Access | Existing interface | Updated interface for quick access |
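One way to sanity-check the "2000+ combinations" figure is to count language pairs. Assuming exactly 70 languages and that a "combination" means an unordered pair (both are assumptions, since the release does not define the term), the arithmetic works out as follows.

```python
import math

languages = 70  # lower bound; the article says "70+"

unordered_pairs = math.comb(languages, 2)  # {A, B} counted once
ordered_pairs = math.perm(languages, 2)    # A->B and B->A counted separately

print(unordered_pairs, ordered_pairs)  # 2415 4830
```

Under that reading, 70 languages give 2,415 pairs, which is consistent with "2000+".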

The Meet update is in private preview for select Business Workspace customers this month. A broader rollout will take place later this year. In the Translate app, the live translation feature works with any connected headphones. It reflects the speaker’s voice in over 70 languages. Android also gets a listening mode. You hold the phone near your ear like a regular call. The translated audio then streams through the earpiece, without being heard by others.

Key takeaways

  • Gemini 3.5 Live Translate is Google’s latest audio model for live speech-to-speech translation in 70+ languages.
  • It streams continuously instead of taking turns, trailing the speaker by a few seconds.
  • Developers configure it in the Live API with targetLanguageCode and echoTargetLanguage. It is audio only: 16kHz in, 24kHz out.
  • It is rolling out across the Gemini Live API, Google Meet (5→70+ languages), and the Translate app.
  • All generated audio carries an imperceptible SynthID watermark for traceability.
