The Gemini Live API enables low-latency, bidirectional voice and video interactions with Gemini models.
Using the Live API and its dedicated family of models, you can process continuous streams of audio, video, or text to deliver immediate, human-like spoken responses and create a natural conversational experience for your users.
This page describes how to get started with the most common capability (streaming audio input and output), but the Live API supports many other features and configuration options.
The Live API is a stateful API that creates a WebSocket connection to establish a session between the client and the Gemini server. For details, see the Live API reference documentation (Gemini Developer API | Vertex AI Gemini API).
Check out helpful resources
Swift - coming soon! | Android - quickstart app | Web - quickstart app | Flutter - quickstart app | Unity - coming soon!
Try the Gemini Live API in a deployed app - access the Flutter AI Playground app from the Firebase console
Before you begin
If you haven't already, work through the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a LiveModel instance.
You can prototype with prompts and the Live API in Google AI Studio or Vertex AI Studio.
Models that support this capability
The Gemini 2.5 Flash Live models are the native-audio models that support the Gemini Live API. Although the model names differ depending on the Gemini API provider, the model behavior and capabilities are the same.
Gemini Developer API
gemini-2.5-flash-native-audio-preview-12-2025
gemini-2.5-flash-native-audio-preview-09-2025
Although these are preview models, they're available in the free tier of the Gemini Developer API.
Vertex AI Gemini API
gemini-live-2.5-flash-native-audio (released December 2025)
gemini-live-2.5-flash-preview-native-audio-09-2025
When using the Vertex AI Gemini API, the Live API models don't support the global location.
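For example, with the Kotlin SDK, a minimal sketch of pinning the Vertex AI backend to a specific region when creating the Live API model might look like the following. The "us-central1" location is only an illustrative assumption; check the Vertex AI documentation for which regions support these models.
// Sketch only: initialize the Vertex AI Gemini API backend with an explicit
// (non-global) location. "us-central1" is an example value, not a recommendation.
val vertexLiveModel = Firebase.ai(
    backend = GenerativeBackend.vertexAI(location = "us-central1")
).liveModel(
    modelName = "gemini-live-2.5-flash-native-audio",
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    }
)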
Stream audio input and output
Click your Gemini API provider to view provider-specific content and code on this page.
The following examples show a basic implementation of sending streaming audio input and receiving streaming audio output.
For the Live API's other options and capabilities, see the "What else can you do?" section later on this page.
Swift
To use the Live API, create a LiveModel instance and set the response modality to audio.
import FirebaseAILogic
// Initialize the Gemini Developer API backend service
// Create a `liveModel` instance with a model that supports the Live API
let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
// Configure the model to respond with audio
generationConfig: LiveGenerationConfig(
responseModalities: [.audio]
)
)
do {
let session = try await liveModel.connect()
// Load the audio file, or tap a microphone
guard let audioFile = NSDataAsset(name: "audio.pcm") else {
fatalError("Failed to load audio file")
}
// Provide the audio data
await session.sendAudioRealtime(audioFile.data)
var outputText = ""
for try await message in session.responses {
if case let .content(content) = message.payload {
content.modelTurn?.parts.forEach { part in
if let part = part as? InlineDataPart, part.mimeType.starts(with: "audio/pcm") {
// Handle 16-bit PCM audio data at 24 kHz
playAudio(part.data)
}
}
// Optional: close the session if you don't need to send more requests.
if content.isTurnComplete {
await session.close()
}
}
}
} catch {
fatalError(error.localizedDescription)
}
Kotlin
To use the Live API, create a LiveModel instance and set the response modality to AUDIO.
// Initialize the Gemini Developer API backend service
// Create a `liveModel` instance with a model that supports the Live API
val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
// Configure the model to respond with audio
generationConfig = liveGenerationConfig {
responseModality = ResponseModality.AUDIO
}
)
val session = liveModel.connect()
// This is the recommended approach.
// However, you can create your own recorder and handle the stream.
session.startAudioConversation()
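startAudioConversation() manages microphone capture and audio playback for you. When the conversation is over, stop it and release the session; a minimal sketch, assuming the stopAudioConversation() and close() methods on the Kotlin LiveSession:
// Later, when the user ends the conversation:
// stop the managed microphone/playback loop, then close the WebSocket session.
session.stopAudioConversation()
session.close()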
Java
To use the Live API, create a LiveModel instance and set the response modality to AUDIO.
ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `liveModel` instance with a model that supports the Live API
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
"gemini-2.5-flash-native-audio-preview-12-2025",
// Configure the model to respond with audio
new LiveGenerationConfig.Builder()
.setResponseModality(ResponseModality.AUDIO)
.build()
);
LiveModelFutures liveModel = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture = liveModel.connect();
Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
@Override
public void onSuccess(LiveSession ses) {
LiveSessionFutures session = LiveSessionFutures.from(ses);
session.startAudioConversation();
}
@Override
public void onFailure(Throwable t) {
// Handle exceptions
}
}, executor);
Web
To use the Live API, create a LiveGenerativeModel instance and set the response modality to AUDIO.
import { initializeApp } from "firebase/app";
import { getAI, getLiveGenerativeModel, GoogleAIBackend, ResponseModality, startAudioConversation } from "firebase/ai";
// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
// ...
};
// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);
// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
// Create a `LiveGenerativeModel` instance with a model that supports the Live API
const liveModel = getLiveGenerativeModel(ai, {
model: "gemini-2.5-flash-native-audio-preview-12-2025",
// Configure the model to respond with audio
generationConfig: {
responseModalities: [ResponseModality.AUDIO],
},
});
const session = await liveModel.connect();
// Start the audio conversation
const audioConversationController = await startAudioConversation(session);
// ... Later, to stop the audio conversation
// await audioConversationController.stop()
Dart
To use the Live API, create a LiveGenerativeModel instance and set the response modality to audio.
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';
late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();
await Firebase.initializeApp(
options: DefaultFirebaseOptions.currentPlatform,
);
// Initialize the Gemini Developer API backend service
// Create a `liveGenerativeModel` instance with a model that supports the Live API
final liveModel = FirebaseAI.googleAI().liveGenerativeModel(
model: 'gemini-2.5-flash-native-audio-preview-12-2025',
// Configure the model to respond with audio
liveGenerationConfig: LiveGenerationConfig(
responseModalities: [ResponseModalities.audio],
),
);
_session = await liveModel.connect();
final audioRecordStream = _audioRecorder.startRecordingStream();
// Map the Uint8List stream to InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
return InlineDataPart('audio/pcm', data);
});
await _session.startMediaStream(mediaChunkStream);
// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
// Process the received message
}
Unity
To use the Live API, create a LiveModel instance and set the response modality to Audio.
using Firebase;
using Firebase.AI;
async Task SendAndReceiveAudio() {
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with a model that supports the Live API
var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
// Configure the model to respond with audio
liveGenerationConfig: new LiveGenerationConfig(
responseModalities: new[] { ResponseModality.Audio })
);
LiveSession session = await liveModel.ConnectAsync();
// Start a coroutine to send audio from the Microphone
var recordingCoroutine = StartCoroutine(SendAudio(session));
// Start receiving the response
await ReceiveAudio(session);
}
IEnumerator SendAudio(LiveSession liveSession) {
string microphoneDeviceName = null;
int recordingFrequency = 16000;
int recordingBufferSeconds = 2;
var recordingClip = Microphone.Start(microphoneDeviceName, true,
recordingBufferSeconds, recordingFrequency);
int lastSamplePosition = 0;
while (true) {
if (!Microphone.IsRecording(microphoneDeviceName)) {
yield break;
}
int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);
if (currentSamplePosition != lastSamplePosition) {
// The Microphone uses a circular buffer, so we need to check if the
// current position wrapped around to the beginning, and handle it
// accordingly.
int sampleCount;
if (currentSamplePosition > lastSamplePosition) {
sampleCount = currentSamplePosition - lastSamplePosition;
} else {
sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
}
if (sampleCount > 0) {
// Get the audio chunk
float[] samples = new float[sampleCount];
recordingClip.GetData(samples, lastSamplePosition);
// Send the data, discarding the resulting Task to avoid the warning
_ = liveSession.SendAudioAsync(samples);
lastSamplePosition = currentSamplePosition;
}
}
// Wait for a short delay before reading the next sample from the Microphone
const float MicrophoneReadDelay = 0.5f;
yield return new WaitForSeconds(MicrophoneReadDelay);
}
}
Queue<float> audioBuffer = new();
async Task ReceiveAudio(LiveSession liveSession) {
int sampleRate = 24000;
int channelCount = 1;
// Create a looping AudioClip to fill with the received audio data
int bufferSamples = (int)(sampleRate * channelCount);
AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
sampleRate, true, OnAudioRead);
// Attach the clip to an AudioSource and start playing it
AudioSource audioSource = GetComponent<AudioSource>();
audioSource.clip = clip;
audioSource.loop = true;
audioSource.Play();
// Start receiving the response
await foreach (var message in liveSession.ReceiveAsync()) {
// Process the received message
foreach (float[] pcmData in message.AudioAsFloat) {
lock (audioBuffer) {
foreach (float sample in pcmData) {
audioBuffer.Enqueue(sample);
}
}
}
}
}
// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
int samplesToProvide = data.Length;
int samplesProvided = 0;
lock(audioBuffer) {
while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
data[samplesProvided] = audioBuffer.Dequeue();
samplesProvided++;
}
}
while (samplesProvided < samplesToProvide) {
data[samplesProvided] = 0.0f;
samplesProvided++;
}
}
Pricing and token counting
For pricing information about the Live API models, see the documentation for your chosen Gemini API provider: Gemini Developer API | Vertex AI Gemini API.
Regardless of which Gemini API provider you use, the Live API doesn't support the Count Tokens API.
What else can you do?
Review the Live API's full set of capabilities, like streaming different input modalities (audio, text, or video + audio).
Customize your implementation with various configuration options, like adding transcriptions or setting the voice of the response (see the sketch after this list).
Give the model access to tools like function calling and Google Search to enhance your implementation. Official documentation on using tools with the Live API is coming soon!
Learn about the limits and specifications for using the Live API, like session length, rate limits, and supported languages.
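As one illustration of those configuration options, a response voice can be set on the live generation config. This is only a sketch: the speechConfig field and the SpeechConfig / Voice types are assumptions based on the Kotlin SDK's config pattern, and "FENRIR" is just an example voice name; confirm the exact fields and available voices in the LiveGenerationConfig reference.
// Sketch only: speechConfig / SpeechConfig / Voice are assumed names; verify
// them against the firebase-ai Kotlin reference before using.
val voicedLiveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        // Example voice name; available voices are listed in the provider docs.
        speechConfig = SpeechConfig(voice = Voice("FENRIR"))
    }
)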