
Analyze documents (like PDFs) using the Gemini API

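You can ask a Gemini model to analyze document files (like PDFs and plain-text files) that you provide either inline (base64-encoded) or via URL. When you use Firebase AI Logic, you can make this request directly from your app.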

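With this capability, you can do things like:

Analyze diagrams, charts, and tables inside documents
Extract information into structured output formats
Answer questions about the visual and text content of documents
Summarize documents
Transcribe document content (for example, to HTML), preserving layout and formatting, for use in downstream applications (such as RAG pipelines)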


For other options for working with documents (like PDFs), see these other guides:
Generate structured output
Multi-turn chat

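Before you begin

Click your Gemini API provider to view provider-specific content and code on this page.

If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a GenerativeModel instance.

For testing and iterating on your prompts, and even getting a generated code snippet, we recommend using Google AI Studio.

Need a sample PDF file?

You can use this publicly available file, which has a MIME type of application/pdf: https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf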
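
As a rough sketch of how the sample file could be turned into inline data in JavaScript, the snippet below downloads a PDF and base64-encodes it. The helper names `bytesToBase64` and `fetchPdfAsInlinePart` are hypothetical (they are not part of the Firebase SDK), and the sketch assumes an environment that provides `fetch` and `btoa` and a server that allows the download from your origin. The sample file may be large, so inline use may not suit every request.

```javascript
const SAMPLE_PDF_URL =
  "https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf";

// Hypothetical helper: base64-encodes a Uint8Array in chunks so that large
// files do not exceed the argument limit of String.fromCharCode.
function bytesToBase64(bytes) {
  let binary = "";
  const chunkSize = 0x8000;
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}

// Hypothetical helper: downloads a PDF and returns an object shaped like the
// inlineData part used in the Web samples on this page.
// `fetchImpl` is injectable so the function can be exercised without network access.
async function fetchPdfAsInlinePart(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Download failed with HTTP ${response.status}`);
  }
  const bytes = new Uint8Array(await response.arrayBuffer());
  return { inlineData: { data: bytesToBase64(bytes), mimeType: "application/pdf" } };
}
```

Calling `fetchPdfAsInlinePart(SAMPLE_PDF_URL)` would then yield a part that can be passed to the model alongside a text prompt.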

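Generate text from PDF files (base64-encoded)

Before trying this sample, complete the Before you begin section of this guide to set up your project and app.
In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.

You can ask a Gemini model to generate text by prompting with text and PDFs, providing each input file's mimeType and the file itself. See the requirements and recommendations for input files later on this page.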

Swift

You can call generateContent() to generate text from multimodal input of text and PDFs.


import FirebaseAI

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")


// Provide the PDF as `Data` with the appropriate MIME type
let pdf = try InlineDataPart(data: Data(contentsOf: pdfURL), mimeType: "application/pdf")

// Provide a text prompt to include with the PDF file
let prompt = "Summarize the important results in this report."

// To generate text output, call `generateContent` with the PDF file and text prompt
let response = try await model.generateContent(pdf, prompt)

// Print the generated text, handling the case where it might be nil
print(response.text ?? "No text in response.")

Kotlin

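You can call generateContent() to generate text from multimodal input of text and PDFs.

Note: For Kotlin, the methods in this SDK are suspend functions and need to be called from a coroutine scope.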


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
                        .generativeModel("gemini-2.5-flash")


val contentResolver = applicationContext.contentResolver

// Provide the URI for the PDF file you want to send to the model
val inputStream = contentResolver.openInputStream(pdfUri)

if (inputStream != null) {  // Check if the PDF file loaded successfully
    inputStream.use { stream ->
        // Provide a prompt that includes the PDF file specified above and text
        val prompt = content {
            inlineData(
                bytes = stream.readBytes(),
                mimeType = "application/pdf" // Specify the appropriate PDF file MIME type
            )
            text("Summarize the important results in this report.")
        }

        // To generate text output, call `generateContent` with the prompt
        val response = model.generateContent(prompt)

        // Log the generated text, handling the case where it might be null
        Log.d(TAG, response.text ?: "")
    }
} else {
    Log.e(TAG, "Error getting input stream for file.")
    // Handle the error appropriately
}

Java

You can call generateContent() to generate text from multimodal input of text and PDFs.

Note: For Java, the methods in this SDK return a ListenableFuture.


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);


ContentResolver resolver = getApplicationContext().getContentResolver();

// Provide the URI for the PDF file you want to send to the model
try (InputStream stream = resolver.openInputStream(pdfUri)) {
    if (stream != null) {
        // try-with-resources closes the stream, so no explicit close() is needed
        byte[] pdfBytes = stream.readAllBytes();

        // Provide a prompt that includes the PDF file specified above and text
        Content prompt = new Content.Builder()
              .addInlineData(pdfBytes, "application/pdf")  // Specify the appropriate PDF file MIME type
              .addText("Summarize the important results in this report.")
              .build();

        // To generate text output, call `generateContent` with the prompt
        ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
        Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
            @Override
            public void onSuccess(GenerateContentResponse result) {
                String text = result.getText();
                Log.d(TAG, (text == null) ? "" : text);
            }
            @Override
            public void onFailure(Throwable t) {
                Log.e(TAG, "Failed to generate a response", t);
            }
        }, executor);
    } else {
        Log.e(TAG, "Error getting input stream for file.");
        // Handle the error appropriately
    }
} catch (IOException e) {
    Log.e(TAG, "Failed to read the pdf file", e);
}

Web

You can call generateContent() to generate text from multimodal input of text and PDFs.


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });


// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    // Keep only the base64 payload that follows the comma in the data URL
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the PDF file
  const prompt = "Summarize the important results in this report.";

  // Prepare PDF file for input
  const fileInputEl = document.querySelector("input[type=file]");
  const pdfPart = await fileToGenerativePart(fileInputEl.files[0]);

  // To generate text output, call `generateContent` with the text and PDF file
  const result = await model.generateContent([prompt, pdfPart]);

  // Log the generated text, handling the case where it might be undefined
  console.log(result.response.text() ?? "No text in response.");
}

run();
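
The Web sample reads the file with `FileReader.readAsDataURL`, which produces a data URL of the form `data:<mimeType>;base64,<payload>`; only the payload after the comma belongs in `inlineData.data`. The following minimal sketch isolates that step. `dataUrlToInlineData` is a hypothetical helper name, not an SDK function.

```javascript
// Hypothetical helper (not part of the Firebase SDK): splits a base64 data URL
// into the mimeType and data fields of an inlineData part.
function dataUrlToInlineData(dataUrl) {
  const commaIndex = dataUrl.indexOf(",");
  if (!dataUrl.startsWith("data:") || commaIndex === -1) {
    throw new Error("Not a data URL");
  }
  // The header looks like "<mimeType>;base64" once the "data:" prefix is removed
  const header = dataUrl.slice("data:".length, commaIndex);
  if (!header.endsWith(";base64")) {
    throw new Error("Expected a base64-encoded data URL");
  }
  const mimeType = header.split(";")[0];
  return { inlineData: { data: dataUrl.slice(commaIndex + 1), mimeType } };
}

// "JVBERi0xLjQ=" is the base64 encoding of the 8 bytes "%PDF-1.4"
const part = dataUrlToInlineData("data:application/pdf;base64,JVBERi0xLjQ=");
console.log(part.inlineData.mimeType); // application/pdf
console.log(part.inlineData.data);     // JVBERi0xLjQ=
```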

Dart

You can call generateContent() to generate text from multimodal input of text and PDFs.


import 'dart:io'; // Needed for `File`

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model =
      FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');


// Provide a text prompt to include with the PDF file
final prompt = TextPart("Summarize the important results in this report.");

// Prepare the PDF file for input
final doc = await File('document0.pdf').readAsBytes();

// Provide the PDF file as `Data` with the appropriate PDF file MIME type
final docPart = InlineDataPart('application/pdf', doc);

// To generate text output, call `generateContent` with the text and PDF file
final response = await model.generateContent([
  Content.multi([prompt,docPart])
]);

// Print the generated text
print(response.text);

Unity

You can call GenerateContentAsync() to generate text from multimodal input of text and PDFs.


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");


// Provide a text prompt to include with the PDF file
var prompt = ModelContent.Text("Summarize the important results in this report.");

// Provide the PDF file as `data` with the appropriate PDF file MIME type
var doc = ModelContent.InlineData("application/pdf",
      System.IO.File.ReadAllBytes(System.IO.Path.Combine(
        UnityEngine.Application.streamingAssetsPath, "document0.pdf")));

// To generate text output, call `GenerateContentAsync` with the text and PDF file
var response = await model.GenerateContentAsync(new [] { prompt, doc });

// Print the generated text
UnityEngine.Debug.Log(response.Text ?? "No text in response.");

Learn how to choose a model appropriate for your use case and app.

Stream the response


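You can achieve faster interactions by not waiting for the entire result from the model generation, and instead use streaming to handle partial results. To stream the response, call generateContentStream.

Example: stream generated text from PDF files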

Swift

You can call generateContentStream() to stream generated text from multimodal input of text and PDFs.


import FirebaseAI

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")


// Provide the PDF as `Data` with the appropriate MIME type
let pdf = try InlineDataPart(data: Data(contentsOf: pdfURL), mimeType: "application/pdf")

// Provide a text prompt to include with the PDF file
let prompt = "Summarize the important results in this report."

// To stream generated text output, call `generateContentStream` with the PDF file and text prompt
let contentStream = try model.generateContentStream(pdf, prompt)

// Print the generated text, handling the case where it might be nil
for try await chunk in contentStream {
  if let text = chunk.text {
    print(text)
  }
}

Kotlin

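You can call generateContentStream() to stream generated text from multimodal input of text and PDFs.

Note: For Kotlin, the methods in this SDK are suspend functions and need to be called from a coroutine scope.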


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
                        .generativeModel("gemini-2.5-flash")


val contentResolver = applicationContext.contentResolver

// Provide the URI for the PDF you want to send to the model
val inputStream = contentResolver.openInputStream(pdfUri)

if (inputStream != null) {  // Check if the PDF file loaded successfully
    inputStream.use { stream ->
        // Provide a prompt that includes the PDF file specified above and text
        val prompt = content {
            inlineData(
                bytes = stream.readBytes(),
                mimeType = "application/pdf" // Specify the appropriate PDF file MIME type
            )
            text("Summarize the important results in this report.")
        }

        // To stream generated text output, call `generateContentStream` with the prompt
        var fullResponse = ""
        model.generateContentStream(prompt).collect { chunk ->
            // Log the generated text, handling the case where it might be null
            val chunkText = chunk.text ?: ""
            Log.d(TAG, chunkText)
            fullResponse += chunkText
        }
    }
} else {
    Log.e(TAG, "Error getting input stream for file.")
    // Handle the error appropriately
}

Java

You can call generateContentStream() to stream generated text from multimodal input of text and PDFs.

Note: For Java, the streaming methods in this SDK return a Publisher type from the Reactive Streams library.


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);


ContentResolver resolver = getApplicationContext().getContentResolver();

// Provide the URI for the PDF file you want to send to the model
try (InputStream stream = resolver.openInputStream(pdfUri)) {
    if (stream != null) {
        // try-with-resources closes the stream, so no explicit close() is needed
        byte[] pdfBytes = stream.readAllBytes();

        // Provide a prompt that includes the PDF file specified above and text
        Content prompt = new Content.Builder()
              .addInlineData(pdfBytes, "application/pdf")  // Specify the appropriate PDF file MIME type
              .addText("Summarize the important results in this report.")
              .build();

        // To stream generated text output, call `generateContentStream` with the prompt
        Publisher<GenerateContentResponse> streamingResponse =
                model.generateContentStream(prompt);

        StringBuilder fullResponse = new StringBuilder();

        streamingResponse.subscribe(new Subscriber<GenerateContentResponse>() {
            @Override
            public void onNext(GenerateContentResponse generateContentResponse) {
                String chunk = generateContentResponse.getText();
                String text = (chunk == null) ? "" : chunk;
                Log.d(TAG, text);
                fullResponse.append(text);
            }

            @Override
            public void onComplete() {
                Log.d(TAG, fullResponse.toString());
            }

            @Override
            public void onError(Throwable t) {
                Log.e(TAG, "Failed to generate a response", t);
            }

            @Override
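            public void onSubscribe(Subscription s) {
                // Signal demand; a Reactive Streams publisher emits nothing until items are requested
                s.request(Long.MAX_VALUE);
            }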
         });
    } else {
        Log.e(TAG, "Error getting input stream for file.");
        // Handle the error appropriately
    }
} catch (IOException e) {
    Log.e(TAG, "Failed to read the pdf file", e);
}

Web

You can call generateContentStream() to stream generated text from multimodal input of text and PDFs.


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });


// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    // Keep only the base64 payload that follows the comma in the data URL
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the PDF file
  const prompt = "Summarize the important results in this report.";

  // Prepare PDF file for input
  const fileInputEl = document.querySelector("input[type=file]");
  const pdfPart = await fileToGenerativePart(fileInputEl.files[0]);

  // To stream generated text output, call `generateContentStream` with the text and PDF file
  const result = await model.generateContentStream([prompt, pdfPart]);

  // Log the generated text
  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    console.log(chunkText);
  }
}

run();
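
If you also want the complete text once streaming finishes, one option is to accumulate the chunks while displaying them. The sketch below illustrates the idea with a stand-in stream so that it runs without a Firebase project; `fakeStream` and `collectText` are hypothetical names, and the only assumption about real chunks is that they expose a `text()` method, as in the Web sample above.

```javascript
// Stand-in for `result.stream`: an async generator that yields chunk-like
// objects exposing a `text()` method. This is a test double, not SDK code.
async function* fakeStream(pieces) {
  for (const piece of pieces) {
    yield { text: () => piece };
  }
}

// Consumes the stream, calls `onChunk` with each piece of text as it arrives,
// and resolves with the concatenated full text.
async function collectText(stream, onChunk = () => {}) {
  let fullText = "";
  for await (const chunk of stream) {
    const chunkText = chunk.text() ?? "";
    onChunk(chunkText);
    fullText += chunkText;
  }
  return fullText;
}

// Logs each piece as it arrives, then the assembled text
collectText(fakeStream(["The report ", "finds ", "three results."]), (t) => console.log(t))
  .then((full) => console.log(full));
```

With the real SDK, `result.stream` from `generateContentStream` would take the place of `fakeStream(...)`.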

Dart

You can call generateContentStream() to stream generated text from multimodal input of text and PDFs.


import 'dart:io'; // Needed for `File`

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model =
      FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');


// Provide a text prompt to include with the PDF file
final prompt = TextPart("Summarize the important results in this report.");

// Prepare the PDF file for input
final doc = await File('document0.pdf').readAsBytes();

// Provide the PDF file as `Data` with the appropriate PDF file MIME type
final docPart = InlineDataPart('application/pdf', doc);

// To stream generated text output, call `generateContentStream` with the text and PDF file
final response = model.generateContentStream([
  Content.multi([prompt,docPart])
]);

// Print the generated text
await for (final chunk in response) {
  print(chunk.text);
}

Unity

You can call GenerateContentStreamAsync() to stream generated text from multimodal input of text and PDFs.


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");


// Provide a text prompt to include with the PDF file
var prompt = ModelContent.Text("Summarize the important results in this report.");

// Provide the PDF file as `data` with the appropriate PDF file MIME type
var doc = ModelContent.InlineData("application/pdf",
      System.IO.File.ReadAllBytes(System.IO.Path.Combine(
        UnityEngine.Application.streamingAssetsPath, "document0.pdf")));

// To stream generated text output, call `GenerateContentStreamAsync` with the text and PDF file
var responseStream = model.GenerateContentStreamAsync(new [] { prompt, doc });

// Print the generated text
await foreach (var response in responseStream) {
  if (!string.IsNullOrWhiteSpace(response.Text)) {
    UnityEngine.Debug.Log(response.Text);
  }
}

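Learn how to choose a model appropriate for your use case and app.

Requirements and recommendations for input documents

Note that a file provided as inline data is encoded to base64 in transit, which increases the size of the request. You get an HTTP 413 error if a request is too large.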
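
To get a feel for the overhead, recall that base64 represents every 3 bytes of input as 4 output characters, so the encoded payload is roughly one third larger than the file. The sketch below estimates the encoded size before a request is sent. The function names are hypothetical, and `maxRequestBytes` is a caller-supplied budget rather than a documented limit; check your provider's documentation for the actual value.

```javascript
// Base64 encodes every 3 input bytes as 4 output characters (including padding).
function base64Length(byteLength) {
  return Math.ceil(byteLength / 3) * 4;
}

// `maxRequestBytes` is a hypothetical, caller-supplied budget, not an official limit.
function fitsInlineBudget(byteLength, maxRequestBytes) {
  return base64Length(byteLength) <= maxRequestBytes;
}

console.log(base64Length(3));         // 4
console.log(base64Length(10));        // 16
console.log(fitsInlineBudget(9, 12)); // true
console.log(fitsInlineBudget(10, 12)); // false
```

The estimate covers only the encoded file; the prompt text and the JSON envelope add to the total request size.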

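See "Supported input files and requirements for the Vertex AI Gemini API" for detailed information about the following:

Different options for providing a file in a request (either inline or using the file's URL or URI)
Requirements and best practices for document files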

Supported document MIME types

Gemini multimodal models support the following document MIME types:

Document MIME type	Gemini 2.0 Flash	Gemini 2.0 Flash‑Lite
PDF - `application/pdf`
Text - `text/plain`

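Limits per request

PDFs are treated as images, so a single page of a PDF is treated as one image. The number of pages allowed in a prompt is limited to the number of images the model can support:

Gemini 2.0 Flash and Gemini 2.0 Flash‑Lite:
- Maximum files per request: 3,000
- Maximum pages per file: 1,000
- Maximum size per file: 50 MB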
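
A client-side check against the limits listed above can catch oversized input before a request is made. The following is a minimal sketch, assuming your app already knows each file's page count and byte size. The descriptor shape `{ name, pages, bytes }` and the function name are hypothetical, and "50 MB" is interpreted here as 50 * 1024 * 1024 bytes, which is an assumption.

```javascript
// Limits copied from the list above for Gemini 2.0 Flash and Gemini 2.0 Flash-Lite.
const LIMITS = {
  maxFilesPerRequest: 3000,
  maxPagesPerFile: 1000,
  maxBytesPerFile: 50 * 1024 * 1024, // assumption: "50 MB" read as 50 * 1024 * 1024 bytes
};

// `files` is a hypothetical array of { name, pages, bytes } descriptors built by your app.
// Returns a list of human-readable problems; an empty list means no limit was exceeded.
function findLimitViolations(files, limits = LIMITS) {
  const problems = [];
  if (files.length > limits.maxFilesPerRequest) {
    problems.push(`too many files: ${files.length}`);
  }
  for (const file of files) {
    if (file.pages > limits.maxPagesPerFile) {
      problems.push(`${file.name}: too many pages (${file.pages})`);
    }
    if (file.bytes > limits.maxBytesPerFile) {
      problems.push(`${file.name}: file too large (${file.bytes} bytes)`);
    }
  }
  return problems;
}

console.log(findLimitViolations([{ name: "report.pdf", pages: 120, bytes: 4 * 1024 * 1024 }]));
```

Passing `limits` as a parameter keeps the check usable if a different model has different limits.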

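What else can you do?

Learn how to count tokens before sending long prompts to the model.
Set up Cloud Storage for Firebase so that you can include large files in your multimodal requests and have a more managed solution for providing files in prompts. Files can include images, PDFs, video, and audio.
Start thinking about preparing for production (see the production checklist), including:
- Setting up Firebase App Check to protect the Gemini API from abuse by unauthorized clients.
- Integrating Firebase Remote Config to update values in your app (like the model name) without releasing a new app version.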

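Try out other capabilities

Build multi-turn conversations (chat).
Generate text from text-only prompts.
Generate structured output (like JSON) from both text and multimodal prompts.
Generate images from text prompts (Gemini or Imagen).
Use function calling to connect generative models to external systems and information.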

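Learn how to control content generation

Understand prompt design, including best practices, strategies, and example prompts.
Configure model parameters like temperature and maximum output tokens (for Gemini) or aspect ratio and person generation (for Imagen).
Use safety settings to adjust the likelihood of getting responses that may be considered harmful.

You can also experiment with prompts and model configurations, and even get a generated code snippet, using Google AI Studio.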

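Learn more about the supported models

Learn about the models available for various use cases and their quotas and pricing.

Give feedback about your experience with Firebase AI Logic

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-07-09 UTC.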