使用 Gemini 生成图片


您可以使用纯文本提示和文本与图片提示,让 Gemini 模型生成图片和修改图片。使用 Firebase AI Logic 时,您可以直接从应用发出此请求。

借助此功能,您可以执行以下操作:

  • 通过自然语言对话迭代生成图片,调整图片的同时保持一致性和上下文。

  • 生成具有高质量文本渲染的图片,包括长文本字符串。

  • 生成交织的文本-图片输出。例如,一次转弯即可看到包含文本和图片的博文。以前,这需要将多个模型串联起来。

  • 使用 Gemini 的世界知识和推理能力生成图片。

您可以在本页下文中找到受支持的模态和功能的完整列表(以及示例提示)。

对于图片输出,您必须使用 Gemini 模型 gemini-2.0-flash-preview-image-generation,并在模型配置中添加 responseModalities: ["TEXT", "IMAGE"]

跳转到文本转图片的代码 跳转到交织文本和图片的代码

跳转到图片编辑代码 跳转到迭代图片编辑代码


查看其他指南,了解处理图片的其他选项
分析图片 在设备上分析图片 生成结构化输出

GeminiImagen 模型之间进行选择

Firebase AI Logic SDK 支持使用 Gemini 模型或 Imagen 模型生成图片。对于大多数用例,请先使用 Gemini,然后在图像质量至关重要的专门任务中选择 Imagen

请注意,Firebase AI Logic SDK 尚不支持使用 Imagen 模型进行图片输入(例如编辑)。因此,如果您想处理输入图片,可以改用 Gemini 模型。

当您需要执行以下操作时,请选择 Gemini

  • 利用世界知识和推理能力生成与上下文相关的图片。
  • 将文字和图片无缝融合。
  • 在长文本序列中嵌入准确的视觉内容。
  • 以对话方式编辑图片,同时保留上下文。

在以下情况下,请选择 Imagen

  • 优先考虑画质、写实效果、艺术细节或特定风格(例如印象派或动漫)。
  • 明确指定生成图片的宽高比或格式。

准备工作

点击您的 Gemini API 提供商,在本页面上查看特定于提供商的内容和代码。

如果您尚未完成入门指南,请先完成该指南。其中介绍了如何设置 Firebase 项目、将应用连接到 Firebase、添加 SDK、为所选的 Gemini API 提供程序初始化后端服务,以及创建 GenerativeModel 实例。

如需测试和迭代提示,甚至获取生成的代码段,我们建议使用 Google AI Studio

支持此功能的模型

只有 gemini-2.0-flash-preview-image-generation(而非 gemini-2.0-flash)支持从 Gemini 输出图片。

请注意,这些 SDK 还支持使用 Imagen 模型生成图片

生成和修改图片

您可以使用 Gemini 模型生成和编辑图片。

生成图片(仅限文本输入)

在试用此示例之前,请完成本指南的准备工作部分,以设置您的项目和应用。
在此部分中,您还需要点击所选 Gemini API 提供方的按钮,以便在此页面上看到特定于该提供方的相关内容

您可以通过文本提示来让 Gemini 模型生成图片。

请务必创建 GenerativeModel 实例,在模型配置中添加 responseModalities: ["TEXT", "IMAGE"],并调用 generateContent

Swift


import FirebaseAI

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let generativeModel = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [.text, .image])
)

// Provide a text prompt instructing the model to generate an image
let prompt = "Generate an image of the Eiffel tower with fireworks in the background."

// To generate an image, call `generateContent` with the text input
let response = try await model.generateContent(prompt)

// Handle the generated image
guard let inlineDataPart = response.inlineDataParts.first else {
  fatalError("No image data in response.")
}
guard let uiImage = UIImage(data: inlineDataPart.data) else {
  fatalError("Failed to convert data to UIImage.")
}

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
    modelName = "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    generationConfig = generationConfig {
responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE) }
)

// Provide a text prompt instructing the model to generate an image
val prompt = "Generate an image of the Eiffel tower with fireworks in the background."

// To generate image output, call `generateContent` with the text input
val generatedImageAsBitmap = model.generateContent(prompt)
    // Handle the generated image
    .candidates.first().content.parts.firstNotNullOf { it.asImageOrNull() }

Java


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
    "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    new GenerationConfig.Builder()
        .setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
        .build()
);

GenerativeModelFutures model = GenerativeModelFutures.from(ai);

// Provide a text prompt instructing the model to generate an image
Content prompt = new Content.Builder()
        .addText("Generate an image of the Eiffel Tower with fireworks in the background.")
        .build();

// To generate an image, call `generateContent` with the text input
ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) { 
        // iterate over all the parts in the first candidate in the result object
        for (Part part : result.getCandidates().get(0).getContent().getParts()) {
            if (part instanceof ImagePart) {
                ImagePart imagePart = (ImagePart) part;
                // The returned image as a bitmap
                Bitmap generatedImageAsBitmap = imagePart.getImage();
                break;
            }
        }
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
  model: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: {
    responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
  },
});

// Provide a text prompt instructing the model to generate an image
const prompt = 'Generate an image of the Eiffel Tower with fireworks in the background.';

// To generate an image, call `generateContent` with the text input
const result = model.generateContent(prompt);

// Handle the generated image
try {
  const inlineDataParts = result.response.inlineDataParts();
  if (inlineDataParts?.[0]) {
    const image = inlineDataParts[0].inlineData;
    console.log(image.mimeType, image.data);
  }
} catch (err) {
  console.error('Prompt or candidate was blocked:', err);
}

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.0-flash-preview-image-generation',
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [ResponseModality.text, ResponseModality.image]),
);

// Provide a text prompt instructing the model to generate an image
final prompt = [Content.text('Generate an image of the Eiffel Tower with fireworks in the background.')];

// To generate an image, call `generateContent` with the text input
final response = await model.generateContent(prompt);
if (response.inlineDataParts.isNotEmpty) {
  final imageBytes = response.inlineDataParts[0].bytes;
  // Process the image
} else {
  // Handle the case where no images were generated
  print('Error: No images were generated.');
}

Unity


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: new GenerationConfig(
    responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);

// Provide a text prompt instructing the model to generate an image
var prompt = "Generate an image of the Eiffel Tower with fireworks in the background.";

// To generate an image, call `GenerateContentAsync` with the text input
var response = await model.GenerateContentAsync(prompt);

var text = response.Text;
if (!string.IsNullOrWhiteSpace(text)) {
  // Do something with the text
}

// Handle the generated image
var imageParts = response.Candidates.First().Content.Parts
                         .OfType<ModelContent.InlineDataPart>()
                         .Where(part => part.MimeType == "image/png");
foreach (var imagePart in imageParts) {
  // Load the Image into a Unity Texture2D object
  UnityEngine.Texture2D texture2D = new(2, 2);
  if (texture2D.LoadImage(imagePart.Data.ToArray())) {
    // Do something with the image
  }
}

了解如何选择适合您的应用场景和应用的模型

生成图文交织的内容

在试用此示例之前,请完成本指南的准备工作部分,以设置您的项目和应用。
在此部分中,您还需要点击所选 Gemini API 提供方的按钮,以便在此页面上看到特定于该提供方的相关内容

您可以让 Gemini 模型根据其文本回答生成交错图片。例如,您可以生成生成的食谱的每个步骤可能的样子以及相应步骤的说明的图片,而无需向模型或不同的模型发出单独的请求。

请务必创建 GenerativeModel 实例,在模型配置中添加 responseModalities: ["TEXT", "IMAGE"],并调用 generateContent

Swift


import FirebaseAI

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let generativeModel = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [.text, .image])
)

// Provide a text prompt instructing the model to generate interleaved text and images
let prompt = """
Generate an illustrated recipe for a paella.
Create images to go alongside the text as you generate the recipe
"""

// To generate interleaved text and images, call `generateContent` with the text input
let response = try await model.generateContent(prompt)

// Handle the generated text and image
guard let candidate = response.candidates.first else {
  fatalError("No candidates in response.")
}
for part in candidate.content.parts {
  switch part {
  case let textPart as TextPart:
    // Do something with the generated text
    let text = textPart.text
  case let inlineDataPart as InlineDataPart:
    // Do something with the generated image
    guard let uiImage = UIImage(data: inlineDataPart.data) else {
      fatalError("Failed to convert data to UIImage.")
    }
  default:
    fatalError("Unsupported part type: \(part)")
  }
}

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
    modelName = "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    generationConfig = generationConfig {
responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE) }
)

// Provide a text prompt instructing the model to generate interleaved text and images
val prompt = """
    Generate an illustrated recipe for a paella.
    Create images to go alongside the text as you generate the recipe
    """.trimIndent()

// To generate interleaved text and images, call `generateContent` with the text input
val responseContent = model.generateContent(prompt).candidates.first().content

// The response will contain image and text parts interleaved
for (part in responseContent.parts) {
    when (part) {
        is ImagePart -> {
            // ImagePart as a bitmap
            val generatedImageAsBitmap: Bitmap? = part.asImageOrNull()
        }
        is TextPart -> {
            // Text content from the TextPart
            val text = part.text
        }
    }
}

Java


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
    "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    new GenerationConfig.Builder()
        .setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
        .build()
);

GenerativeModelFutures model = GenerativeModelFutures.from(ai);

// Provide a text prompt instructing the model to generate interleaved text and images
Content prompt = new Content.Builder()
        .addText("Generate an illustrated recipe for a paella.\n" +
                 "Create images to go alongside the text as you generate the recipe")
        .build();

// To generate interleaved text and images, call `generateContent` with the text input
ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        Content responseContent = result.getCandidates().get(0).getContent();
        // The response will contain image and text parts interleaved
        for (Part part : responseContent.getParts()) {
            if (part instanceof ImagePart) {
                // ImagePart as a bitmap
                Bitmap generatedImageAsBitmap = ((ImagePart) part).getImage();
            } else if (part instanceof TextPart){
                // Text content from the TextPart
                String text = ((TextPart) part).getText();
            }
        }
    }

    @Override
    public void onFailure(Throwable t) {
        System.err.println(t);
    }
}, executor);

Web


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
  model: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: {
    responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
  },
});

// Provide a text prompt instructing the model to generate interleaved text and images
const prompt = 'Generate an illustrated recipe for a paella.\n.' +
  'Create images to go alongside the text as you generate the recipe';

// To generate interleaved text and images, call `generateContent` with the text input
const result = await model.generateContent(prompt);

// Handle the generated text and image
try {
  const response = result.response;
  if (response.candidates?.[0].content?.parts) {
    for (const part of response.candidates?.[0].content?.parts) {
      if (part.text) {
        // Do something with the text
        console.log(part.text)
      }
      if (part.inlineData) {
        // Do something with the image
        const image = part.inlineData;
        console.log(image.mimeType, image.data);
      }
    }
  }

} catch (err) {
  console.error('Prompt or candidate was blocked:', err);
}

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.0-flash-preview-image-generation',
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [ResponseModality.text, ResponseModality.image]),
);

// Provide a text prompt instructing the model to generate interleaved text and images
final prompt = [Content.text(
  'Generate an illustrated recipe for a paella\n ' +
  'Create images to go alongside the text as you generate the recipe'
)];

// To generate interleaved text and images, call `generateContent` with the text input
final response = await model.generateContent(prompt);

// Handle the generated text and image
final parts = response.candidates.firstOrNull?.content.parts
if (parts.isNotEmpty) {
  for (final part in parts) {
    if (part is TextPart) {
      // Do something with text part
      final text = part.text
    }
    if (part is InlineDataPart) {
      // Process image
      final imageBytes = part.bytes
    }
  }
} else {
  // Handle the case where no images were generated
  print('Error: No images were generated.');
}

Unity


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: new GenerationConfig(
    responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);

// Provide a text prompt instructing the model to generate interleaved text and images
var prompt = "Generate an illustrated recipe for a paella \n" +
  "Create images to go alongside the text as you generate the recipe";

// To generate interleaved text and images, call `GenerateContentAsync` with the text input
var response = await model.GenerateContentAsync(prompt);

// Handle the generated text and image
foreach (var part in response.Candidates.First().Content.Parts) {
  if (part is ModelContent.TextPart textPart) {
    if (!string.IsNullOrWhiteSpace(textPart.Text)) {
      // Do something with the text
    }
  } else if (part is ModelContent.InlineDataPart dataPart) {
    if (dataPart.MimeType == "image/png") {
      // Load the Image into a Unity Texture2D object
      UnityEngine.Texture2D texture2D = new(2, 2);
      if (texture2D.LoadImage(dataPart.Data.ToArray())) {
        // Do something with the image
      }
    }
  }
}

了解如何选择适合您的应用场景和应用的模型

修改图片(文本和图片输入)

在试用此示例之前,请完成本指南的准备工作部分,以设置您的项目和应用。
在此部分中,您还需要点击所选 Gemini API 提供方的按钮,以便在此页面上看到特定于该提供方的相关内容

您可以通过文本和一张或多张图片提示 Gemini 模型来编辑图片。

请务必创建 GenerativeModel 实例,在模型配置中添加 responseModalities: ["TEXT", "IMAGE"],并调用 generateContent

Swift


import FirebaseAI

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let generativeModel = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [.text, .image])
)

// Provide an image for the model to edit
guard let image = UIImage(named: "scones") else { fatalError("Image file not found.") }

// Provide a text prompt instructing the model to edit the image
let prompt = "Edit this image to make it look like a cartoon"

// To edit the image, call `generateContent` with the image and text input
let response = try await model.generateContent(image, prompt)

// Handle the generated image
guard let inlineDataPart = response.inlineDataParts.first else {
  fatalError("No image data in response.")
}
guard let uiImage = UIImage(data: inlineDataPart.data) else {
  fatalError("Failed to convert data to UIImage.")
}

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
    modelName = "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    generationConfig = generationConfig {
responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE) }
)

// Provide an image for the model to edit
val bitmap = BitmapFactory.decodeResource(context.resources, R.drawable.scones)

// Provide a text prompt instructing the model to edit the image
val prompt = content {
    image(bitmap)
    text("Edit this image to make it look like a cartoon")
}

// To edit the image, call `generateContent` with the prompt (image and text input)
val generatedImageAsBitmap = model.generateContent(prompt)
    // Handle the generated text and image
    .candidates.first().content.parts.firstNotNullOf { it.asImageOrNull() }

Java


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
    "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    new GenerationConfig.Builder()
        .setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
        .build()
);

GenerativeModelFutures model = GenerativeModelFutures.from(ai);

// Provide an image for the model to edit
Bitmap bitmap = BitmapFactory.decodeResource(resources, R.drawable.scones);

// Provide a text prompt instructing the model to edit the image
Content promptcontent = new Content.Builder()
        .addImage(bitmap)
        .addText("Edit this image to make it look like a cartoon")
        .build();

// To edit the image, call `generateContent` with the prompt (image and text input)
ListenableFuture<GenerateContentResponse> response = model.generateContent(promptcontent);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        // iterate over all the parts in the first candidate in the result object
        for (Part part : result.getCandidates().get(0).getContent().getParts()) {
            if (part instanceof ImagePart) {
                ImagePart imagePart = (ImagePart) part;
                Bitmap generatedImageAsBitmap = imagePart.getImage();
                break;
            }
        }
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
  model: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: {
    responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
  },
});

// Prepare an image for the model to edit
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

// Provide a text prompt instructing the model to edit the image
const prompt = "Edit this image to make it look like a cartoon";

const fileInputEl = document.querySelector("input[type=file]");
const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

// To edit the image, call `generateContent` with the image and text input
const result = await model.generateContent([prompt, imagePart]);

// Handle the generated image
try {
  const inlineDataParts = result.response.inlineDataParts();
  if (inlineDataParts?.[0]) {
    const image = inlineDataParts[0].inlineData;
    console.log(image.mimeType, image.data);
  }
} catch (err) {
  console.error('Prompt or candidate was blocked:', err);
}

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.0-flash-preview-image-generation',
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [ResponseModality.text, ResponseModality.image]),
);

// Prepare an image for the model to edit
final image = await File('scones.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);

// Provide a text prompt instructing the model to edit the image
final prompt = TextPart("Edit this image to make it look like a cartoon");

// To edit the image, call `generateContent` with the image and text input
final response = await model.generateContent([
  Content.multi([prompt,imagePart])
]);

// Handle the generated image
if (response.inlineDataParts.isNotEmpty) {
  final imageBytes = response.inlineDataParts[0].bytes;
  // Process the image
} else {
  // Handle the case where no images were generated
  print('Error: No images were generated.');
}

Unity


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: new GenerationConfig(
    responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);

// Prepare an image for the model to edit
var imageFile = System.IO.File.ReadAllBytes(System.IO.Path.Combine(
  UnityEngine.Application.streamingAssetsPath, "scones.jpg"));
var image = ModelContent.InlineData("image/jpeg", imageFile);

// Provide a text prompt instructing the model to edit the image
var prompt = ModelContent.Text("Edit this image to make it look like a cartoon.");

// To edit the image, call `GenerateContent` with the image and text input
var response = await model.GenerateContentAsync(new [] { prompt, image });

var text = response.Text;
if (!string.IsNullOrWhiteSpace(text)) {
  // Do something with the text
}

// Handle the generated image
var imageParts = response.Candidates.First().Content.Parts
                         .OfType<ModelContent.InlineDataPart>()
                         .Where(part => part.MimeType == "image/png");
foreach (var imagePart in imageParts) {
  // Load the Image into a Unity Texture2D object
  Texture2D texture2D = new Texture2D(2, 2);
  if (texture2D.LoadImage(imagePart.Data.ToArray())) {
    // Do something with the image
  }
}

了解如何选择适合您的应用场景和应用的模型

使用多轮聊天迭代和修改图片

在试用此示例之前,请完成本指南的准备工作部分,以设置您的项目和应用。
在此部分中,您还需要点击所选 Gemini API 提供方的按钮,以便在此页面上看到特定于该提供方的相关内容

使用多轮对话功能,您可以使用 Gemini 模型对其生成或您提供的图片进行迭代。

请务必创建 GenerativeModel 实例,在模型配置中添加 responseModalities: ["TEXT", "IMAGE"],并调用 startChat()sendMessage() 以发送新用户消息。

Swift


import FirebaseAI

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let generativeModel = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [.text, .image])
)

// Initialize the chat
let chat = model.startChat()

guard let image = UIImage(named: "scones") else { fatalError("Image file not found.") }

// Provide an initial text prompt instructing the model to edit the image
let prompt = "Edit this image to make it look like a cartoon"

// To generate an initial response, send a user message with the image and text prompt
let response = try await chat.sendMessage(image, prompt)

// Inspect the generated image
guard let inlineDataPart = response.inlineDataParts.first else {
  fatalError("No image data in response.")
}
guard let uiImage = UIImage(data: inlineDataPart.data) else {
  fatalError("Failed to convert data to UIImage.")
}

// Follow up requests do not need to specify the image again
let followUpResponse = try await chat.sendMessage("But make it old-school line drawing style")

// Inspect the edited image after the follow up request
guard let followUpInlineDataPart = followUpResponse.inlineDataParts.first else {
  fatalError("No image data in response.")
}
guard let followUpUIImage = UIImage(data: followUpInlineDataPart.data) else {
  fatalError("Failed to convert data to UIImage.")
}

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
    modelName = "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    generationConfig = generationConfig {
responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE) }
)

// Provide an image for the model to edit
val bitmap = BitmapFactory.decodeResource(context.resources, R.drawable.scones)

// Create the initial prompt instructing the model to edit the image
val prompt = content {
    image(bitmap)
    text("Edit this image to make it look like a cartoon")
}

// Initialize the chat
val chat = model.startChat()

// To generate an initial response, send a user message with the image and text prompt
var response = chat.sendMessage(prompt)
// Inspect the returned image
var generatedImageAsBitmap = response
    .candidates.first().content.parts.firstNotNullOf { it.asImageOrNull() }

// Follow up requests do not need to specify the image again
response = chat.sendMessage("But make it old-school line drawing style")
generatedImageAsBitmap = response
    .candidates.first().content.parts.firstNotNullOf { it.asImageOrNull() }

Java


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
    "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    new GenerationConfig.Builder()
        .setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
        .build()
);

GenerativeModelFutures model = GenerativeModelFutures.from(ai);

// Provide an image for the model to edit
Bitmap bitmap = BitmapFactory.decodeResource(resources, R.drawable.scones);

// Initialize the chat
ChatFutures chat = model.startChat();

// Create the initial prompt instructing the model to edit the image
Content prompt = new Content.Builder()
        .setRole("user")
        .addImage(bitmap)
        .addText("Edit this image to make it look like a cartoon")
        .build();

// To generate an initial response, send a user message with the image and text prompt
ListenableFuture<GenerateContentResponse> response = chat.sendMessage(prompt);
// Extract the image from the initial response
ListenableFuture<@Nullable Bitmap> initialRequest = Futures.transform(response, result -> {
    for (Part part : result.getCandidates().get(0).getContent().getParts()) {
        if (part instanceof ImagePart) {
            ImagePart imagePart = (ImagePart) part;
            return imagePart.getImage();
        }
    }
    return null;
}, executor);

// Follow up requests do not need to specify the image again
ListenableFuture<GenerateContentResponse> modelResponseFuture = Futures.transformAsync(
        initialRequest,
        generatedImage -> {
            Content followUpPrompt = new Content.Builder()
                    .addText("But make it old-school line drawing style")
                    .build();
            return chat.sendMessage(followUpPrompt);
        },
        executor);

// Add a final callback to check the reworked image
Futures.addCallback(modelResponseFuture, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        for (Part part : result.getCandidates().get(0).getContent().getParts()) {
            if (part instanceof ImagePart) {
                ImagePart imagePart = (ImagePart) part;
                Bitmap generatedImageAsBitmap = imagePart.getImage();
                break;
            }
        }
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
  model: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: {
    responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
  },
});

// Prepare an image for the model to edit
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

const fileInputEl = document.querySelector("input[type=file]");
const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

// Provide an initial text prompt instructing the model to edit the image
const prompt = "Edit this image to make it look like a cartoon";

// Initialize the chat
const chat = model.startChat();

// To generate an initial response, send a user message with the image and text prompt
const result = await chat.sendMessage([prompt, imagePart]);

// Request and inspect the generated image
try {
  const inlineDataParts = result.response.inlineDataParts();
  if (inlineDataParts?.[0]) {
    // Inspect the generated image
    const image = inlineDataParts[0].inlineData;
    console.log(image.mimeType, image.data);
  }
} catch (err) {
  console.error('Prompt or candidate was blocked:', err);
}

// Follow up requests do not need to specify the image again
const followUpResult = await chat.sendMessage("But make it old-school line drawing style");

// Request and inspect the returned image
try {
  const followUpInlineDataParts = followUpResult.response.inlineDataParts();
  if (followUpInlineDataParts?.[0]) {
    // Inspect the generated image
    const followUpImage = followUpInlineDataParts[0].inlineData;
    console.log(followUpImage.mimeType, followUpImage.data);
  }
} catch (err) {
  console.error('Prompt or candidate was blocked:', err);
}

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.0-flash-preview-image-generation',
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [ResponseModality.text, ResponseModality.image]),
);

// Prepare an image for the model to edit
final image = await File('scones.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);

// Provide an initial text prompt instructing the model to edit the image
final prompt = TextPart("Edit this image to make it look like a cartoon");

// Initialize the chat
final chat = model.startChat();

// To generate an initial response, send a user message with the image and text prompt
final response = await chat.sendMessage([
  Content.multi([prompt,imagePart])
]);

// Inspect the returned image
if (response.inlineDataParts.isNotEmpty) {
  final imageBytes = response.inlineDataParts[0].bytes;
  // Process the image
} else {
  // Handle the case where no images were generated
  print('Error: No images were generated.');
}

// Follow up requests do not need to specify the image again
final followUpResponse = await chat.sendMessage([
  Content.text("But make it old-school line drawing style")
]);

// Inspect the returned image
if (followUpResponse.inlineDataParts.isNotEmpty) {
  final followUpImageBytes = response.inlineDataParts[0].bytes;
  // Process the image
} else {
  // Handle the case where no images were generated
  print('Error: No images were generated.');
}

Unity


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: new GenerationConfig(
    responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);

// Prepare an image for the model to edit
var imageFile = System.IO.File.ReadAllBytes(System.IO.Path.Combine(
  UnityEngine.Application.streamingAssetsPath, "scones.jpg"));
var image = ModelContent.InlineData("image/jpeg", imageFile);

// Provide an initial text prompt instructing the model to edit the image
var prompt = ModelContent.Text("Edit this image to make it look like a cartoon.");

// Initialize the chat
var chat = model.StartChat();

// To generate an initial response, send a user message with the image and text prompt
var response = await chat.SendMessageAsync(new [] { prompt, image });

// Inspect the returned image
var imageParts = response.Candidates.First().Content.Parts
                         .OfType<ModelContent.InlineDataPart>()
                         .Where(part => part.MimeType == "image/png");
// Load the image into a Unity Texture2D object
UnityEngine.Texture2D texture2D = new(2, 2);
if (texture2D.LoadImage(imageParts.First().Data.ToArray())) {
  // Do something with the image
}

// Follow up requests do not need to specify the image again
var followUpResponse = await chat.SendMessageAsync("But make it old-school line drawing style");

// Inspect the returned image
var followUpImageParts = followUpResponse.Candidates.First().Content.Parts
                         .OfType<ModelContent.InlineDataPart>()
                         .Where(part => part.MimeType == "image/png");
// Load the image into a Unity Texture2D object
UnityEngine.Texture2D followUpTexture2D = new(2, 2);
if (followUpTexture2D.LoadImage(followUpImageParts.First().Data.ToArray())) {
  // Do something with the image
}

了解如何选择适合您的应用场景和应用的模型



支持的功能、限制和最佳实践

支持的模态和功能

以下是 Gemini 模型支持的图片输出模态和功能。每个 capability 都会显示一个示例提示,并在其上方提供一个示例代码。

  • 文本转图片(纯文本转图片)

    • 生成一张背景为烟花的埃菲尔铁塔图片。
  • 文本转图片(文本渲染)

    • 将这段巨大的文字投影映射到建筑物正面,生成一张大楼的电影效果照片。
  • 文本转图片和文本(交织)

    • 生成带插图的西班牙海鲜饭食谱。在生成食谱时,一并创建图片和文字。

    • 生成一个以 3D 卡通动画风格讲述狗狗故事的视频。为每个场景生成一张图片。

  • 图片和文本转图片和文本(交织)

    • [家具齐全的房间的图片] + 我还可以选择哪些颜色的沙发?您能更新一下图片吗?
  • 图片编辑(文字和图片转图片)

    • [图片:司康饼] + 将此图片修改为卡通图片

    • [猫的图片] + [枕头的图片] + 在这种枕头上用十字绣制作我猫的图案。

  • 多轮图片编辑(聊天)

    • [蓝色汽车的图片] + 将这辆车改为敞篷车。,然后现在将颜色更改为黄色。

限制和最佳做法

以下是 Gemini 模型的图片输出限制和最佳实践。

  • 在此公开实验版中,Gemini 支持以下功能:

    • 生成的 PNG 图片的最大尺寸为 1024 像素。
    • 生成和编辑人物图片。
    • 使用安全过滤器,提供更灵活、更宽松的用户体验。
  • 为了获得最佳效果,请使用以下语言:enes-mxja-jpzh-cnhi-in

  • 图片生成功能不支持音频或视频输入。

  • 系统可能不会始终触发图片生成功能。以下是一些已知问题:

    • 模型可能只会输出文本。
      尝试明确请求图片输出(例如“生成图片”“随时提供图片”“更新图片”)。

    • 模型可能会在中途停止生成。
      请重试或尝试使用其他问题。

    • 模型可能会以图片的形式生成文本。
      尝试明确要求输出文本。例如,“生成插图和故事文本”。

  • 为图片生成文本时,Gemini 的最佳使用方式是先生成文本,然后请求包含文本的图片。