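NanoBanana (Gemini Image Generation)
Complete documentation for the NanoBanana image generation API, which is built on the Gemini generateContent endpoint.
For image generation, set your token's group to the default group.
Basic information
| Item | Value |
|---|---|
| Base URL | https://www.vibeapi.cn |
| Endpoint path | /v1beta/models/{model}:generateContent |
| Method | POST |
| Authentication | Authorization: Bearer <API_KEY> |
| Suggested timeout | 512px/1K: 80s, 2K: 200s, 4K: 350s |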
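The rows of the table above map directly onto a request's URL, headers, and timeout. A minimal sketch of that mapping follows; the helper names `endpoint_url`, `auth_headers`, and `suggested_timeout` are hypothetical and not part of the API, and falling back to the longest timeout for an unknown tier is an assumption made here.

```python
API_BASE = "https://www.vibeapi.cn"

# Suggested timeouts in seconds, copied from the table above.
SUGGESTED_TIMEOUTS = {"512px": 80, "1K": 80, "2K": 200, "4K": 350}


def endpoint_url(model: str) -> str:
    """Fill the {model} placeholder in the endpoint path."""
    return f"{API_BASE}/v1beta/models/{model}:generateContent"


def auth_headers(api_key: str) -> dict:
    """Bearer-token headers for a JSON POST request."""
    return {"Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"}


def suggested_timeout(image_size: str) -> int:
    """Look up the suggested timeout for a resolution tier.

    Assumption: unknown tiers fall back to the longest listed timeout.
    """
    return SUGGESTED_TIMEOUTS.get(image_size, max(SUGGESTED_TIMEOUTS.values()))
```

For example, `suggested_timeout("2K")` gives the 200 seconds listed for the 2K tier.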
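Available models
| Model | Best for | Features |
|---|---|---|
| gemini-3-pro-image-preview | Professional assets, complex instructions | Advanced reasoning, search grounding, up to 4K, up to 14 reference images |
| gemini-3.1-flash-image | Everyday generation, batch jobs | Cost-effective, 512px to 4K, thinking-level control, image search grounding |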
Request format

    {
      "contents": [
        {
          "parts": [
            { "text": "your prompt" }
          ]
        }
      ],
      "generationConfig": {
        "responseModalities": ["IMAGE"],
        "imageConfig": {
          "aspectRatio": "16:9",
          "imageSize": "1K"
        }
      }
    }

generationConfig parameters
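| Parameter | Type | Description |
|---|---|---|
| responseModalities | string[] | ["IMAGE"] returns images only; ["TEXT", "IMAGE"] returns mixed text and images (default) |
| imageConfig.aspectRatio | string | Aspect ratio; see the supported list below |
| imageConfig.imageSize | string | Resolution tier: 512px (flash only), 1K, 2K, 4K. The K must be uppercase |
Supported aspect ratios
All 14 ratios below have been verified on both models:
1:1 1:4 1:8 2:3 3:2 3:4 4:1 4:3 4:5 5:4 8:1 9:16 16:9 21:9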
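The parameter rules above (the fixed ratio list, the uppercase K, and 512px being flash only) can be checked on the client before a request is sent. The sketch below is one way to do that; `build_generation_config` is a hypothetical helper, and detecting the flash model by looking for "flash" in the model name is an assumption.

```python
ASPECT_RATIOS = {"1:1", "1:4", "1:8", "2:3", "3:2", "3:4", "4:1",
                 "4:3", "4:5", "5:4", "8:1", "9:16", "16:9", "21:9"}
IMAGE_SIZES = {"512px", "1K", "2K", "4K"}


def build_generation_config(model, aspect_ratio=None, image_size=None,
                            modalities=("IMAGE",)):
    """Return a generationConfig dict, rejecting values the docs rule out."""
    if aspect_ratio is not None and aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspectRatio: {aspect_ratio!r}")
    if image_size is not None:
        if image_size not in IMAGE_SIZES:
            # Also catches lowercase values such as "1k".
            raise ValueError(f"unsupported imageSize: {image_size!r}")
        # Assumption: flash models carry "flash" in their name.
        if image_size == "512px" and "flash" not in model:
            raise ValueError("512px is only available on the flash model")
    config = {"responseModalities": list(modalities)}
    image_config = {}
    if aspect_ratio is not None:
        image_config["aspectRatio"] = aspect_ratio
    if image_size is not None:
        image_config["imageSize"] = image_size
    if image_config:
        config["imageConfig"] = image_config
    return config
```

Failing early on the client avoids spending a round trip on a request that would be rejected.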
Response format

    {
      "candidates": [
        {
          "content": {
            "role": "model",
            "parts": [
              {
                "inlineData": {
                  "mimeType": "image/png",
                  "data": "<BASE64_IMAGE_DATA>"
                }
              }
            ]
          },
          "finishReason": "STOP"
        }
      ],
      "usageMetadata": {
        "promptTokenCount": 10,
        "candidatesTokenCount": 1120,
        "totalTokenCount": 1130
      }
    }

Images are returned base64-encoded in candidates[0].content.parts[].inlineData.
When responseModalities includes "TEXT", parts may contain both text and inlineData entries.
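To make the response layout concrete, here is a small sketch that pulls the finish reason, text parts, image parts, and token usage out of a parsed response. `summarize_response` is a hypothetical helper; it reads only the fields shown in the sample response above and treats anything missing as absent.

```python
def summarize_response(resp_json):
    """Collect finish reason, text parts, image parts, and token usage."""
    candidates = resp_json.get("candidates", [])
    first = candidates[0] if candidates else {}
    parts = first.get("content", {}).get("parts", [])
    return {
        "finish_reason": first.get("finishReason"),
        # Text parts are present only when "TEXT" is among the modalities.
        "texts": [p["text"] for p in parts if "text" in p],
        # Each entry is an inlineData dict with mimeType and base64 data.
        "images": [p["inlineData"] for p in parts if "inlineData" in p],
        "total_tokens": resp_json.get("usageMetadata", {}).get("totalTokenCount"),
    }
```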
Resolution reference
gemini-3.1-flash-image (verified by testing)
| Aspect ratio | 512px | 1K | 2K | 4K |
|---|---|---|---|---|
| 1:1 | 512×512 | 1024×1024 | 2048×2048 | 4096×4096 |
| 1:4 | 256×1024 | 512×2064 | 1024×4128 | 2048×8256 |
| 1:8 | 176×1456 | 352×2928 | 704×5856 | 1408×11712 |
| 2:3 | 416×624 | 848×1264 | 1696×2528 | 3392×5056 |
| 3:2 | 624×416 | 1264×848 | 2528×1696 | 5056×3392 |
| 3:4 | 448×592 | 896×1200 | 1792×2400 | 3584×4800 |
| 4:1 | 1024×256 | 2064×512 | 4128×1024 | 8256×2048 |
| 4:3 | 592×448 | 1200×896 | 2400×1792 | 4800×3584 |
| 4:5 | 464×576 | 928×1152 | 1856×2304 | 3712×4608 |
| 5:4 | 576×464 | 1152×928 | 2304×1856 | 4608×3712 |
| 8:1 | 1456×176 | 2928×352 | 5856×704 | 11712×1408 |
| 9:16 | 384×688 | 768×1376 | 1536×2752 | 3072×5504 |
| 16:9 | 688×384 | 1376×768 | 2752×1536 | 5504×3072 |
| 21:9 | 784×336 | 1584×672 | 3168×1344 | 6336×2688 |
The 512px tier is supported only by the flash model.
gemini-3-pro-image-preview
Supports 1K, 2K, and 4K but not 512px. Output dimensions match the flash model's 1K, 2K, and 4K columns.
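One way to confirm that a returned image matches the table is to read the width and height from the PNG header. The sketch below parses the IHDR chunk with the standard library only; `png_dimensions` and `matches_table` are hypothetical helpers, `EXPECTED` holds just a few entries copied from the flash table, and JPEG responses are not handled.

```python
import struct

# A few expected sizes copied from the flash table: (ratio, tier) -> (width, height)
EXPECTED = {
    ("1:1", "1K"): (1024, 1024),
    ("16:9", "1K"): (1376, 768),
    ("9:16", "2K"): (1536, 2752),
    ("21:9", "4K"): (6336, 2688),
}

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes):
    """Read (width, height) from the IHDR chunk of a PNG byte string."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # Width and height are two big-endian 32-bit integers at bytes 16..23.
    return struct.unpack(">II", data[16:24])


def matches_table(data: bytes, aspect_ratio: str, image_size: str) -> bool:
    """True if the PNG's size equals the table entry for this ratio and tier."""
    return png_dimensions(data) == EXPECTED[(aspect_ratio, image_size)]
```

Pass it the bytes obtained from `base64.b64decode(inlineData["data"])`.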
Latency reference (measured on flash)
| Tier | Typical latency |
|---|---|
| 512px | 10-17s |
| 1K | 13-40s |
| 2K | 40-170s |
| 4K | 120-310s |
Examples
Example 1: text-to-image (curl)

    curl -s -X POST \
      "https://www.vibeapi.cn/v1beta/models/gemini-3.1-flash-image:generateContent" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "contents": [{"parts": [{"text": "A cute cat napping in the sunshine"}]}],
        "generationConfig": {
          "responseModalities": ["IMAGE"],
          "imageConfig": {
            "aspectRatio": "16:9",
            "imageSize": "1K"
          }
        }
      }' -o response.json

    # Extract the image
    jq -r '.candidates[0].content.parts[0].inlineData.data' response.json \
      | base64 -d > output.png

Example 2: text-to-image (Python)

    import base64
    from pathlib import Path

    import requests

    API_BASE = "https://www.vibeapi.cn"
    API_KEY = "<YOUR_API_KEY>"


    def generate_image(prompt, model="gemini-3.1-flash-image",
                       aspect_ratio="1:1", image_size="1K",
                       modalities=None):
        """Generate an image from a text prompt."""
        url = f"{API_BASE}/v1beta/models/{model}:generateContent"
        config = {"responseModalities": modalities or ["IMAGE"]}
        if aspect_ratio or image_size:
            img_cfg = {}
            if aspect_ratio:
                img_cfg["aspectRatio"] = aspect_ratio
            if image_size:
                img_cfg["imageSize"] = image_size
            config["imageConfig"] = img_cfg
        resp = requests.post(url,
                             headers={"Authorization": f"Bearer {API_KEY}",
                                      "Content-Type": "application/json"},
                             json={"contents": [{"parts": [{"text": prompt}]}],
                                   "generationConfig": config},
                             timeout=350)  # covers the suggested 4K timeout
        resp.raise_for_status()
        return resp.json()


    def save_images(resp_json, prefix="output"):
        """Extract images from the response and save them to disk."""
        saved = []
        for cand in resp_json.get("candidates", []):
            for i, part in enumerate(cand.get("content", {}).get("parts", [])):
                inline = part.get("inlineData")
                if inline:
                    # Derive the file extension from the MIME type (png or jpeg)
                    ext = inline["mimeType"].split("/")[-1]
                    path = f"{prefix}_{i}.{ext}"
                    Path(path).write_bytes(base64.b64decode(inline["data"]))
                    saved.append(path)
                elif part.get("text"):
                    print(f"Text: {part['text'][:200]}")
        return saved


    # Basic usage
    result = generate_image("A cute cat napping in the sunshine")
    paths = save_images(result, "cat")
    print(f"Saved: {paths}")

    # Specify aspect ratio and resolution
    result = generate_image("A panoramic city skyline", aspect_ratio="21:9", image_size="4K")
    save_images(result, "skyline")

    # Mixed text and image output
    result = generate_image("Draw a cat and describe it", modalities=["TEXT", "IMAGE"])
    save_images(result, "cat_with_text")

Example 3: Google's official SDK (Python, recommended)
After running pip install google-genai you can use the SDK, which is much more concise than requests:

    from google import genai
    from google.genai import types

    client = genai.Client(
        api_key="<YOUR_API_KEY>",
        http_options={"base_url": "https://www.vibeapi.cn"}
    )

    # Text-to-image
    response = client.models.generate_content(
        model="gemini-3.1-flash-image",
        contents="A cute cat napping in the sunshine",
        config=types.GenerateContentConfig(
            response_modalities=["IMAGE"],
            image_config=types.ImageConfig(
                aspect_ratio="16:9",
                image_size="1K",
            ),
        ),
    )

    for part in response.parts:
        if part.inline_data is not None:
            part.as_image().save("cat.png")  # save directly to a file
            break

Image editing with the SDK (pass a PIL Image directly):

    from PIL import Image

    # Reuses client and types from the previous snippet
    cat_img = Image.open("cat.png")
    response = client.models.generate_content(
        model="gemini-3.1-flash-image",
        contents=[
            cat_img,
            "Put a Santa hat on this cat",
        ],
        config=types.GenerateContentConfig(
            response_modalities=["IMAGE"],
            image_config=types.ImageConfig(aspect_ratio="1:1", image_size="1K"),
        ),
    )

    for part in response.parts:
        if part.inline_data is not None:
            part.as_image().save("cat_hat.png")
            break

Example 4: image editing (requests)
Provide the original image plus a text instruction to modify it:

    import base64
    from pathlib import Path

    import requests

    API_BASE = "https://www.vibeapi.cn"
    API_KEY = "<YOUR_API_KEY>"


    def edit_image(image_path, instruction, model="gemini-3.1-flash-image",
                   aspect_ratio=None, image_size=None):
        """Edit an existing image according to a text instruction."""
        img_bytes = Path(image_path).read_bytes()
        img_b64 = base64.b64encode(img_bytes).decode()
        # Case-insensitive extension check; anything that is not JPEG is sent as PNG
        is_jpeg = str(image_path).lower().endswith((".jpg", ".jpeg"))
        mime = "image/jpeg" if is_jpeg else "image/png"
        parts = [
            {"text": instruction},
            {"inline_data": {"mime_type": mime, "data": img_b64}}
        ]
        config = {"responseModalities": ["IMAGE"]}
        if aspect_ratio or image_size:
            img_cfg = {}
            if aspect_ratio:
                img_cfg["aspectRatio"] = aspect_ratio
            if image_size:
                img_cfg["imageSize"] = image_size
            config["imageConfig"] = img_cfg
        resp = requests.post(
            f"{API_BASE}/v1beta/models/{model}:generateContent",
            headers={"Authorization": f"Bearer {API_KEY}",
                     "Content-Type": "application/json"},
            json={"contents": [{"parts": parts}],
                  "generationConfig": config},
            timeout=350)  # covers the suggested 4K timeout
        resp.raise_for_status()
        return resp.json()


    # Put a hat on the cat
    result = edit_image("cat.jpg", "Put a Santa hat on this cat")

Inpainting (semantic masking)
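Describe the region to change in text and keep everything else unchanged:

    # Change only the background and keep the subject
    result = edit_image("cat.jpg", "Change only the background to a snowy winter scene. Keep the cat exactly the same.")

Style transfer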
Recreate a photo in a different artistic style:

    result = edit_image("cat.jpg",
        "Transform this photograph into the artistic style of Vincent van Gogh's Starry Night. "
        "Preserve the original composition but render with swirling, impasto brushstrokes.")

Multi-image composition
Provide several images as references to create a composite scene:

    import base64
    from pathlib import Path

    import requests

    API_BASE = "https://www.vibeapi.cn"
    API_KEY = "<YOUR_API_KEY>"

    img1_b64 = base64.b64encode(Path("dress.png").read_bytes()).decode()
    img2_b64 = base64.b64encode(Path("model.png").read_bytes()).decode()

    resp = requests.post(
        f"{API_BASE}/v1beta/models/gemini-3.1-flash-image:generateContent",
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
        json={
            "contents": [{"parts": [
                {"text": "Dress the person in the second image in the blue dress from the first image and produce a professional e-commerce photo"},
                {"inline_data": {"mime_type": "image/png", "data": img1_b64}},
                {"inline_data": {"mime_type": "image/png", "data": img2_b64}},
            ]}],
            "generationConfig": {
                "responseModalities": ["IMAGE"],
                "imageConfig": {"aspectRatio": "3:4", "imageSize": "2K"}
            }
        },
        timeout=300,
    )
    resp.raise_for_status()

The SDK version is more concise:
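
    from PIL import Image

    # Reuses client and types from the SDK example above
    response = client.models.generate_content(
        model="gemini-3.1-flash-image",
        contents=[
            "Dress the person in the second image in the blue dress from the first image and produce a professional e-commerce photo",
            Image.open("dress.png"),
            Image.open("model.png"),
        ],
        config=types.GenerateContentConfig(
            response_modalities=["IMAGE"],
            image_config=types.ImageConfig(aspect_ratio="3:4", image_size="2K"),
        ),
    )

Example 5: iterating on an image over multiple turns (Python)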

    import requests

    API_BASE = "https://www.vibeapi.cn"
    API_KEY = "<YOUR_API_KEY>"
    MODEL = "gemini-3.1-flash-image"
    HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}


    def chat_image(history, config=None):
        """Send a multi-turn conversation; history is the contents array."""
        body = {"contents": history,
                "generationConfig": config or {"responseModalities": ["TEXT", "IMAGE"]}}
        resp = requests.post(f"{API_BASE}/v1beta/models/{MODEL}:generateContent",
                             headers=HEADERS, json=body, timeout=300)
        resp.raise_for_status()
        return resp.json()


    # Turn 1: generate
    history = [{"role": "user", "parts": [{"text": "Draw an orange cat sitting on a windowsill"}]}]
    r1 = chat_image(history)

    # Append the model's reply to the history
    model_parts = r1["candidates"][0]["content"]["parts"]
    history.append({"role": "model", "parts": model_parts})

    # Turn 2: modify
    history.append({"role": "user", "parts": [{"text": "Change the background to a rainy day"}]})
    r2 = chat_image(history)

In multi-turn conversations, pass back the previous turn's complete parts (including inlineData and thoughtSignature) unchanged.
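The rule about passing parts back unchanged can be wrapped in a small helper so that later turns cannot drop fields by accident. A minimal sketch, where `next_turn` is a hypothetical name and the response is assumed to have at least one candidate:

```python
def next_turn(history, resp_json, user_text):
    """Return a new history with the model reply and the next user message."""
    model_parts = resp_json["candidates"][0]["content"]["parts"]
    return history + [
        # The model parts are passed back untouched so that inlineData and
        # thoughtSignature survive the round trip.
        {"role": "model", "parts": model_parts},
        {"role": "user", "parts": [{"text": user_text}]},
    ]
```

Returning a new list rather than mutating `history` makes it easy to branch several edits from the same earlier turn.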
Example 6: choosing aspect ratio and resolution (quick curl comparison)

    # Portrait phone wallpaper, 9:16 at 2K
    curl -s -X POST \
      "https://www.vibeapi.cn/v1beta/models/gemini-3.1-flash-image:generateContent" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "contents": [{"parts": [{"text": "Silhouette of a cat under a dreamy starry sky"}]}],
        "generationConfig": {
          "responseModalities": ["IMAGE"],
          "imageConfig": {"aspectRatio": "9:16", "imageSize": "2K"}
        }
      }' | jq -r '.candidates[0].content.parts[0].inlineData.data' \
      | base64 -d > wallpaper_9x16_2K.png
    # Output: 1536×2752

    # Ultra-wide banner, 21:9 at 4K
    curl -s -X POST \
      "https://www.vibeapi.cn/v1beta/models/gemini-3.1-flash-image:generateContent" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "contents": [{"parts": [{"text": "Cyberpunk city panorama"}]}],
        "generationConfig": {
          "responseModalities": ["IMAGE"],
          "imageConfig": {"aspectRatio": "21:9", "imageSize": "4K"}
        }
      }' | jq -r '.candidates[0].content.parts[0].inlineData.data' \
      | base64 -d > banner_21x9_4K.png
    # Output: 6336×2688

    # Quick 512px preview (flash only, lowest token cost)
    curl -s -X POST \
      "https://www.vibeapi.cn/v1beta/models/gemini-3.1-flash-image:generateContent" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "contents": [{"parts": [{"text": "A cat"}]}],
        "generationConfig": {
          "responseModalities": ["IMAGE"],
          "imageConfig": {"aspectRatio": "1:1", "imageSize": "512px"}
        }
      }' | jq -r '.candidates[0].content.parts[0].inlineData.data' \
      | base64 -d > preview_512.png
    # Output: 512×512, only 747 tokens

Advanced features
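Google Search grounding (supported by both models)
Generates images from real-time search data such as weather, news, or stock prices. Add a tools field to the request:

    {
      "contents": [{"parts": [{"text": "Visualize today's weather forecast for San Francisco"}]}],
      "tools": [{"google_search": {}}],
      "generationConfig": {
        "responseModalities": ["TEXT", "IMAGE"],
        "imageConfig": {"aspectRatio": "16:9"}
      }
    }

The response additionally returns groundingMetadata, which contains webSearchQueries (the search terms) and groundingChunks (the source links).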
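The two groundingMetadata fields named above can be read out of a parsed response as sketched below. The text does not show the exact JSON shape, so this assumes the layout used by the Gemini API: groundingMetadata sits on the candidate, and each groundingChunks entry carries its link under web.uri. `grounding_sources` is a hypothetical helper.

```python
def grounding_sources(resp_json):
    """Return (search queries, source URLs) from the first candidate."""
    candidates = resp_json.get("candidates", [])
    # Assumption: groundingMetadata is attached to the candidate object.
    meta = candidates[0].get("groundingMetadata", {}) if candidates else {}
    queries = meta.get("webSearchQueries", [])
    # Assumption: each chunk looks like {"web": {"uri": ..., "title": ...}}.
    urls = [chunk["web"]["uri"]
            for chunk in meta.get("groundingChunks", [])
            if "uri" in chunk.get("web", {})]
    return queries, urls
```

Responses without grounding simply yield two empty lists.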
Flash additionally supports image search grounding, which can use images from the web as visual references:

    "tools": [{"google_search": {"search_types": {"web_search": {}, "image_search": {}}}}]

Thinking mode
The Pro model enables thinking mode by default: it produces draft sketches before the final image. Parts in the response with part.thought == true belong to the thinking process and can be skipped.
The Flash model lets you control the thinking level (minimal, which is the default, or high) through generationConfig.thinkingConfig:

    {
      "contents": [{"parts": [{"text": "A futuristic city inside a glass bottle floating in space"}]}],
      "generationConfig": {
        "responseModalities": ["IMAGE"],
        "imageConfig": {"aspectRatio": "1:1", "imageSize": "1K"},
        "thinkingConfig": {
          "thinkingLevel": "high",
          "includeThoughts": true
        }
      }
    }

In high mode, the response parts contain several types:
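| Part type | Description |
|---|---|
| thought == true + text | Thinking text (reasoning process) |
| thought == true + inlineData | Draft sketch (interim image) |
| inlineData (no thought) | Final output image |
Skip the thought parts when extracting the final image:

    import base64
    from pathlib import Path

    # resp_json is the parsed JSON body of a generateContent response
    for i, part in enumerate(resp_json["candidates"][0]["content"]["parts"]):
        if part.get("thought"):
            continue  # skip the thinking process
        inline = part.get("inlineData")
        if inline:
            # This is a final image
            Path(f"final_{i}.png").write_bytes(base64.b64decode(inline["data"]))

Thinking tokens are billed whether includeThoughts is set to true or false. high mode consumes more tokens but produces higher-quality images.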
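The part types listed in the table can also be told apart with a small classifier, which helps when draft sketches should be logged rather than discarded. A sketch over raw JSON parts; `classify_part` and its label strings are hypothetical:

```python
def classify_part(part):
    """Map a raw response part to one of the types in the table above."""
    is_thought = bool(part.get("thought"))
    if "inlineData" in part:
        return "draft_image" if is_thought else "final_image"
    if "text" in part:
        return "thought_text" if is_thought else "text"
    return "other"
```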
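Multiple reference images
Pro accepts up to 14 reference images in total, of which up to 6 can be object images and up to 5 person images. Flash accepts up to 10 object images plus 4 person images.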
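These limits can be checked before upload. The sketch below assumes that the figure of 14 for Pro is an overall cap on reference images, and that the caller labels each image as "object", "person", or something else; `check_reference_images` and those labels are hypothetical.

```python
# Limits as stated above. No overall total is given for Flash, so it is None.
REFERENCE_LIMITS = {
    "gemini-3-pro-image-preview": {"object": 6, "person": 5, "total": 14},
    "gemini-3.1-flash-image": {"object": 10, "person": 4, "total": None},
}


def check_reference_images(model, kinds):
    """Return a list of limit violations; kinds has one label per image."""
    limits = REFERENCE_LIMITS[model]
    problems = []
    for kind in ("object", "person"):
        count = kinds.count(kind)
        if count > limits[kind]:
            problems.append(f"{count} {kind} images exceed the limit of {limits[kind]}")
    if limits["total"] is not None and len(kinds) > limits["total"]:
        problems.append(f"{len(kinds)} images exceed the total limit of {limits['total']}")
    return problems
```

An empty list means the set of reference images is within the stated limits.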
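Notes
- imageSize case: the K must be uppercase (1K, 2K, 4K); lowercase 1k is rejected
- 512px is flash only: the Pro model does not support the 512px tier
- Timeouts: 4K generation can take 2-5 minutes, so set a sufficiently long timeout
- Token usage: about 747 tokens at 512px, about 1120 at 1K, and about 2000 at 4K
- Default behavior: when imageConfig is omitted, the output is roughly 1408×768 (close to 16:9 at 1K)
- Image format: mimeType in the response is usually image/png and occasionally image/jpeg
- SynthID watermark: every generated image carries a SynthID digital watermark
- Recommended prompt languages: English, Chinese, Japanese, Korean, French, German, Spanish, and others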
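The approximate token figures in the notes allow a rough budget for a batch of images. The sketch below sums them per planned image; `estimate_tokens` is a hypothetical helper. No figure is given for 2K, so it is left out rather than guessed, and thinking tokens are not included.

```python
# Approximate tokens per generated image, from the notes above.
# 2K is deliberately absent because no figure is given for it.
APPROX_TOKENS = {"512px": 747, "1K": 1120, "4K": 2000}


def estimate_tokens(image_sizes):
    """Sum the approximate token cost for a batch of planned images."""
    unknown = sorted({s for s in image_sizes if s not in APPROX_TOKENS})
    if unknown:
        raise ValueError(f"no token estimate for: {unknown}")
    return sum(APPROX_TOKENS[s] for s in image_sizes)
```

A cheap workflow suggested by these numbers is to preview at 512px and re-render only the keepers at a higher tier.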