VibeAPI

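NanoBanana (Gemini Image Generation)

Complete documentation for the NanoBanana image generation API, based on the Gemini generateContent endpoint.

For image generation, set the token's group to the default group.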

Basic Information

Item               Value
Base URL           https://www.vibeapi.cn
Endpoint path      /v1beta/models/{model}:generateContent
HTTP method        POST
Authentication     Authorization: Bearer <API_KEY>
Suggested timeout  512px/1K: 80s, 2K: 200s, 4K: 350s
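
The suggested timeouts can be kept in one place so that each request picks a value matching its resolution tier. The snippet below is a minimal sketch; `SUGGESTED_TIMEOUTS` and `pick_timeout` are hypothetical names introduced here for illustration, and the numbers are copied from the table above.

```python
# Suggested request timeouts in seconds, copied from the table above.
SUGGESTED_TIMEOUTS = {"512px": 80, "1K": 80, "2K": 200, "4K": 350}


def pick_timeout(image_size="1K"):
    """Return the suggested timeout (seconds) for a resolution tier.

    Hypothetical helper: it only looks up the documented values and
    rejects tiers that are not listed (for example a lowercase "1k").
    """
    try:
        return SUGGESTED_TIMEOUTS[image_size]
    except KeyError:
        raise ValueError(f"unknown imageSize: {image_size!r}") from None
```

The returned value can be passed as the `timeout` argument of `requests.post`, for example `timeout=pick_timeout("4K")`.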

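Available Models

Model                       Use cases                                   Features
gemini-3-pro-image-preview  Professional assets, complex instructions   Advanced reasoning, search grounding, up to 4K, up to 14 reference images
gemini-3.1-flash-image      Everyday generation, batch jobs             Cost-effective, 512px to 4K, thinking-level control, image search grounding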

Request Format

{
  "contents": [
    {
      "parts": [
        { "text": "your prompt" }
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": ["IMAGE"],
    "imageConfig": {
      "aspectRatio": "16:9",
      "imageSize": "1K"
    }
  }
}

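generationConfig Parameters

Parameter                Type      Description
responseModalities       string[]  ["IMAGE"] for image only; ["TEXT", "IMAGE"] for mixed text and image (default)
imageConfig.aspectRatio  string    Aspect ratio; see the supported list below
imageConfig.imageSize    string    Resolution tier: 512px (flash only), 1K, 2K, 4K; the K must be uppercase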

Supported Aspect Ratios

All 14 ratios below have been verified on both models:

1:1 1:4 1:8 2:3 3:2 3:4 4:1 4:3 4:5 5:4 8:1 9:16 16:9 21:9
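
Because an invalid imageSize or aspectRatio is only discovered after a round trip to the server, it can help to validate the values locally first. The following is a sketch under the constraints stated in this document (14 aspect ratios, four size tiers, 512px on flash only); `build_generation_config` is a hypothetical helper, and treating any model name containing "flash" as a flash model is an assumption.

```python
ASPECT_RATIOS = {"1:1", "1:4", "1:8", "2:3", "3:2", "3:4", "4:1",
                 "4:3", "4:5", "5:4", "8:1", "9:16", "16:9", "21:9"}
IMAGE_SIZES = {"512px", "1K", "2K", "4K"}   # the K must be uppercase
FLASH_ONLY_SIZES = {"512px"}


def build_generation_config(aspect_ratio=None, image_size=None,
                            modalities=("IMAGE",),
                            model="gemini-3.1-flash-image"):
    """Build a generationConfig dict, rejecting values this document lists as unsupported."""
    if aspect_ratio is not None and aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspectRatio: {aspect_ratio!r}")
    if image_size is not None:
        if image_size not in IMAGE_SIZES:
            raise ValueError(f"unsupported imageSize: {image_size!r}")
        # Assumption: flash models are recognised by "flash" in the model name.
        if image_size in FLASH_ONLY_SIZES and "flash" not in model:
            raise ValueError(f"{image_size} is only available on flash models")

    config = {"responseModalities": list(modalities)}
    image_config = {}
    if aspect_ratio is not None:
        image_config["aspectRatio"] = aspect_ratio
    if image_size is not None:
        image_config["imageSize"] = image_size
    if image_config:  # omit imageConfig entirely when nothing was requested
        config["imageConfig"] = image_config
    return config
```

The result can be sent as the generationConfig field of the request body.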


Response Format

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "inlineData": {
              "mimeType": "image/png",
              "data": "<BASE64_IMAGE_DATA>"
            }
          }
        ]
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 10,
    "candidatesTokenCount": 1120,
    "totalTokenCount": 1130
  }
}

Images are returned base64-encoded in candidates[0].content.parts[].inlineData.

When responseModalities includes "TEXT", parts may contain both text and inlineData entries.


Resolution Reference

gemini-3.1-flash-image (measured)

Aspect ratio  512px     1K         2K         4K
1:1           512×512   1024×1024  2048×2048  4096×4096
1:4           256×1024  512×2064   1024×4128  2048×8256
1:8           176×1456  352×2928   704×5856   1408×11712
2:3           416×624   848×1264   1696×2528  3392×5056
3:2           624×416   1264×848   2528×1696  5056×3392
3:4           448×592   896×1200   1792×2400  3584×4800
4:1           1024×256  2064×512   4128×1024  8256×2048
4:3           592×448   1200×896   2400×1792  4800×3584
4:5           464×576   928×1152   1856×2304  3712×4608
5:4           576×464   1152×928   2304×1856  4608×3712
8:1           1456×176  2928×352   5856×704   11712×1408
9:16          384×688   768×1376   1536×2752  3072×5504
16:9          688×384   1376×768   2752×1536  5504×3072
21:9          784×336   1584×672   3168×1344  6336×2688

The 512px tier is supported only by the flash model.

gemini-3-pro-image-preview

Supports 1K, 2K, and 4K; 512px is not supported. Resolutions match the flash model's 1K/2K/4K values.
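
To confirm that a generated image has the dimensions listed in the table, the width and height can be read straight from the PNG header without an imaging library. This is a minimal sketch: `png_dimensions` and `matches_table` are hypothetical helpers, `EXPECTED` holds only a few rows copied from the flash table, and JPEG responses are not handled.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# A few (aspectRatio, imageSize) -> (width, height) rows from the flash table.
EXPECTED = {
    ("1:1", "512px"): (512, 512),
    ("16:9", "1K"): (1376, 768),
    ("9:16", "2K"): (1536, 2752),
    ("21:9", "4K"): (6336, 2688),
}


def png_dimensions(data):
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then
    # width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    return struct.unpack(">II", data[16:24])


def matches_table(data, aspect_ratio, image_size):
    """Check decoded PNG bytes against the expected size for a table row."""
    return png_dimensions(data) == EXPECTED[(aspect_ratio, image_size)]
```

Here `data` is the result of `base64.b64decode(inlineData["data"])` for a part whose mimeType is image/png.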

Latency Reference (flash, measured)

Tier   Typical latency
512px  10-17s
1K     13-40s
2K     40-170s
4K     120-310s

Usage Examples

Example 1: Text-to-image (curl)

curl -s -X POST \
  "https://www.vibeapi.cn/v1beta/models/gemini-3.1-flash-image:generateContent" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "A cute cat napping in the sunshine"}]}],
    "generationConfig": {
      "responseModalities": ["IMAGE"],
      "imageConfig": {
        "aspectRatio": "16:9",
        "imageSize": "1K"
      }
    }
  }' -o response.json

# Extract the image
jq -r '.candidates[0].content.parts[0].inlineData.data' response.json \
  | base64 -d > output.png

Example 2: Text-to-image (Python)

import json, base64, requests
from pathlib import Path

API_BASE = "https://www.vibeapi.cn"
API_KEY  = "<YOUR_API_KEY>"

def generate_image(prompt, model="gemini-3.1-flash-image",
                   aspect_ratio="1:1", image_size="1K",
                   modalities=None):
    """Generate an image from a text prompt."""
    url = f"{API_BASE}/v1beta/models/{model}:generateContent"
    config = {"responseModalities": modalities or ["IMAGE"]}
    if aspect_ratio or image_size:
        img_cfg = {}
        if aspect_ratio:
            img_cfg["aspectRatio"] = aspect_ratio
        if image_size:
            img_cfg["imageSize"] = image_size
        config["imageConfig"] = img_cfg

    resp = requests.post(url,
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
        json={"contents": [{"parts": [{"text": prompt}]}],
              "generationConfig": config},
        timeout=350)  # suggested 4K timeout; measured 4K runs took up to 310s
    resp.raise_for_status()
    return resp.json()

def save_images(resp_json, prefix="output"):
    """Extract images from the response and save them to disk."""
    saved = []
    for cand in resp_json.get("candidates", []):
        for i, part in enumerate(cand.get("content", {}).get("parts", [])):
            inline = part.get("inlineData")
            if inline:
                ext = inline["mimeType"].split("/")[-1]
                path = f"{prefix}_{i}.{ext}"
                Path(path).write_bytes(base64.b64decode(inline["data"]))
                saved.append(path)
            elif part.get("text"):
                print(f"Text: {part['text'][:200]}")
    return saved

# Basic usage
result = generate_image("A cute cat napping in the sunshine")
paths = save_images(result, "cat")
print(f"Saved: {paths}")

# Specify aspect ratio and resolution
result = generate_image("Panoramic city skyline", aspect_ratio="21:9", image_size="4K")
save_images(result, "skyline")

# Mixed text and image output
result = generate_image("Draw a cat and describe it", modalities=["TEXT", "IMAGE"])
save_images(result, "cat_with_text")

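Example 3: Official Google SDK (Python, recommended)

After running pip install google-genai, the SDK can be used directly; the code is considerably shorter than the requests version: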

from google import genai
from google.genai import types

client = genai.Client(
    api_key="<YOUR_API_KEY>",
    http_options={"base_url": "https://www.vibeapi.cn"}
)

# Text-to-image
response = client.models.generate_content(
    model="gemini-3.1-flash-image",
    contents="A cute cat napping in the sunshine",
    config=types.GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
            image_size="1K",
        ),
    ),
)

for part in response.parts:
    if part.inline_data is not None:
        part.as_image().save("cat.png")  # save directly to a file
        break

SDK image editing (pass a PIL Image directly):

from PIL import Image

cat_img = Image.open("cat.png")

response = client.models.generate_content(
    model="gemini-3.1-flash-image",
    contents=[
        cat_img,
        "Put a Santa hat on this cat",
    ],
    config=types.GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=types.ImageConfig(aspect_ratio="1:1", image_size="1K"),
    ),
)

for part in response.parts:
    if part.inline_data is not None:
        part.as_image().save("cat_hat.png")
        break

Example 4: Image editing (requests)

Provide the original image plus a text instruction to modify it:

import base64, requests
from pathlib import Path

API_BASE = "https://www.vibeapi.cn"
API_KEY  = "<YOUR_API_KEY>"

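def edit_image(image_path, instruction, model="gemini-3.1-flash-image",
               aspect_ratio=None, image_size=None):
    """Edit an existing image according to a text instruction."""
    img_bytes = Path(image_path).read_bytes()
    img_b64 = base64.b64encode(img_bytes).decode()
    # Guess the MIME type from the extension; anything that is not JPEG is sent as PNG
    is_jpeg = str(image_path).lower().endswith((".jpg", ".jpeg"))
    mime = "image/jpeg" if is_jpeg else "image/png"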

    parts = [
        {"text": instruction},
        {"inline_data": {"mime_type": mime, "data": img_b64}}
    ]
    config = {"responseModalities": ["IMAGE"]}
    if aspect_ratio or image_size:
        img_cfg = {}
        if aspect_ratio:
            img_cfg["aspectRatio"] = aspect_ratio
        if image_size:
            img_cfg["imageSize"] = image_size
        config["imageConfig"] = img_cfg

    resp = requests.post(
        f"{API_BASE}/v1beta/models/{model}:generateContent",
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
        json={"contents": [{"parts": parts}],
              "generationConfig": config},
        timeout=300)
    resp.raise_for_status()
    return resp.json()

# Put a hat on the cat
result = edit_image("cat.jpg", "Put a Santa hat on this cat")

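Inpainting (semantic masking)

Describe the region to change in text and keep everything else unchanged:

# Change only the background and keep the subject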
result = edit_image("cat.jpg", "Change only the background to a snowy winter scene. Keep the cat exactly the same.")

Style transfer

Recreate a photo in a different artistic style:

result = edit_image("cat.jpg",
    "Transform this photograph into the artistic style of Vincent van Gogh's Starry Night. "
    "Preserve the original composition but render with swirling, impasto brushstrokes.")

Multi-image composition

Provide several images as references to create a composite scene:

import base64, requests
from pathlib import Path

API_BASE = "https://www.vibeapi.cn"
API_KEY  = "<YOUR_API_KEY>"

img1_b64 = base64.b64encode(Path("dress.png").read_bytes()).decode()
img2_b64 = base64.b64encode(Path("model.png").read_bytes()).decode()

resp = requests.post(
    f"{API_BASE}/v1beta/models/gemini-3.1-flash-image:generateContent",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={
        "contents": [{"parts": [
            {"text": "Dress the person in the second image in the blue dress from the first image, and produce a professional e-commerce photo"},
            {"inline_data": {"mime_type": "image/png", "data": img1_b64}},
            {"inline_data": {"mime_type": "image/png", "data": img2_b64}},
        ]}],
        "generationConfig": {
            "responseModalities": ["IMAGE"],
            "imageConfig": {"aspectRatio": "3:4", "imageSize": "2K"}
        }
    },
    timeout=300,
)

The SDK version is more concise:

from PIL import Image

response = client.models.generate_content(
    model="gemini-3.1-flash-image",
    contents=[
        "Dress the person in the second image in the blue dress from the first image, and produce a professional e-commerce photo",
        Image.open("dress.png"),
        Image.open("model.png"),
    ],
    config=types.GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=types.ImageConfig(aspect_ratio="3:4", image_size="2K"),
    ),
)

Example 5: Iterating on an image over multiple turns (Python)

import base64, requests
from pathlib import Path

API_BASE = "https://www.vibeapi.cn"
API_KEY  = "<YOUR_API_KEY>"
MODEL    = "gemini-3.1-flash-image"
HEADERS  = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

def chat_image(history, config=None):
    """Multi-turn conversation; history is the contents array."""
    body = {"contents": history,
            "generationConfig": config or {"responseModalities": ["TEXT", "IMAGE"]}}
    resp = requests.post(f"{API_BASE}/v1beta/models/{MODEL}:generateContent",
                         headers=HEADERS, json=body, timeout=300)
    resp.raise_for_status()
    return resp.json()

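# Turn 1: generate
history = [{"role": "user", "parts": [{"text": "Draw an orange cat sitting on a windowsill"}]}]
r1 = chat_image(history)

# Append the model's reply to the history
model_parts = r1["candidates"][0]["content"]["parts"]
history.append({"role": "model", "parts": model_parts})

# Turn 2: modify
history.append({"role": "user", "parts": [{"text": "Change the background to a rainy day"}]})
r2 = chat_image(history)

In a multi-turn conversation, the complete parts from the previous turn (including inlineData and thoughtSignature) must be passed back unchanged.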
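
The pass-back rule can be wrapped in a small function so that each new turn reuses the previous reply without altering it. This is a sketch only; `next_turn` is a hypothetical helper that works on the same dict structures as the example above and does not mutate the existing history.

```python
def next_turn(history, resp_json, user_text):
    """Return a new history with the model reply and the next user message appended."""
    candidates = resp_json.get("candidates") or []
    if not candidates:
        raise ValueError("response contains no candidates")
    # Reuse the parts list as returned, so fields such as inlineData and
    # thoughtSignature are sent back exactly as received.
    model_parts = candidates[0]["content"]["parts"]
    return history + [
        {"role": "model", "parts": model_parts},
        {"role": "user", "parts": [{"text": user_text}]},
    ]
```

With the functions from the example above, a second turn could then be written as `r2 = chat_image(next_turn(history, r1, "Change the background to a rainy day"))`.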

Example 6: Specifying aspect ratio and resolution (quick curl comparison)

# Portrait phone wallpaper, 9:16 + 2K
curl -s -X POST \
  "https://www.vibeapi.cn/v1beta/models/gemini-3.1-flash-image:generateContent" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Silhouette of a cat under a dreamy starry sky"}]}],
    "generationConfig": {
      "responseModalities": ["IMAGE"],
      "imageConfig": {"aspectRatio": "9:16", "imageSize": "2K"}
    }
  }' | jq -r '.candidates[0].content.parts[0].inlineData.data' \
     | base64 -d > wallpaper_9x16_2K.png
# Output: 1536×2752

# Ultra-wide banner, 21:9 + 4K
curl -s -X POST \
  "https://www.vibeapi.cn/v1beta/models/gemini-3.1-flash-image:generateContent" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Panoramic cyberpunk city"}]}],
    "generationConfig": {
      "responseModalities": ["IMAGE"],
      "imageConfig": {"aspectRatio": "21:9", "imageSize": "4K"}
    }
  }' | jq -r '.candidates[0].content.parts[0].inlineData.data' \
     | base64 -d > banner_21x9_4K.png
# Output: 6336×2688

# Quick preview at 512px (flash only, lowest token cost)
curl -s -X POST \
  "https://www.vibeapi.cn/v1beta/models/gemini-3.1-flash-image:generateContent" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "A cat"}]}],
    "generationConfig": {
      "responseModalities": ["IMAGE"],
      "imageConfig": {"aspectRatio": "1:1", "imageSize": "512px"}
    }
  }' | jq -r '.candidates[0].content.parts[0].inlineData.data' \
     | base64 -d > preview_512.png
# Output: 512×512, only 747 tokens

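Advanced Features

Google Search grounding (supported by both models)

Generate images based on real-time search data (such as weather, news, or stock prices). Add a tools field to the request: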

{
  "contents": [{"parts": [{"text": "Visualize today's weather forecast for San Francisco"}]}],
  "tools": [{"google_search": {}}],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"],
    "imageConfig": {"aspectRatio": "16:9"}
  }
}

The response additionally returns groundingMetadata, which contains webSearchQueries (the search queries) and groundingChunks (source links).
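
A grounded response can be reduced to its search queries and source links with a few dictionary lookups. The sketch below rests on two assumptions that this document does not state: that groundingMetadata sits on the first candidate, and that each entry in groundingChunks carries a "web" object with "uri" and "title" fields, as in the upstream Gemini API. `extract_grounding` is a hypothetical helper.

```python
def extract_grounding(resp_json):
    """Return (queries, sources) from the first candidate's groundingMetadata.

    sources is a list of (title, uri) tuples; both lists are empty when the
    response carries no grounding information.
    """
    candidates = resp_json.get("candidates") or [{}]
    # Assumption: groundingMetadata is attached to the candidate object.
    meta = candidates[0].get("groundingMetadata") or {}
    queries = list(meta.get("webSearchQueries", []))
    sources = []
    for chunk in meta.get("groundingChunks", []):
        # Assumption: web sources are nested as {"web": {"uri": ..., "title": ...}}.
        web = chunk.get("web") or {}
        if web.get("uri"):
            sources.append((web.get("title", ""), web["uri"]))
    return queries, sources
```

Chunks without a "web" entry are skipped rather than treated as errors, so the function returns whatever web sources are present.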

Flash additionally supports image search grounding, which lets it use web images as visual references:

"tools": [{"google_search": {"search_types": {"web_search": {}, "image_search": {}}}}]

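Thinking mode

The Pro model has thinking enabled by default: it first generates draft sketches and then outputs the final image. Parts in the response with part.thought == true are thinking output and can be skipped.

The Flash model supports controlling the thinking level (minimal, the default, or high) via generationConfig.thinkingConfig: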

{
  "contents": [{"parts": [{"text": "A futuristic city inside a glass bottle floating in space"}]}],
  "generationConfig": {
    "responseModalities": ["IMAGE"],
    "imageConfig": {"aspectRatio": "1:1", "imageSize": "1K"},
    "thinkingConfig": {
      "thinkingLevel": "high",
      "includeThoughts": true
    }
  }
}

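In high mode, the response parts include several types:

Part type                     Description
thought == true + text        Thinking text (reasoning process)
thought == true + inlineData  Draft sketch (temporary image)
inlineData (no thought)       Final output image

Skip the thought parts when extracting the final image:

import base64
from pathlib import Path

# resp_json is the parsed JSON response (a dict), as in the requests examples
for part in resp_json["candidates"][0]["content"]["parts"]:
    if part.get("thought"):
        continue  # skip thinking output (text or draft sketches)
    inline = part.get("inlineData")
    if inline:
        # This is the final image
        Path("final.png").write_bytes(base64.b64decode(inline["data"]))

Thinking tokens are billed whether includeThoughts is true or false. high mode consumes more tokens but produces higher-quality images.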

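Multiple reference images

Pro accepts up to 14 reference images in total, of which at most 6 may be object images and at most 5 may be person images; Flash accepts up to 10 object images plus 4 person images.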
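
These limits can be checked before the images are uploaded. The following is a sketch with hypothetical names (`REFERENCE_LIMITS`, `check_reference_images`); the per-type limits come from the paragraph above, and the Flash total of 14 is simply the sum of its two per-type limits.

```python
REFERENCE_LIMITS = {
    "gemini-3-pro-image-preview": {"objects": 6, "people": 5, "total": 14},
    "gemini-3.1-flash-image": {"objects": 10, "people": 4, "total": 14},
}


def check_reference_images(model, n_objects=0, n_people=0):
    """Return a list of limit violations (empty when the counts are acceptable)."""
    limits = REFERENCE_LIMITS[model]
    problems = []
    if n_objects > limits["objects"]:
        problems.append(f"too many object images: {n_objects} > {limits['objects']}")
    if n_people > limits["people"]:
        problems.append(f"too many person images: {n_people} > {limits['people']}")
    if n_objects + n_people > limits["total"]:
        problems.append(f"too many reference images: {n_objects + n_people} > {limits['total']}")
    return problems
```

Returning a list instead of raising lets the caller report every violated limit at once.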


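Notes

  1. imageSize case: the K must be uppercase (1K, 2K, 4K); lowercase 1k is rejected
  2. 512px is flash only: the Pro model does not support the 512px tier
  3. Timeouts: 4K generation can take 2-5 minutes, so set a sufficiently long timeout
  4. Token usage: 512px is about 747 tokens, 1K about 1120 tokens, 4K about 2000 tokens
  5. Default behavior: when imageConfig is omitted, the output defaults to about 1408×768 (close to 16:9 at 1K)
  6. Image format: the response mimeType is usually image/png and occasionally image/jpeg
  7. SynthID watermark: all generated images carry a SynthID digital watermark
  8. Recommended languages: English, Chinese, Japanese, Korean, French, German, Spanish, and others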