Gemini Live API

For applications that need real-time, low-latency voice support, such as chatbots or agentic interactions, the Gemini Live API provides an optimized way to stream both input and output for a Gemini model. With Firebase AI Logic, you can call the Gemini Live API directly from your Android app without a backend integration. This guide shows how to use the Gemini Live API in an Android app with Firebase AI Logic.

Get started

Before you begin, make sure your app targets API level 23 or higher.

If you haven't already, set up a Firebase project and connect your app to Firebase. For details, see the Firebase AI Logic documentation.

Set up your Android project

Add the Firebase AI Logic library dependency to your app-level build.gradle.kts or build.gradle file. Use the Firebase Android BoM to manage library versions.

dependencies {
  // Import the Firebase BoM
  implementation(platform("com.google.firebase:firebase-bom:34.6.0"))
  // Add the dependency for the Firebase AI Logic library
  // When using the BoM, you don't specify versions in Firebase library dependencies
  implementation("com.google.firebase:firebase-ai")
}

After adding the dependency, sync your Android project with Gradle.

Integrate Firebase AI Logic and initialize a generative model

Add the RECORD_AUDIO permission to your application's AndroidManifest.xml file:

<uses-permission android:name="android.permission.RECORD_AUDIO" />

Initialize the Gemini Developer API backend service and access LiveModel. Use a model that supports the Live API, such as gemini-live-2.5-flash-preview. See the Firebase documentation for the available models.

To specify a voice, set the voice name in the speechConfig object as part of the model configuration. If you don't specify a voice, the default is Puck.

Kotlin

// Initialize the `LiveModel`
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
       modelName = "gemini-live-2.5-flash-preview",
       generationConfig = liveGenerationConfig {
          responseModality = ResponseModality.AUDIO
          speechConfig = SpeechConfig(voice = Voice("FENRIR"))
       })

Java

// Initialize the `LiveModel`
LiveGenerativeModel model = FirebaseAI
       .getInstance(GenerativeBackend.googleAI())
       .liveModel(
              "gemini-live-2.5-flash-preview",
              new LiveGenerationConfig.Builder()
                     .setResponseModality(ResponseModality.AUDIO)
                     .setSpeechConfig(new SpeechConfig(new Voice("FENRIR")))
                     .build(),
              null, // tools
              null  // systemInstruction
       );

Optionally, you can define a persona or role for the model to play by setting a system instruction:

Kotlin

val systemInstruction = content {
    text("You are a helpful assistant, your main role is [...]")
}

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
       modelName = "gemini-live-2.5-flash-preview",
       generationConfig = liveGenerationConfig {
          responseModality = ResponseModality.AUDIO
          speechConfig = SpeechConfig(voice= Voice("FENRIR"))
       },
       systemInstruction = systemInstruction,
)

Java

Content systemInstruction = new Content.Builder()
       .addText("You are a helpful assistant, your main role is [...]")
       .build();

LiveGenerativeModel model = FirebaseAI
       .getInstance(GenerativeBackend.googleAI())
       .liveModel(
              "gemini-live-2.5-flash-preview",
              new LiveGenerationConfig.Builder()
                     .setResponseModality(ResponseModality.AUDIO)
                     .setSpeechConfig(new SpeechConfig(new Voice("FENRIR")))
                     .build(),
              tools, // null if you don't want to use function calling
              systemInstruction
       );

You can further specialize the conversation with the model by using the system instruction to provide context specific to your app (for example, the user's in-app activity history).
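As a rough sketch of that idea, the plain-Java helper below assembles the text of a system instruction from a role description and a list of recent in-app activity. The class name SystemInstructionText, its build method, and the wording of the generated text are hypothetical and not part of Firebase AI Logic. The returned string is what you would hand to addText(...) (Java) or text(...) (Kotlin) when building the system instruction Content.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of the Firebase AI Logic SDK.
final class SystemInstructionText {

    private SystemInstructionText() {}

    // Combines a role description with recent in-app activity into one
    // system instruction string. An empty activity list yields the role only.
    static String build(String role, List<String> recentActivity) {
        StringBuilder sb = new StringBuilder(role.trim());
        if (!recentActivity.isEmpty()) {
            sb.append("\nRecent user activity in the app:");
            for (String entry : recentActivity) {
                sb.append("\n- ").append(entry);
            }
        }
        return sb.toString();
    }
}
```

For example, SystemInstructionText.build("You are a helpful assistant.", List.of("Added milk to the list")) produces the role line followed by a short bulleted activity summary. Keeping the summary short is a design choice here, since the system instruction is sent with the session setup.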

Initialize a Live API session

Once you have created the LiveModel instance, call model.connect() to create a LiveSession object and establish a persistent connection with the model with low-latency streaming. The LiveSession lets you interact with the model by starting and stopping the voice session, and by sending and receiving text.

You can then call startAudioConversation() to start the conversation with the model:

Kotlin

val session = model.connect()
session.startAudioConversation()

Java

LiveModelFutures model = LiveModelFutures.from(liveModel);
ListenableFuture<LiveSession> sessionFuture = model.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation();
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Note that in your conversations with the model, the model doesn't handle interruptions.

You can also use the Gemini Live API to generate streamed audio from text and to generate text from streamed audio. The Live API is bidirectional, so you use the same connection to send and receive content. Eventually, you will also be able to send images and a live video stream to the model.

Function calling: connect the Gemini Live API to your app

To go one step further, you can also enable the model to interact directly with your app's logic by using function calling.

Function calling (or tool calling) is a feature of generative AI implementations that lets the model call functions on its own initiative to perform actions. If the function has an output, the model adds it to its context and uses it for subsequent generations.
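The round trip described above can be sketched in plain Java without any SDK types. In this sketch, the model's request is reduced to a function name plus a map of string arguments, and the app's reply is a map that would be returned to the model. The class FunctionCallRoundTrip, the HANDLERS registry, and the dispatch method are hypothetical stand-ins, not Firebase AI Logic APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java stand-in for the function-calling round trip. None of these
// types belong to the Firebase AI Logic SDK.
final class FunctionCallRoundTrip {

    // App state that the exposed function mutates.
    static final List<String> ITEMS = new ArrayList<>();

    // Registry mapping a function name to the app logic that serves it.
    // Each handler receives the call's arguments and returns its output.
    static final Map<String, Function<Map<String, String>, Map<String, Object>>> HANDLERS =
            new HashMap<>();

    static {
        HANDLERS.put("addList", args -> {
            String item = args.get("item");
            ITEMS.add(item);
            Map<String, Object> output = new HashMap<>();
            output.put("success", true);
            output.put("message", "Item " + item + " added to the list");
            return output;
        });
    }

    private FunctionCallRoundTrip() {}

    // Runs the function the model asked for and returns the output that
    // would be sent back to the model. Unknown names produce an error map.
    static Map<String, Object> dispatch(String name, Map<String, String> args) {
        Function<Map<String, String>, Map<String, Object>> handler = HANDLERS.get(name);
        if (handler == null) {
            Map<String, Object> error = new HashMap<>();
            error.put("error", "Unknown function: " + name);
            return error;
        }
        return handler.apply(args);
    }
}
```

Calling FunctionCallRoundTrip.dispatch("addList", Map.of("item", "milk")) appends "milk" to ITEMS and returns a map whose "success" entry is true. Returning an error map for unknown names, rather than throwing, gives the model something to react to in its next turn.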

**Figure 1:** Diagram showing how the Gemini Live API lets the model interpret a user prompt, triggering a predefined function with relevant arguments in the Android app, which then receives a confirmation response from the model.

To implement function calling in your app, start by creating a FunctionDeclaration object for each function you want to expose to the model.

For example, to expose to Gemini an addList function that appends a string to a list of strings, start by creating a FunctionDeclaration variable with a name and a short plain-English description of the function and its parameter:

Kotlin

val itemList = mutableListOf<String>()

fun addList(item: String) {
    itemList.add(item)
}

val addListFunctionDeclaration = FunctionDeclaration(
    name = "addList",
    description = "Function adding an item to the list",
    parameters = mapOf(
        "item" to Schema.string("A short string describing the item to add to the list")
    )
)

Java

HashMap<String, Schema> addListParams = new HashMap<>();

addListParams.put("item", Schema.str("A short string describing the item to add to the list"));

FunctionDeclaration addListFunctionDeclaration = new FunctionDeclaration(
    "addList",
    "Function adding an item to the list",
    addListParams,
    Collections.emptyList() // no optional parameters
);

Then, pass this FunctionDeclaration as a Tool to the model when you instantiate it:

Kotlin

val addListTool = Tool.functionDeclarations(listOf(addListFunctionDeclaration))

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
       modelName = "gemini-live-2.5-flash-preview",
       generationConfig = liveGenerationConfig {
          responseModality = ResponseModality.AUDIO
          speechConfig = SpeechConfig(voice= Voice("FENRIR"))
       },
       systemInstruction = systemInstruction,
       tools = listOf(addListTool)
)

Java

// Argument order matches the earlier snippets:
// model name, generation config, tools, system instruction
LiveGenerativeModel model = FirebaseAI
       .getInstance(GenerativeBackend.googleAI())
       .liveModel(
              "gemini-live-2.5-flash-preview",
              new LiveGenerationConfig.Builder()
                     .setResponseModality(ResponseModality.AUDIO)
                     .setSpeechConfig(new SpeechConfig(new Voice("FENRIR")))
                     .build(),
              List.of(Tool.functionDeclarations(List.of(addListFunctionDeclaration))),
              systemInstruction
       );

Finally, implement a handler function to handle the tool call made by the model and pass the response back to it. This handler function, which you provide to the LiveSession when you call startAudioConversation, takes a FunctionCallPart parameter and returns a FunctionResponsePart:

Kotlin

session.startAudioConversation(::functionCallHandler)

// ...

fun functionCallHandler(functionCall: FunctionCallPart): FunctionResponsePart {
    return when (functionCall.name) {
        "addList" -> {
            // Extract function parameter from functionCallPart
            val itemName = functionCall.args["item"]!!.jsonPrimitive.content
            // Call function with parameter
            addList(itemName)
            // Confirm the function call to the model
            val response = JsonObject(
                mapOf(
                    "success" to JsonPrimitive(true),
                    "message" to JsonPrimitive("Item $itemName added to the todo list")
                )
            )
            FunctionResponsePart(functionCall.name, response)
        }
        else -> {
            val response = JsonObject(
                mapOf(
                    "error" to JsonPrimitive("Unknown function: ${functionCall.name}")
                )
            )
            FunctionResponsePart(functionCall.name, response)
        }
    }
}

Java

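Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {

    @RequiresPermission(Manifest.permission.RECORD_AUDIO)
    @Override
    @OptIn(markerClass = PublicPreviewAPI.class)
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        // Pass the handler as a lambda (a bare `::name` is Kotlin syntax)
        session.startAudioConversation(functionCall -> handleFunctionCall(functionCall));
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

// ...

// Mirrors the Kotlin handler above. Building the JsonObject through
// JsonElementKt assumes the kotlinx.serialization Java interop.
FunctionResponsePart handleFunctionCall(FunctionCallPart functionCall) {
    Map<String, JsonElement> response = new HashMap<>();
    if (functionCall.getName().equals("addList")) {
        // Extract the function parameter from the FunctionCallPart
        Map<String, JsonElement> args = functionCall.getArgs();
        String item = JsonElementKt.getContentOrNull(
                JsonElementKt.getJsonPrimitive(args.get("item")));
        // Call the function with the parameter
        addList(item);
        // Confirm the function call to the model
        response.put("success", JsonElementKt.JsonPrimitive(true));
        response.put("message",
                JsonElementKt.JsonPrimitive("Item " + item + " added to the todo list"));
    } else {
        response.put("error",
                JsonElementKt.JsonPrimitive("Unknown function: " + functionCall.getName()));
    }
    return new FunctionResponsePart(functionCall.getName(), new JsonObject(response));
}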

Next steps

Try the Gemini Live API in the Android AI catalog sample app.
Read more about the Gemini Live API in the Firebase AI Logic documentation.
Learn more about the available Gemini models.
Learn more about function calling.
Explore prompt design strategies.
