Guide

Parallelizing Requests

When working with the xAI API, you may need to process hundreds or even thousands of requests. Sending them sequentially can be very time-consuming.

To improve efficiency, you can use the AsyncOpenAI class from the openai SDK, which supports sending multiple requests concurrently. The example below is a Python script that demonstrates how to batch and process requests asynchronously with AsyncOpenAI, significantly reducing overall execution time:

The xAI API does not currently provide a batch API.
Adjust the max_concurrent parameter to control the maximum number of parallel requests.
You cannot parallelize beyond the rate limits shown in your API console.
import asyncio
import os
from asyncio import Semaphore
from typing import List

from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key=os.getenv("XAI_API_KEY"),
    base_url="https://api.x.ai/v1"
)

async def send_request(sem: Semaphore, request: str) -> dict:
    """Send a single request to xAI with semaphore control."""
    # The 'async with sem' ensures only a limited number of requests run at once
    async with sem:
        return await client.chat.completions.create(
            model="grok-2-latest",
            messages=[{"role": "user", "content": request}]
        )

async def process_requests(requests: List[str], max_concurrent: int = 2) -> List[dict]:
    """Process multiple requests with controlled concurrency."""
    # Create a semaphore that limits how many requests can run at the same time
    # Think of it like having only 2 "passes" to make requests simultaneously
    sem = Semaphore(max_concurrent)
    
    # Create a list of tasks (requests) that will run using the semaphore
    tasks = [send_request(sem, request) for request in requests]
    
    # asyncio.gather runs all tasks in parallel but respects the semaphore limit
    # It waits for all tasks to complete and returns their results
    return await asyncio.gather(*tasks)

async def main() -> None:
    """Main function to handle requests and display responses."""
    requests = [
        "Tell me a joke",
        "Write a funny haiku",
        "Generate a funny X post",
        "Say something unhinged"
    ]

    # This starts processing all requests in parallel, but only 2 at a time
    # Instead of waiting for each request to finish before starting the next,
    # we can have 2 requests running at once, making it faster overall
    responses = await process_requests(requests)
    
    # Print each response in order
    for i, response in enumerate(responses):
        print(f"# Response {i}:")
        print(response.choices[0].message.content)

if __name__ == "__main__":
    asyncio.run(main())
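To see how the semaphore caps concurrency without calling the API, here is a minimal, self-contained sketch. The fake_request coroutine is a hypothetical stand-in for send_request above: it sleeps instead of making a network call, and a counter records how many tasks are in flight at once.

```python
import asyncio
from asyncio import Semaphore
from typing import List

in_flight = 0  # number of tasks currently inside the semaphore
peak = 0       # highest concurrency observed

async def fake_request(sem: Semaphore, i: int) -> int:
    """Stand-in for send_request: sleeps instead of calling the API."""
    global in_flight, peak
    async with sem:
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # simulated network latency
        in_flight -= 1
        return i

async def run_all(n: int, max_concurrent: int) -> List[int]:
    sem = Semaphore(max_concurrent)
    # Same pattern as process_requests: one semaphore shared by all tasks
    return await asyncio.gather(*(fake_request(sem, i) for i in range(n)))

results = asyncio.run(run_all(8, 2))
print(results)  # gather preserves submission order: [0, 1, 2, 3, 4, 5, 6, 7]
print(peak)     # never exceeds max_concurrent
```

Note that asyncio.gather returns results in the order the tasks were submitted, regardless of which finished first, which is why the responses in main() print in the same order as the requests list.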