Preface
While using the GPT API I found that even after specifying JSON as the model's output format, some outputs in a large batch of samples still failed to follow the JSON structure I had defined, which made downstream processing of the output painful. So I looked into how to strictly enforce a JSON output format for GPT.
Structured Outputs: gpt-4o
The official documentation describes the usage in detail: Structured Outputs - OpenAI API
Supported models: gpt-4o-mini, gpt-4o-2024-08-06, and later
Test code
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class ResearchPaperExtraction(BaseModel):
    title: str
    authors: list[str]
    abstract: str
    keywords: list[str]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are an expert at structured data extraction. You will be given unstructured text from a research paper and should convert it into the given structure."},
        {"role": "user", "content": "..."}
    ],
    response_format=ResearchPaperExtraction
)
research_paper = completion.choices[0].message.parsed
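If the request trips a safety refusal, parsed is None and a refusal message is returned instead, so it is worth checking before using the result. A minimal sketch of how the response might be inspected (the printed fields are just examples):

message = completion.choices[0].message
if message.refusal:
    # The model declined to answer; no parsed object is available
    print("Refusal:", message.refusal)
else:
    paper = message.parsed  # a ResearchPaperExtraction instance
    print(paper.title, paper.keywords)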
JSON Mode: gpt-3.5-turbo
Note: to use JSON mode, the prompt must explicitly instruct the model to output JSON; otherwise the API rejects the request.
Test code
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an expert at structured data extraction. You will be given unstructured text from a research paper and should convert it into the given structure. Output JSON format in English."},
        {"role": "user", "content": "..."}
    ],
    response_format={"type": "json_object"}
)
research_paper = completion.choices[0].message.content
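Unlike Structured Outputs, JSON mode only guarantees syntactically valid JSON, not conformance to any schema, so the returned string still has to be parsed and checked. A minimal sketch, where expected_keys is a hypothetical key set for the research-paper example:

import json

try:
    data = json.loads(research_paper)
except json.JSONDecodeError as e:
    # Should be rare in JSON mode, but guard against truncated output
    print(f"Invalid JSON returned: {e}")
    data = None

# JSON mode does not enforce a schema, so validate the keys yourself
expected_keys = {"title", "authors", "abstract", "keywords"}  # hypothetical
if data is not None and not expected_keys.issubset(data):
    print("Missing keys:", expected_keys - data.keys())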
Calling the GPT API concurrently
def request_openai(system_prompt, model="gpt-3.5-turbo"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": system_prompt}],
        response_format={"type": "json_object"}
    )
    # Collect token-usage statistics
    prompt_tokens = response.usage.prompt_tokens
    completion_tokens = response.usage.completion_tokens
    total_tokens = response.usage.total_tokens
    return response.choices[0].message.content, total_tokens
# Use ThreadPoolExecutor to process multiple requests concurrently
import concurrent.futures

def handle_multiple_requests(prompts):
    results = []
    # Create a ThreadPoolExecutor to process multiple tasks concurrently
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        # Submit each task for processing
        futures = [executor.submit(request_openai, prompt) for prompt in prompts]
        # Collect the results as they complete
        for future in concurrent.futures.as_completed(futures):
            try:
                response, tokens = future.result()
                results.append((response, tokens))
            except Exception as e:
                print(f"Error processing: {e}")
    return results