Preface
While using the GPT API I found that even after specifying JSON as the model's output format, some outputs in a large batch of samples still failed to follow the JSON structure I had defined, which made downstream processing of the output painful. So I looked into how to strictly enforce a JSON output format for GPT.
Structured Outputs: gpt-4o
The official documentation describes the usage in detail: Structured Outputs - OpenAI API
Supported models: gpt-4o-mini, gpt-4o-2024-08-06, and later
Test code
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class ResearchPaperExtraction(BaseModel):
    title: str
    authors: list[str]
    abstract: str
    keywords: list[str]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are an expert at structured data extraction. You will be given unstructured text from a research paper and should convert it into the given structure."},
        {"role": "user", "content": "..."}
    ],
    response_format=ResearchPaperExtraction
)
research_paper = completion.choices[0].message.parsed
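If the request trips a safety refusal, parsed is None and a refusal message is returned instead, so it is worth checking before using the result. A minimal sketch of how the response might be inspected (the printed fields are just examples):

message = completion.choices[0].message
if message.refusal:
    # The model declined to answer; no parsed object is available
    print("Refusal:", message.refusal)
else:
    paper = message.parsed  # a ResearchPaperExtraction instance
    print(paper.title, paper.keywords)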
JSON Mode: gpt-3.5-turbo
Note: to use JSON mode, the prompt must explicitly instruct the model to output JSON; otherwise the API rejects the request.
Test code
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an expert at structured data extraction. You will be given unstructured text from a research paper and should convert it into the given structure. Output JSON format in English."},
        {"role": "user", "content": "..."}
    ],
    response_format={"type": "json_object"}
)
research_paper = completion.choices[0].message.content
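Unlike Structured Outputs, JSON mode only guarantees syntactically valid JSON, not conformance to any schema, so the returned string still has to be parsed and checked. A minimal sketch, where expected_keys is a hypothetical key set for the research-paper example:

import json

try:
    data = json.loads(research_paper)
except json.JSONDecodeError as e:
    # Should be rare in JSON mode, but guard against truncated output
    print(f"Invalid JSON returned: {e}")
    data = None

# JSON mode does not enforce a schema, so validate the keys yourself
expected_keys = {"title", "authors", "abstract", "keywords"}  # hypothetical
if data is not None and not expected_keys.issubset(data):
    print("Missing keys:", expected_keys - data.keys())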
Calling the GPT API concurrently
def request_openai(system_prompt, model="gpt-3.5-turbo"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": system_prompt}],
        response_format={"type": "json_object"}
    )
    # Collect token-usage statistics
    prompt_tokens = response.usage.prompt_tokens
    completion_tokens = response.usage.completion_tokens
    total_tokens = response.usage.total_tokens
    return response.choices[0].message.content, total_tokens
# Use ThreadPoolExecutor to process multiple requests concurrently
import concurrent.futures

def handle_multiple_requests(prompts):
    results = []
    # Create a ThreadPoolExecutor to process multiple tasks concurrently
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        # Submit each task for processing
        futures = [executor.submit(request_openai, prompt) for prompt in prompts]
        # Collect the results as they complete
        for future in concurrent.futures.as_completed(futures):
            try:
                response, tokens = future.result()
                results.append((response, tokens))
            except Exception as e:
                print(f"Error processing: {e}")
    return results