Build Your First Human-in-the-Loop AI Agent with NVIDIA NIM

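NVIDIA Developer Program

Want to learn more about NIM? Join the NVIDIA Developer Program for free access to self-hosted NVIDIA NIM and microservices on up to 16 GPUs on any infrastructure: cloud, data center, or personal workstation.

After joining the free NVIDIA Developer Program, you can access NIM through the NVIDIA API catalog at any time. For enterprise-grade security, support, and API stability, select the option to access NIM through the free 90-day NVIDIA AI Enterprise trial with a business email address.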


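AI agents powered by large language models (LLMs) help organizations streamline and reduce manual workloads. These agents use multi-step, iterative reasoning to analyze problems, design solutions, and carry out tasks with a variety of tools. Unlike traditional chatbots, LLM-powered agents automate complex tasks by effectively understanding and processing information. To avoid potential risks in specific applications, maintaining human oversight remains essential when working with autonomous AI agents.

In this post, you learn how to build a human-in-the-loop AI agent using NVIDIA NIM microservices, accelerated APIs optimized for AI inference. The post walks through a social media use case that shows how these versatile AI agents handle complex tasks. With NIM microservices, you can integrate advanced LLMs into your workflows, providing the scalability and flexibility that AI-driven tasks require. Whether you are creating promotional content or automating complex workflows, this tutorial is designed to accelerate your process.

For a demo, watch How to Build a Simple AI Agent in 5 Minutes with NVIDIA NIM.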

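Building AI agents for personalized social media content

One of the biggest challenges marketers face today is producing high-quality, creative promotional content across platforms. The goal is to create a variety of promotional messages and artwork that can be published on social media.

Traditionally, a project lead assigns these tasks to specialists such as content writers and digital artists. But what if AI agents could help make this process more efficient?

This use case involves two AI agents: a content creator agent and a digital artist agent. The agents generate promotional content and submit it to a human decision-maker for final approval, so that human control stays at the center of the creative process.

Architecting the human-in-the-loop agent decision workflow

Building this human-in-the-loop system involves creating a cognitive workflow in which AI agents assist with specific tasks while a human makes the final decision. The following figure outlines the interaction between the human decision-maker and the agents.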
![Human and AI agent interaction conceptual architecture](https://developer-blogs.nvidia.com/wp-content/uploads/2024/10/human-ai-agent-interaction-conceptual-architecutre.png)

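The content creator agent uses the Llama 3.1 405B model, accelerated by the NVIDIA LLM NIM microservice. LangChain ChatNVIDIA, with NIM function calling and structured output, is also integrated to ensure organized, reliable results. ChatNVIDIA is an open-source Python library that NVIDIA contributed to LangChain so that developers can connect to NVIDIA NIM easily. These capabilities are combined into a LangChain Expression Language (LCEL) runnable chain to create the agent workflow.

Building the content creator agent

Start by building the content creator agent. This agent uses the NVIDIA API catalog preview API endpoints to generate promotional messages that follow a specific format guideline. NVIDIA AI Enterprise customers can also download and run NIM endpoints locally.

Use the following Python code to get started: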

from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain import prompts, chat_models, hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from typing import Optional, List
 
 
## 1. construct the system prompt ---------
prompt_template = """
### [INST]
 
 
You are an expert social media content creator.
Your task is to create a different promotion message with the following 
Product Description :
------
{product_desc}
------
The output promotion message MUST use the following format :
'''
Title: a powerful, short message that depicts what this product is about
Message: be creative for the promotion message, but make it short and ready for social media feeds.
Tags: the hashtags a human would normally use on social media
'''
Begin!
[/INST]
 """
prompt = PromptTemplate(
    # the variable name must match the {product_desc} placeholder in the template
    input_variables=['product_desc'],
    template=prompt_template,
)
 
 
## 2. provide seeded product_desc text
product_desc="Explore the latest community-built AI models with an API optimized and accelerated by NVIDIA, then deploy anywhere with NVIDIA NIM™ inference microservices."
 
 
## 3. structural output using LMFE 
class StructureOutput(BaseModel):     
    Title: str = Field(description="Title of the promotion message")
    Message : str = Field(description="The actual promotion message")
    Tags: List[str] = Field(description="Hashtags for social media, usually starts with #")
 
 
## 4. A powerful LLM 
llm_with_output_structure=ChatNVIDIA(model="meta/llama-3.1-405b-instruct").with_structured_output(StructureOutput)     
 
 
## construct the content_creator agent
content_creator = ( prompt | llm_with_output_structure )
out=content_creator.invoke({"product_desc":product_desc})
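
The prompt above asks for a fixed Title / Message / Tags layout. Where structured output is not available on a given endpoint, that layout could also be parsed from plain text. The following is a minimal sketch under that assumption; `parse_promotion` and the sample text are hypothetical and are not part of LangChain or NIM.

```python
import re
from typing import Dict, List, Union


def parse_promotion(text: str) -> Dict[str, Union[str, List[str]]]:
    """Parse 'Title: / Message: / Tags:' lines into a dict (hypothetical helper)."""
    fields: Dict[str, Union[str, List[str]]] = {"Title": "", "Message": "", "Tags": []}
    for line in text.splitlines():
        # Split on the first colon only, so colons inside the value survive.
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in fields:
            continue
        if key == "Tags":
            # Collect every '#word' token as a separate hashtag.
            fields["Tags"] = re.findall(r"#\w+", value)
        else:
            fields[key] = value.strip()
    return fields


# Made-up sample text in the layout requested by the prompt.
sample = """Title: AI anywhere
Message: Deploy models in minutes: try it today.
Tags: #AI #NVIDIA #NIM"""
parsed = parse_promotion(sample)
```

Lines that do not start with one of the three expected keys are ignored, which keeps the parser tolerant of extra text around the formatted block.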

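Using the digital artist agent

Next, introduce the digital artist agent, which turns promotional text into creative visuals using the SDXL-Turbo text-to-image model hosted at an NVIDIA API endpoint. The agent rewrites the input query and generates high-quality images designed for social media promotion campaigns. The following code shows an example of how the agent is integrated: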

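import base64
import os
import requests
from PIL import Image

# Assumption: the API key is read from the NVIDIA_API_KEY environment variable
nvapi_key = os.environ["NVIDIA_API_KEY"]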
def generate_image(prompt :str) -> str :
    """
    generate image from text
    Args:
        prompt: input text
    """
    ## re-writing the input promotion title in to appropriate image_gen prompt 
    gen_prompt=llm_rewrite_to_image_prompts(prompt)
    print("start generating image with llm re-write prompt:", gen_prompt)
    invoke_url = "https://ai.api.nvidia.com/v1/genai/stabilityai/sdxl-turbo"
     
    headers = {
        "Authorization": f"Bearer {nvapi_key}",
        "Accept": "application/json",
    }
     
    payload = {
        "text_prompts": [{"text": gen_prompt}],
        "seed": 0,
        "sampler": "K_EULER_ANCESTRAL",
        "steps": 2
    }
     
    response = requests.post(invoke_url, headers=headers, json=payload)
     
    response.raise_for_status()
    response_body = response.json()
    ## decode the base64-encoded image returned in the first artifact
    imgdata = base64.b64decode(response_body["artifacts"][0]["base64"])
    filename = 'output.jpg'
    with open(filename, 'wb') as f:
        f.write(imgdata)
    ## open the saved file to confirm that it is a readable image
    im = Image.open(filename)
    img_location=f"the output of the generated image will be stored in this path : {filename}"
    return img_location
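
The response handling in `generate_image` can be exercised without calling the endpoint. The sketch below assumes only the response shape used above (an `artifacts` list whose first entry carries a `base64` field); `save_first_artifact` and the fabricated payload are hypothetical and exist only for illustration.

```python
import base64
import os
import tempfile


def save_first_artifact(response_body: dict, filename: str) -> str:
    """Decode the first base64 artifact in the response and write it to filename."""
    artifacts = response_body.get("artifacts") or []
    if not artifacts or "base64" not in artifacts[0]:
        raise ValueError("response contains no base64-encoded artifact")
    with open(filename, "wb") as f:
        f.write(base64.b64decode(artifacts[0]["base64"]))
    return filename


# Offline check with a fabricated payload; no request is sent.
fake_body = {"artifacts": [{"base64": base64.b64encode(b"not-a-real-image").decode()}]}
out_path = save_first_artifact(fake_body, os.path.join(tempfile.mkdtemp(), "output.jpg"))
```

Raising an explicit error on an empty `artifacts` list makes a failed generation easier to diagnose than an `IndexError` from direct indexing.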

Use the following Python script to rewrite a user input query into an image generation prompt:

from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain import prompts, chat_models, hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
 
 
def llm_rewrite_to_image_prompts(user_query):
    prompt = prompts.ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "Summarize the following user query into a very short, one-sentence theme for image generation, MUST follow this format : A iconic, futuristic image of , no text, no amputation, no face, bright, vibrant",
            ),
            ("user", "{input}"),
        ]
    )
    model = ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")
    chain = ( prompt    | model   | StrOutputParser() )
    out= chain.invoke({"input":user_query})
    # the chain ends with StrOutputParser, so out is a plain string
    return out

Next, bind image generation to the selected LLM as a tool and wrap it in LCEL to create the digital artist agent:

## bind image generation as tool into llama3.1-405b llm
llm=ChatNVIDIA(model="meta/llama-3.1-405b-instruct")
llm_with_img_gen_tool=llm.bind_tools([generate_image],tool_choice="generate_image")
## use LCEL to construct Digital Artist Agent
digital_artist = (
    llm_with_img_gen_tool
    | output_to_invoke_tools
)
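
The chain above pipes the model reply into `output_to_invoke_tools`, which the listing does not define. The following is a minimal sketch of what such a step could look like, assuming the reply exposes a `tool_calls` list of dicts with `name` and `args` keys, as LangChain's `AIMessage` does. `FakeMessage`, `TOOL_REGISTRY`, and the stub `generate_image` are stand-ins so that the sketch runs offline; the original helper may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


def generate_image(prompt: str) -> str:
    """Stub standing in for the real image-generation tool."""
    return f"image generated for: {prompt}"


# Hypothetical registry mapping tool names to plain Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {"generate_image": generate_image}


@dataclass
class FakeMessage:
    """Stand-in for a model reply; only the tool_calls attribute is used."""
    tool_calls: List[Dict[str, Any]] = field(default_factory=list)


def output_to_invoke_tools(message: Any) -> str:
    """Run the first requested tool and return its result as a string."""
    tool_calls = getattr(message, "tool_calls", None) or []
    if not tool_calls:
        return "no tool call was requested"
    call = tool_calls[0]
    tool = TOOL_REGISTRY.get(call["name"])
    if tool is None:
        return f"unknown tool: {call['name']}"
    # Pass the model-supplied arguments to the tool as keyword arguments.
    return tool(**call["args"])


reply = FakeMessage(tool_calls=[{"name": "generate_image", "args": {"prompt": "NIM launch"}}])
result = output_to_invoke_tools(reply)
```

Returning a string in every branch keeps the result compatible with code that concatenates the agent output for printing.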

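Integrating human-in-the-loop with the role of decision-maker

To maintain human oversight, the agents share their output for final approval. The human decision-maker reviews both the text generated by the content creator agent and the artwork produced by the digital artist agent.

This interaction allows multiple rounds of iteration, so that both the promotional message and the image are refined and ready for deployment.

The agentic logic places the human at the center as the decision-maker, who assigns the appropriate agent to each task. LangGraph is used to orchestrate the agentic cognitive architecture.

This involves a function that asks for human input: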

# HumanInputRun wraps a Python input function as a LangChain tool
from langchain_community.tools import HumanInputRun
 
 
def get_human_input() -> str:
    """Put the human in the decision-maker role: the human picks the agent for the task."""
    print("You have been given 2 agents. Please select exactly _ONE_ agent to help you with the task, enter 'y' to confirm your choice.")
    print("""Available agents are : \n
            1 ContentCreator  \n
            2 DigitalArtist \n          
            Enter 1 or 2""")
    tool = None
    while True:
        try:
            line = input().strip()
        except EOFError:
            break
        if line == '1':
            tool = "ContentCreator"
        elif line == '2':
            tool = "DigitalArtist"
        elif line == 'y' and tool is not None:
            print(f"tool selected : {tool} ")
            break
        else:
            # ignore anything else, including 'y' before a choice was made
            print("Enter 1 or 2, then 'y' to confirm.")
    # return only the most recent selection (empty string if none was made)
    return tool or ""
 
 
# You can modify the tool when loading
 
 
ask_human = HumanInputRun(input_func=get_human_input)

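Next, create two additional Python functions that serve as graph nodes, which LangGraph uses to represent steps or actions in the workflow. These nodes let the agents carry out specific tasks sequentially or in parallel, creating a flexible and structured process: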

from langgraph.graph import END, StateGraph
from colorama import Fore, Style
# Define the functions needed 
def human_assign_to_agent(state):
    # ensure using original prompt 
    inputs = state["input"]
    input_to_agent = state["input_to_agent"]
    concatenate_str = Fore.BLUE+inputs+ ' : '+Fore.CYAN+input_to_agent + Fore.RESET
    print(concatenate_str)
    print("---"*10)  
    agent_choice=ask_human.invoke(concatenate_str)
    print(Fore.CYAN+ "chosen_agent : " + agent_choice + Fore.RESET)
    return {"agent_choice": agent_choice }
 
 
def agent_execute_task(state):
    inputs = state["input"]
    input_to_agent = state["input_to_agent"]
    print(Fore.CYAN + input_to_agent + Fore.RESET)
    # the chosen agent executes the task
    chosen_agent = state['agent_choice']
    if chosen_agent == 'ContentCreator':
        structured_respond = content_creator.invoke({"product_desc": input_to_agent})
        # join hashtags with spaces so that they stay separate tokens
        respond = '\n'.join([structured_respond.Title, structured_respond.Message, ' '.join(structured_respond.Tags)])
    elif chosen_agent == "DigitalArtist":
        respond = digital_artist.invoke(input_to_agent)
    else:
        respond = "please reselect the agent, there are only 2 agents available: 1.ContentCreator or 2.DigitalArtist"

    print(Fore.CYAN + "agent_output: \n" + respond + Fore.RESET)
    return {"agent_use_tool_respond": respond}
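
The graph is built on a `State` type that the listings do not show. The sketch below is one possible definition, inferred from the keys that the two node functions read and write; the use of `TypedDict` with `total=False` is an assumption, chosen so that the initial call can supply only `input` and `input_to_agent`.

```python
from typing import TypedDict


class State(TypedDict, total=False):
    """Shared workflow state; keys inferred from the node functions (assumed schema)."""
    input: str                   # the human's instruction for the workflow
    input_to_agent: str          # the text handed to the selected agent
    agent_choice: str            # set by human_assign_to_agent
    agent_use_tool_respond: str  # set by agent_execute_task


# The initial state carries only the two keys supplied by the caller.
initial_state: State = {
    "input": "create a promotional message",
    "input_to_agent": "NVIDIA NIM microservices power GenAI workflow",
}
```

Each node returns a partial dict, and the returned keys are merged into this state before the next node runs.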

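Finally, bring everything together by connecting the nodes and edges to form the human-in-the-loop multi-agent workflow. Once the graph is compiled, you are ready to proceed: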

from langgraph.graph import END, StateGraph
 
 
# Define a new graph
workflow = StateGraph(State)
 
 
# Define the two nodes 
workflow.add_node("start", human_assign_to_agent)
workflow.add_node("end", agent_execute_task)
 
 
# This means that this node is the first one called
workflow.set_entry_point("start")
workflow.add_edge("start", "end")
workflow.add_edge("end", END)
 
 
# Finally, we compile it!
# This compiles it into a LangChain Runnable,
# meaning you can use it as you would any other runnable
app = workflow.compile()

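Launching the human-agent workflow

Now, launch the application. It prompts you to assign one of the available agents to the given task.

Prompt to write promotional text

First, query the content creator agent to write the promotional text, including a title, message, and social media hashtags (figure below). Repeat this until you are satisfied with the output.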

A Python code sample:

my_query="create a good promotional message for social promotion events using the following inputs"
product_desc="NVIDIA NIM microservices power GenAI workflow"
respond=app.invoke({"input":my_query, "input_to_agent":product_desc})

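The human selects 1 = content creator agent for the task. The agent executes and returns agent_output, as shown in the figure below.

Prompt to create illustrations

Once you are satisfied with the result, go on to query the digital artist agent to create artwork for the social media promotion (figure below).

The following Python code sample uses the title generated by the content creator agent as input for the image prompt: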

## taken the output from the Title from the output of Content Creator Agent 
prompt_for_image=respond['agent_use_tool_respond'].split('\n')[0].split(':')[-1].strip()
## Human decision maker give instruction to the agent workflow app
input_query="generate an image for me from the below promotion message"
respond2=app.invoke({"input":input_query, "input_to_agent":prompt_for_image})

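The generated image is saved as output.jpg.

Iterating for high-quality results

You can iterate on the generated images to obtain different artwork variations until you get the result you want. Slightly adjusting the input prompt from the content creator agent can produce different images from the digital artist agent.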
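
The repeat-until-satisfied pattern described in this post can be written as a small loop. The following is a minimal sketch, not the post's implementation: `iterate_until_approved` is a hypothetical helper, and the `generate` and `approve` callables are stand-ins for a call such as `app.invoke(...)` and a human review step.

```python
from typing import Callable, Optional


def iterate_until_approved(
    generate: Callable[[int], str],
    approve: Callable[[str], bool],
    max_rounds: int = 3,
) -> Optional[str]:
    """Regenerate a draft until the reviewer approves it or the rounds run out."""
    for round_no in range(1, max_rounds + 1):
        draft = generate(round_no)
        if approve(draft):
            return draft
    # No draft was approved within the allowed number of rounds.
    return None


# Stand-ins: canned drafts and a reviewer who accepts only the second one.
drafts = {1: "draft A", 2: "draft B", 3: "draft C"}
approved = iterate_until_approved(lambda n: drafts[n], lambda d: d == "draft B")
```

Capping the number of rounds keeps the human in control of how much generation is spent on a single task.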