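When a group of agents collaborates on a complex problem in a group chat, it often helps to make one agent responsible for retrieving information from a specific knowledge base, so that the group has a precise and reliable source of knowledge. This example uses a group chat grounded in the paper at https://arxiv.org/html/2408.08921v1 to discuss how large language models (LLMs), retrieval-augmented generation (RAG), and graph-based retrieval-augmented generation (GraphRAG) differ, and what the strengths and weaknesses of each are. The code follows: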
Step 1: install the required libraries
!pip install "pyautogen[retrievechat]"
#!sudo apt-get update
#!sudo apt-get install -y tesseract-ocr poppler-utils
!pip install "unstructured[all-docs]"
!pip install sentence_transformers
Step 2: work around network problems when accessing Hugging Face
!sed -i 's/huggingface.co/hf-mirror.com/g' /usr/local/lib/python3.10/site-packages/huggingface_hub/constants.py
!sed -i 's/huggingface.co/hf-mirror.com/g' /usr/local/lib/python3.10/site-packages/transformers/utils/hub.py
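As an alternative to patching installed files, `huggingface_hub` also reads the `HF_ENDPOINT` environment variable. The following is a minimal sketch, assuming the same hf-mirror.com mirror as in the sed commands and assuming it runs before any Hugging Face library is imported in the session:

```python
import os

# Assumption: the same mirror that the sed commands point to.
HF_MIRROR = "https://hf-mirror.com"

# Set this before importing huggingface_hub, transformers, or
# sentence_transformers, because the endpoint is read at import time.
os.environ["HF_ENDPOINT"] = HF_MIRROR
print("HF_ENDPOINT =", os.environ["HF_ENDPOINT"])
```

This avoids hard-coding the Python 3.10 site-packages path, at the cost of having to run it first in every new session.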
Step 3: configure the LLM endpoints used by the agents
import chromadb
from typing_extensions import Annotated
import os
import autogen
from autogen import AssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent
config_list = autogen.config_list_from_json(
    "/mnt/workspace/OAI_CONFIG_LIST",
    filter_dict={"model": ["qwen2-72b-instruct"]},
    # filter_dict={"model": ["gemma-2b-it.Q3_K_L"]},
)
print("LLM models: ", [config["model"] for config in config_list])
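For readers who have not seen an `OAI_CONFIG_LIST` file, the sketch below shows the general shape of such a list and what filtering by model name amounts to. The endpoints and keys are placeholders, and `filter_configs` is a hypothetical stand-in written for illustration, not autogen's own implementation:

```python
import json

# Hypothetical file content; base_url and api_key values are placeholders.
sample_configs = json.loads("""
[
  {"model": "qwen2-72b-instruct", "api_key": "YOUR_API_KEY", "base_url": "https://example.com/v1"},
  {"model": "gemma-2b-it.Q3_K_L", "api_key": "NotRequired", "base_url": "http://localhost:8000/v1"}
]
""")


def filter_configs(configs, filter_dict):
    """Keep entries whose value for every filter key is one of the accepted values."""
    return [
        cfg
        for cfg in configs
        if all(cfg.get(key) in accepted for key, accepted in filter_dict.items())
    ]


selected = filter_configs(sample_configs, {"model": ["qwen2-72b-instruct"]})
print([cfg["model"] for cfg in selected])  # ['qwen2-72b-instruct']
```

Keeping several models in one file and selecting with a filter makes it easy to switch between a hosted and a local model, as the commented-out `filter_dict` line suggests.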
Step 4: create the agents
RetrieveUserProxyAgent is the retrieval-augmented agent; it is responsible for retrieving content from the paper at https://arxiv.org/html/2408.08921v1.
def termination_msg(x):
    return isinstance(x, dict) and "TERMINATE" == str(x.get("content", ""))[-9:].upper()


llm_config = {"config_list": config_list, "timeout": 60, "temperature": 0.8, "seed": 1234}
boss = autogen.UserProxyAgent(
    name="Boss",
    is_termination_msg=termination_msg,
    human_input_mode="NEVER",
    code_execution_config=False,  # we don't want to execute code in this case.
    default_auto_reply="Reply `TERMINATE` if the task is done.",
    description="The boss who asks questions and gives tasks.",
)
boss_aid = RetrieveUserProxyAgent(
    name="Boss_Assistant",
    is_termination_msg=termination_msg,
    human_input_mode="NEVER",
    default_auto_reply="Reply `TERMINATE` if the task is done.",
    max_consecutive_auto_reply=3,
    retrieve_config={
        "task": "code",
        # "docs_path": "https://raw.githubusercontent.com/microsoft/FLAML/main/website/docs/Examples/Integrate%20-%20Spark.md",
        # "docs_path": "https://blog.csdn.net/weixin_44458771/article/details/135495928",
        # "docs_path": ["/mnt/workspace/codetable.txt", "/mnt/workspace/SouthGermanCredit.txt"],
        "docs_path": ["https://arxiv.org/html/2408.08921v1"],
        # "docs_path": ["/mnt/workspace/autogen教程.docx"],
        "chunk_token_size": 2000,
        # Note: "model" is used for token counting rather than for embeddings, which is
        # why the log below reports "Model ... not found. Using cl100k_base encoding."
        # The embedding model is normally set with the separate "embedding_model" key.
        "model": "sentence-transformers/facebook-dpr-question_encoder-single-nq-base",
        # "client": chromadb.PersistentClient(path="/tmp/chromadb"),  # deprecated, use "vector_db" instead
        "vector_db": "chroma",  # to use the deprecated `client` parameter, set this to None and uncomment the line above
        "overwrite": True,  # set to True to overwrite an existing collection; with two or more docs, True also overwrites it
    },
    code_execution_config=False,  # we don't want to execute code in this case.
    description="Assistant who has extra content retrieval power for solving difficult problems.",
)
coder = AssistantAgent(
    name="Senior_Python_Engineer",
    is_termination_msg=termination_msg,
    system_message="You are a senior python engineer, you provide python code to answer questions. Reply `TERMINATE` in the end when everything is done.",
    llm_config=llm_config,
    description="Senior Python Engineer who can write code to solve problems and answer questions.",
)
pm = autogen.AssistantAgent(
    name="Product_Manager",
    is_termination_msg=termination_msg,
    system_message="You are a product manager. Reply `TERMINATE` in the end when everything is done.",
    llm_config=llm_config,
    description="Product Manager who can design and plan the project.",
)
reviewer = autogen.AssistantAgent(
    name="Code_Reviewer",
    is_termination_msg=termination_msg,
    system_message="You are a code reviewer. Reply `TERMINATE` in the end when everything is done.",
    llm_config=llm_config,
    description="Code Reviewer who can review the code.",
)
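The termination check shared by all agents is easy to misread, so here is a small self-contained sketch of how it behaves on a few hand-written messages (the sample messages are made up for illustration). It only matches when the last nine characters of the content spell TERMINATE, ignoring case:

```python
def termination_msg(x):
    """Return True when the message content ends with TERMINATE (case-insensitive)."""
    return isinstance(x, dict) and "TERMINATE" == str(x.get("content", ""))[-9:].upper()


print(termination_msg({"content": "All done. TERMINATE"}))  # True
print(termination_msg({"content": "terminate"}))            # True
print(termination_msg({"content": "TERMINATE."}))           # False: trailing period
print(termination_msg({"content": None}))                   # False: no text content
print(termination_msg("TERMINATE"))                         # False: not a dict
```

Because the comparison is exact, a reply that ends with punctuation or a newline after TERMINATE does not stop the chat; in that case the conversation runs until `max_round` is reached.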
Step 5: create the group chat and define how the chat is started
def _reset_agents():
    boss.reset()
    boss_aid.reset()
    coder.reset()
    pm.reset()
    reviewer.reset()


def call_rag_chat(question: str):
    _reset_agents()

    # In this case, we will have multiple user proxy agents and we don't initiate the chat
    # with RAG user proxy agent.
    # In order to use RAG user proxy agent, we need to wrap RAG agents in a function and call
    # it from other agents.
    def retrieve_content(
        message: Annotated[
            str,
            "Refined message which keeps the original meaning and can be used to retrieve content for code generation and question answering.",
        ],
        n_results: Annotated[int, "number of results"] = 3,
    ) -> str:
        boss_aid.n_results = n_results  # Set the number of results to be retrieved.
        _context = {"problem": message, "n_results": n_results}
        ret_msg = boss_aid.message_generator(boss_aid, None, _context)
        return ret_msg or message

    boss_aid.human_input_mode = "NEVER"  # Disable human input for boss_aid since it only retrieves content.

    for caller in [pm, coder, reviewer]:
        d_retrieve_content = caller.register_for_llm(
            description="retrieve content for code generation and question answering.",
        )(retrieve_content)

    for executor in [boss, pm]:
        executor.register_for_execution()(retrieve_content)

    groupchat = autogen.GroupChat(
        agents=[boss, pm, coder, reviewer],
        messages=[],
        max_round=12,
        speaker_selection_method="round_robin",
        allow_repeat_speaker=False,
    )
    manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

    # Start chatting with the boss as this is the user proxy agent.
    boss.initiate_chat(
        manager,
        message=question,
    )
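The `Annotated` hints on `retrieve_content` are what let a tool description be built for the LLM: each parameter carries a type plus a human-readable description. The sketch below shows, with the standard library only, how such metadata can be read back from a function signature. `describe_parameters` is a hypothetical helper for illustration and is not how autogen itself builds its tool schema; the function body is a placeholder.

```python
import inspect
from typing import Annotated, get_args, get_origin, get_type_hints


def retrieve_content(
    message: Annotated[str, "Refined message used to retrieve content."],
    n_results: Annotated[int, "number of results"] = 3,
) -> str:
    return message  # placeholder body; the real function queries the retriever


def describe_parameters(func):
    """Collect type name, description, and required flag for each parameter."""
    hints = get_type_hints(func, include_extras=True)
    described = {}
    for name, param in inspect.signature(func).parameters.items():
        hint = hints[name]
        if get_origin(hint) is Annotated:
            base_type, description = get_args(hint)[:2]
        else:
            base_type, description = hint, ""
        described[name] = {
            "type": base_type.__name__,
            "description": description,
            "required": param.default is inspect.Parameter.empty,
        }
    return described


for name, info in describe_parameters(retrieve_content).items():
    print(name, info)
```

Writing clear descriptions matters here, since they are the only guidance the calling agents get about what to pass as `message` and `n_results`.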
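Step 6: start the chat
# The question, asked in Chinese: "Explain how LLM, RAG, and GraphRAG differ and the pros and cons of each."
PROBLEM = "解释下LLM、RAG和GraphRAG三种方式的不同和优缺点。"
call_rag_chat(PROBLEM)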
Log
Boss (to chat_manager):
解释下LLM、RAG和GraphRAG三种方式的不同和优缺点。
--------------------------------------------------------------------------------
Next speaker: Product_Manager
/usr/local/lib/python3.10/site-packages/autogen/agentchat/conversable_agent.py:2567: UserWarning: Function 'retrieve_content' is being overridden.
warnings.warn(f"Function '{tool_sig['function']['name']}' is being overridden.", UserWarning)
/usr/local/lib/python3.10/site-packages/autogen/agentchat/conversable_agent.py:2486: UserWarning: Function 'retrieve_content' is being overridden.
warnings.warn(f"Function '{name}' is being overridden.", UserWarning)
Product_Manager (to chat_manager):
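LLM (Large Language Model), RAG (Retrieval-Augmented Generation), and GraphRAG are all techniques for natural language processing (NLP) tasks, but each has a different focus and is applied differently.
1. **LLM (Large Language Model)**:
- **Description**: An LLM is a pre-trained model with a large number of parameters, such as GPT-3 or BERT. Through unsupervised learning on large-scale corpora, these models learn to understand and generate natural language.
- **Strengths**: Generates coherent, varied text; performs well on many NLP tasks without additional task-specific training.
- **Weaknesses**: Lacking deep knowledge of specific domains, its output may not be precise or specialized enough; training and running costs are also high.
2. **RAG (Retrieval-Augmented Generation)**:
- **Description**: RAG combines retrieval with a language model: it first retrieves relevant passages from an external knowledge source, then uses them to augment the model's generation so that the output is more accurate and grounded in facts.
- **Strengths**: Can draw on structured or unstructured external knowledge, improving the accuracy and trustworthiness of the generated content.
- **Weaknesses**: Retrieval quality directly affects generation quality; inaccurate or irrelevant retrieved information can mislead the result. It also adds system complexity.
3. **GraphRAG (Graph-based Retrieval-Augmented Generation)**:
- **Description**: GraphRAG extends the idea of RAG by using graph databases or graph neural networks to retrieve and integrate knowledge, which lets the system better understand and use relationships between entities and more complex knowledge structures.
- **Strengths**: Handles relationships and semantic links between entities better, and performs better on tasks that require contextual understanding and reasoning.
- **Weaknesses**: More complex to implement, requires high-quality graph data and relation annotations, and needs more computing resources.
Each method has its own suitable scenarios; which one to choose depends on the specific application requirements, the available data, and the available computing capacity.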
***** Suggested tool call (call_aafc254cb45b4ec4b1e120): retrieve_content *****
Arguments:
{"message": "compare LLM, RAG, and GraphRAG", "n_results": 3}
*******************************************************************************
--------------------------------------------------------------------------------
Next speaker: Boss
>>>>>>>> EXECUTING FUNCTION retrieve_content...
Model sentence-transformers/facebook-dpr-question_encoder-single-nq-base not found. Using cl100k_base encoding.
VectorDB returns doc_ids: [['92f4046b', 'f4c97d12', '89a5f15d']]
Adding content of doc 92f4046b to context.
Model sentence-transformers/facebook-dpr-question_encoder-single-nq-base not found. Using cl100k_base encoding.
Boss (to chat_manager):
Boss (to chat_manager):
***** Response from calling tool (call_aafc254cb45b4ec4b1e120) *****
You're a retrieve augmented coding assistant. You answer user's questions based on your own knowledge and the
context provided by the user.
If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.
For code generation, you must obey the following rules:
Rule 1. You MUST NOT install any packages because all the packages needed are already installed.
Rule 2. You must follow the formats below to write your code:
```language
# your code
```
User's question is: compare LLM, RAG, and GraphRAG
Context is: Although LLMs are primarily designed to process pure text and struggle with non\-Euclidean data containing complex structural information, such as graphs (Wang et al., [2023b](https://arxiv.org/html/2408.08921v1#bib.bib154); Guo et al., [2023](https://arxiv.org/html/2408.08921v1#bib.bib42)), numerous studies (Fan et al., [2024b](https://arxiv.org/html/2408.08921v1#bib.bib29); Mao et al., [2024c](https://arxiv.org/html/2408.08921v1#bib.bib106); Liu et al., [2024c](https://arxiv.org/html/2408.08921v1#bib.bib93); Li et al., [2024c](https://arxiv.org/html/2408.08921v1#bib.bib84); Pan et al., [2023](https://arxiv.org/html/2408.08921v1#bib.bib120), [2024](https://arxiv.org/html/2408.08921v1#bib.bib121); Jin et al., [2024a](https://arxiv.org/html/2408.08921v1#bib.bib66); Chen, [2024](https://arxiv.org/html/2408.08921v1#bib.bib14); Zhu et al., [2024](https://arxiv.org/html/2408.08921v1#bib.bib190); Wang et al., [2024e](https://arxiv.org/html/2408.08921v1#bib.bib162)) have been conducted in these fields. These papers primarily integrate LLMs with GNNs to enhance modeling capabilities for graph data, thereby improving performance on downstream tasks such as node classification, edge prediction, graph classification, and others. For example, Zhu et al. ([2024](https://arxiv.org/html/2408.08921v1#bib.bib190)) propose an efficient fine\-tuning method named ENGINE, which combines LLMs and GNNs through a side structure for enhancing graph representation.
Different from these methods, GraphRAG focuses on retrieving relevant graph elements using queries from an external graph\-structured database. In this paper, we provide a detailed introduction to the relevant technologies and applications of GraphRAG, which are not included in previous surveys of LLMs on Graphs.
### 2\.3\. KBQA
KBQA is a significant task in natural language processing, aiming to respond to user queries based on external knowledge bases (Fu et al., [2020](https://arxiv.org/html/2408.08921v1#bib.bib34); Lan et al., [2023](https://arxiv.org/html/2408.08921v1#bib.bib78), [2021](https://arxiv.org/html/2408.08921v1#bib.bib77); Yani and Krisnadhi, [2021](https://arxiv.org/html/2408.08921v1#bib.bib175)), thereby achieving goals such as fact verification, passage retrieval enhancement, and text understanding. Previous surveys typically categorize existing KBQA approaches into two main types: Information Retrieval (IR)\-based methods and Semantic Parsing (SP)\-based methods. Specifically, IR\-based methods (Luo et al., [2024b](https://arxiv.org/html/2408.08921v1#bib.bib103); Sun et al., [2024b](https://arxiv.org/html/2408.08921v1#bib.bib143); Jiang et al., [2023b](https://arxiv.org/html/2408.08921v1#bib.bib62); Zhang et al., [2022b](https://arxiv.org/html/2408.08921v1#bib.bib182); Wu et al., [2023b](https://arxiv.org/html/2408.08921v1#bib.bib169); Wang et al., [2023a](https://arxiv.org/html/2408.08921v1#bib.bib156); Jiang et al., [2024b](https://arxiv.org/html/2408.08921v1#bib.bib61)) retrieve information related to the query from the knowledge graph (KG) and use it to enhance the generation process. While SP\-based methods (Chakraborty, [2024](https://arxiv.org/html/2408.08921v1#bib.bib13); Fang et al., [2024](https://arxiv.org/html/2408.08921v1#bib.bib30); Chen et al., [2021](https://arxiv.org/html/2408.08921v1#bib.bib16); Ye et al., [2021](https://arxiv.org/html/2408.08921v1#bib.bib178); Gu and Su, [2022](https://arxiv.org/html/2408.08921v1#bib.bib41); Sun et al., [2023](https://arxiv.org/html/2408.08921v1#bib.bib142)) generate a logical form (LF) for each query and execute it against knowledge bases to obtain the answer.
GraphRAG and KBQA are closely related, with IR\-based KBQA methods representing a subset of GraphRAG approaches focused on downstream applications. In this work, we extend the discussion beyond KBQA to include GraphRAG’s applications across various downstream tasks. Our survey provides a thorough and detailed exploration of GraphRAG technology, offering a comprehensive understanding of existing methods and potential improvements.
3\. Preliminaries
-----------------
In this section, we introduce background knowledge of GraphRAG for easier comprehension of our survey. First, we introduce Text\-Attributed Graphs which is a universal and general format of graph data used in GraphRAG. Then, we provide formal definitions for two types of models that can be used in the retrieval and generation stages: Graph Neural Networks and Language Models.
### 3\.1\. Text\-Attributed Graphs
The graph data used in Graph RAG can be represented uniformly as Text-Attributed Graphs (TAGs), where nodes and edges possess textual attributes. Formally, a text-attributed graph can be denoted as $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{A},\{\mathbf{x}_{v}\}_{v\in\mathcal{V}},\{\mathbf{e}_{i,j}\}_{i,j\in\mathcal{E}})$
********************************************************************
--------------------------------------------------------------------------------
Next speaker: Product_Manager
Product_Manager (to chat_manager):
The retrieved context further clarifies some aspects of LLMs, RAG, and GraphRAG, especially focusing on how they relate to graph data and knowledge base question answering (KBQA). Here are the additional insights:
- **LLMs on Graphs**: While Large Language Models (LLMs) are great at processing text, they struggle with complex structural information like graphs. However, there are methods that combine LLMs with Graph Neural Networks (GNNs) to better model graph data, improving performance on tasks like node classification and edge prediction.
- **GraphRAG**: Unlike methods that combine LLMs and GNNs, GraphRAG focuses on retrieving relevant graph elements from an external graph-structured database. It enhances the retrieval process in knowledge-based question answering (KBQA), aiming to improve the understanding and utilization of entities and relationships in the data.
- **KBQA**: KBQA methods can be categorized into Information Retrieval (IR)-based and Semantic Parsing (SP)-based. IR-based methods retrieve information related to the query from the knowledge graph and use it to enhance the generation process. SP-based methods generate a logical form for each query and execute it against knowledge bases to obtain answers. GraphRAG, specifically IR-based KBQA methods, represents a subset of approaches aimed at downstream applications.
To summarize, while LLMs excel in text processing, their limitations in handling graph data are addressed by integrating them with GNNs or using GraphRAG for more structured retrieval. RAG, in its general form, can be seen as a precursor to GraphRAG, focusing on augmenting generation with retrieved information, which GraphRAG extends by specifically targeting graph data for retrieval.
This information underscores the importance of choosing the right approach based on the type of data and the complexity of the relationships involved in the task at hand.
--------------------------------------------------------------------------------
Next speaker: Senior_Python_Engineer
Senior_Python_Engineer (to chat_manager):
Indeed, the additional context provides deeper insight into how these technologies interact with graph data and knowledge bases. Here's a concise recap of the three methodologies:
1. **LLMs (Large Language Models)**:
- **Focus**: Primarily designed for processing plain text.
- **Strengths**: Exceptional in generating human-like text and performing various NLP tasks.
- **Limitations**: Struggle with handling complex structural data such as graphs, which require more specialized treatment.
2. **RAG (Retrieval-Augmented Generation)**:
- **Innovation**: Combines the strength of LLMs with the ability to retrieve relevant information from external sources, enhancing accuracy and contextuality.
- **Application**: Particularly useful in scenarios where factual correctness is crucial, as it leverages external knowledge databases.
- **Challenges**: Its effectiveness hinges on the quality of the retrieval system, and it can become complex and resource-intensive.
3. **GraphRAG (Graph-based Retrieval-Augmented Generation)**:
- **Extension**: Builds upon RAG by specifically targeting graph data, utilizing graph neural networks or graph databases to retrieve and integrate knowledge.
- **Advantages**: Better suited for tasks involving complex relationships and structured data, offering improved performance in graph-related applications.
- **Considerations**: Requires more sophisticated implementation and potentially higher computational resources due to the complexity of graph data.
Each approach has evolved to address specific challenges and leverage unique opportunities within the realm of NLP and knowledge management. The choice between them would depend on the nature of the task, the available data, and the desired outcome.
TERMINATE
--------------------------------------------------------------------------------
Next speaker: Code_Reviewer
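Because the paper is in English, the final answer in the log is in English as well. Its main points are:
Large language models (LLMs):
- Focus: designed primarily for processing plain text.
- Strengths: excel at generating human-like text and at a wide range of NLP tasks.
- Limitations: struggle with complex structured data such as graphs, which needs more specialized handling.
Retrieval-augmented generation (RAG):
- Innovation: combines the strengths of LLMs with retrieval of relevant information from external sources, improving accuracy and contextual relevance.
- Application: well suited to scenarios where factual correctness matters, because it draws on external knowledge bases.
- Challenges: its effectiveness depends on the quality of the retrieval system, and it can become complex and resource-intensive.
Graph-based retrieval-augmented generation (GraphRAG):
- Extension: builds on RAG and targets graph data specifically, using graph neural networks or graph databases to retrieve and integrate knowledge.
- Advantages: better suited to tasks involving complex relationships and structured data, with better performance in graph-related applications.
- Considerations: the complexity of graph data may call for a more sophisticated implementation and more computing resources.
Each approach evolved to address specific challenges within NLP and knowledge management. Which one to choose depends on the nature of the task, the available data, and the desired outcome.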