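The Ultimate Guide to Mastering Retrieval-Augmented Generation (RAG) with LangChain: 3. HyDE (Hypothetical Document Embeddings) in RAG

HyDE (Hypothetical Document Embeddings)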

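Rather than generating new queries from the original question, HyDE generates a hypothetical document for the given query. The intuition is that the embedding of such a hypothetical document identifies a neighborhood in the corpus embedding space, and similar real documents are then retrieved from that neighborhood by vector similarity. In this setup, RAG can retrieve more relevant documents based on the hypothetical document and use them to answer the user's query accurately.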
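
To make this intuition concrete, here is a minimal sketch that is not part of the original notebook. It assumes a toy bag-of-words "embedding" (the hypothetical helper `embed`) in place of a real embedding model, and a two-sentence corpus invented purely for illustration. It only shows why a longer hypothetical passage can land closer to the relevant document than the short question does.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented two-document corpus (assumption, for illustration only).
corpus = [
    "LoRA freezes the pre-trained weights and trains low rank decomposition matrices",
    "Chroma is a vector store that persists embeddings on disk",
]

question = "How do low rank adapters work?"
# In HyDE this passage would be written by the LLM; here it is hard-coded.
hypothetical_doc = (
    "Low rank adapters keep the pre-trained weights frozen and train small "
    "low rank matrices that are added to the dense layers"
)


def nearest(query_text: str) -> tuple[int, float]:
    """Return the index and score of the corpus document closest to the query."""
    query_vec = embed(query_text)
    scores = [cosine(query_vec, embed(doc)) for doc in corpus]
    best = max(range(len(corpus)), key=scores.__getitem__)
    return best, scores[best]


q_idx, q_score = nearest(question)
h_idx, h_score = nearest(hypothetical_doc)
print(f"question     -> doc {q_idx}, similarity {q_score:.3f}")
print(f"hypothetical -> doc {h_idx}, similarity {h_score:.3f}")
```

With these toy inputs both queries select the LoRA sentence, but the hypothetical passage scores higher because it shares more vocabulary with the real document. Real embedding models capture meaning rather than word counts, so treat this only as an illustration of the idea.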

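Let's use HyDE to answer a question with RAG!

First, as in the previous notebooks, we create our vector store and initialize the retriever using OpenAIEmbeddings and Chroma.

# Import the required libraries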
%load_ext dotenv
%dotenv secrets/secrets.env

from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Create a DirectoryLoader instance that loads the PDF files in the given directory
loader = DirectoryLoader('data/', glob="*.pdf", loader_cls=PyPDFLoader)
documents = loader.load()

# Split the text into chunks with RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
text_chunks = text_splitter.split_documents(documents)

# Create the vector store with Chroma and OpenAIEmbeddings
vectorstore = Chroma.from_documents(
    documents=text_chunks,
    embedding=OpenAIEmbeddings(),
    persist_directory="data/vectorstore"
)
vectorstore.persist()

# Turn the vector store into a retriever that returns the top 5 chunks
retriever = vectorstore.as_retriever(search_kwargs={'k': 5})
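
The `chunk_size=500` and `chunk_overlap=20` settings above control how much text each chunk holds and how much neighboring chunks share. The following is a simplified sketch of that idea using a fixed sliding window over characters. The function `split_with_overlap` is a hypothetical helper, not LangChain's algorithm: RecursiveCharacterTextSplitter additionally tries to break on separators such as paragraph and line boundaries, so its chunks will generally differ from these.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so a sentence that
    straddles a boundary still appears whole-ish in at least one chunk.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Synthetic 1200-character text, just to show the window arithmetic.
text = "".join(str(i % 10) for i in range(1200))
chunks = split_with_overlap(text, chunk_size=500, chunk_overlap=20)
print([len(c) for c in chunks])
```

A larger overlap reduces the chance that an answer is split across two chunks, at the cost of storing and embedding more redundant text.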

Next, we have the LLM write a "hypothetical" passage about the question through a chain.

from langchain.prompts import ChatPromptTemplate

# Create a prompt template for generating the hypothetical document
hyde_prompt = ChatPromptTemplate.from_template(
    """
    Please write a scientific passage of a paper to answer the following question:\n
    Question: {question}\n
    Passage:
    """
)

# Create a chain that generates the hypothetical document
generate_doc_chain = (
    {'question': RunnablePassthrough()}
    | hyde_prompt
    | ChatOpenAI(model='gpt-4', temperature=0)
    | StrOutputParser()
)

# Generate the hypothetical document with the chain
question = "How Low Rank Adapters work in LLMs?"
generate_doc_chain.invoke(question)
"Low Rank Adapters (LRAs) are a recent development in the field of Large Language Models (LLMs) that aim to reduce the computational and memory requirements of these models while maintaining their performance. The fundamental principle behind LRAs is the use of low-rank approximations to reduce the dimensionality of the model's parameters.\n\nIn the context of LLMs, an adapter is a small neural network that is inserted between the layers of a pre-trained model. The purpose of this adapter is to adapt the pre-trained model to a new task without modifying the original parameters of the model. This allows for efficient transfer learning, as the computational cost of training the adapter is significantly less than retraining the entire model.\n\nLow Rank Adapters take this concept a step further by applying a low-rank approximation to the adapter's parameters. This is achieved by decomposing the weight matrix of the adapter into two smaller matrices, effectively reducing the number of parameters that need to be stored and computed. This decomposition is typically achieved using methods such as singular value decomposition (SVD) or principal component analysis (PCA).\n\nThe use of low-rank approximations in LRAs allows for a significant reduction in the computational and memory requirements of LLMs. Despite this reduction, LRAs are able to maintain a high level of performance, as the low-rank approximation captures the most important features of the data. This makes LRAs an effective tool for adapting pre-trained LLMs to new tasks in a computationally efficient manner."

With the generated passage, we use our retriever to fetch similar documents.

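# Create a retrieval chain that pipes the document-generation chain into the retriever
retrieval_chain = generate_doc_chain | retriever
# Retrieve documents; pass the question string itself, because generate_doc_chain
# already wraps its input as {'question': ...} via RunnablePassthrough
retireved_docs = retrieval_chain.invoke(question)
retireved_docs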
[Document(page_content='over-parametrized models in fact reside on a low intrinsic dimension. We hypothesize that the\nchange in weights during model adaptation also has a low “intrinsic rank”, leading to our proposed\nLow-RankAdaptation (LoRA) approach. LoRA allows us to train some dense layers in a neural\nnetwork indirectly by optimizing rank decomposition matrices of the dense layers’ change during\nadaptation instead, while keeping the pre-trained weights frozen, as shown in Figure 1. Using GPT-3', metadata={'page': 1, 'source': 'data/LoRA.pdf'}),
 Document(page_content='over-parametrized models in fact reside on a low intrinsic dimension. We hypothesize that the\nchange in weights during model adaptation also has a low “intrinsic rank”, leading to our proposed\nLow-RankAdaptation (LoRA) approach. LoRA allows us to train some dense layers in a neural\nnetwork indirectly by optimizing rank decomposition matrices of the dense layers’ change during\nadaptation instead, while keeping the pre-trained weights frozen, as shown in Figure 1. Using GPT-3', metadata={'page': 1, 'source': 'data/LoRA.pdf'}),
 Document(page_content='over-parametrized models in fact reside on a low intrinsic dimension. We hypothesize that the\nchange in weights during model adaptation also has a low “intrinsic rank”, leading to our proposed\nLow-RankAdaptation (LoRA) approach. LoRA allows us to train some dense layers in a neural\nnetwork indirectly by optimizing rank decomposition matrices of the dense layers’ change during\nadaptation instead, while keeping the pre-trained weights frozen, as shown in Figure 1. Using GPT-3', metadata={'page': 1, 'source': 'data/LoRA.pdf'}),
 Document(page_content='requirements by using a small set of trainable parameters, often termed adapters, while not updating\nthe full model parameters which remain fixed. Gradients during stochastic gradient descent are\npassed through the fixed pretrained model weights to the adapter, which is updated to optimize the\nloss function. LoRA augments a linear projection through an additional factorized projection. Given\na projection XW =YwithX∈Rb×h,W∈Rh×oLoRA computes:\nY=XW +sXL 1L2, (3)', metadata={'page': 2, 'source': 'data/QLoRA.pdf'}),
 Document(page_content='requirements by using a small set of trainable parameters, often termed adapters, while not updating\nthe full model parameters which remain fixed. Gradients during stochastic gradient descent are\npassed through the fixed pretrained model weights to the adapter, which is updated to optimize the\nloss function. LoRA augments a linear projection through an additional factorized projection. Given\na projection XW =YwithX∈Rb×h,W∈Rh×oLoRA computes:\nY=XW +sXL 1L2, (3)', metadata={'page': 2, 'source': 'data/QLoRA.pdf'})]
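
The list above contains the same chunk several times. One plausible cause, stated here as an assumption, is that the same PDFs were indexed more than once into the persisted store. Whatever the cause, repeated chunks add no information to the context. The sketch below drops exact duplicates and joins the remaining text into one context string. `dedupe_docs` and `format_docs` are hypothetical helpers, and the `SimpleNamespace` objects only stand in for LangChain `Document` objects, relying on nothing beyond their `page_content` and `metadata` attributes.

```python
from types import SimpleNamespace


def dedupe_docs(docs):
    """Keep the first occurrence of each (page_content, metadata) pair, in order."""
    seen = set()
    unique = []
    for doc in docs:
        key = (doc.page_content, repr(sorted(doc.metadata.items())))
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def format_docs(docs) -> str:
    """Join the chunk texts into a single plain-text context block."""
    return "\n\n".join(doc.page_content for doc in docs)


# Stand-ins for retrieved documents (placeholder text, not real results).
sample_docs = [
    SimpleNamespace(page_content="chunk A", metadata={"page": 1, "source": "data/LoRA.pdf"}),
    SimpleNamespace(page_content="chunk A", metadata={"page": 1, "source": "data/LoRA.pdf"}),
    SimpleNamespace(page_content="chunk B", metadata={"page": 2, "source": "data/QLoRA.pdf"}),
]

context = format_docs(dedupe_docs(sample_docs))
print(context)
```

A string built this way could be passed as the `context` value of the answering prompt instead of the raw list, which keeps the metadata and `Document(...)` wrappers out of the prompt text.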

Finally, the documents retrieved with the "hypothetical" passage are used as context to answer our original question through final_rag_chain.

# Create a prompt template for answering the question
template = """
Answer the following question based on the provided context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

# Create the final RAG chain
final_rag_chain = (
    prompt
    | ChatOpenAI(model='gpt-4', temperature=0)
    | StrOutputParser()
)

# Answer the question with the final RAG chain
final_rag_chain.invoke({"context": retireved_docs, "question": question})
"Low-Rank Adapters (LoRA) work in large language models (LLMs) by allowing the training of some dense layers in a neural network indirectly. This is done by optimizing rank decomposition matrices of the dense layers' change during adaptation, while keeping the pre-trained weights frozen. LoRA also augments a linear projection through an additional factorized projection. During stochastic gradient descent, gradients are passed through the fixed pre-trained model weights to the adapter, which is updated to optimize the loss function."

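Although this technique can help answer questions, the answer may still be wrong, because the documents are retrieved based on a hypothetical passage that may itself be incorrect or fabricated.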
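
One way to dampen the effect of a single misleading passage, offered here as an assumption rather than something this tutorial implements, is to embed several hypothetical passages together with the original question and search with the average of those vectors. The sketch below shows only the averaging step on plain lists of floats. `mean_vector` is a hypothetical helper; in a LangChain setup the input vectors might come from `OpenAIEmbeddings().embed_query(...)` and the result might be passed to the vector store's `similarity_search_by_vector(...)`, but that wiring is not shown or tested here.

```python
def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise average of equally sized embedding vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Tiny made-up 2-dimensional "embeddings" (assumption, for illustration only).
question_vec = [1.0, 0.0]
hypothetical_vecs = [[0.0, 1.0], [0.5, 0.5]]

query_vec = mean_vector([question_vec, *hypothetical_vecs])
print(query_vec)
```

Including the question's own embedding keeps the search anchored to what was actually asked, even if one generated passage drifts off topic.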

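Key terms:

  • HyDE: Hypothetical Document Embeddings, a technique that generates a hypothetical document to help retrieve more relevant documents.
  • RAG: Retrieval-Augmented Generation, an approach that combines retrieval with generation to improve question-answering systems.
  • LLM: Large Language Model, such as the GPT series.
  • LoRA: Low-Rank Adaptation, a method that adapts a model by optimizing low-rank decomposition matrices of the dense layers' weight changes while keeping the pre-trained weights frozen.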