In July 2024, the Groq team released two open-source models on Hugging Face, Groq/Llama-3-Groq-8B-Tool-Use and Groq/Llama-3-Groq-70B-Tool-Use, fine-tuned from Meta's Llama 3 at the 8B and 70B sizes (see the official announcement), with the goal of greatly improving open-source LLM performance on function calling and tool use. At release time, Llama-3-Groq-70B-Tool-Use was the best-performing model on the Berkeley Function Calling Leaderboard (BFCL), surpassing all other open-source and proprietary models.
In this post, I will walk through calling the quantized Llama-3-Groq-8B-Tool-Use model via the Ollama Python library and using it to recognize the intent behind a user's question and pick the appropriate function/tool to call; in other words, giving the LLM the ability to select and use tools.
To run the example code below, make sure Ollama is installed (on Ubuntu 22.04, see "Two Ways to Install Ollama on Ubuntu 22.04").
1. Pull the Model
ollama pull llama3-groq-tool-use
Once the pull completes, you can confirm the model with ollama list; the 8B model quantized with Q4_0 is 4.7 GB.
2. Install the Ollama Python Library
Install the ollama Python library into your chosen Python environment:
pip install ollama
3. Define Python Functions for the LLM to Use
We define two very basic math functions (I had ChatGPT generate these): one adds two numbers, the other computes the factorial of a number.
# Addition: return the sum of two numbers
def add_numbers(a, b):
    return a + b

# Factorial: compute the factorial of a given number
def factorial(n):
    if n == 0 or n == 1:
        return 1
    else:
        result = 1
        for i in range(2, n + 1):
            result *= i
        return result
4. Describe the Functions in the Required JSON Format
Much like the schema OpenAI defined for LLM function calling, we describe our two Python functions following the tool JSON format used in the prompt example on Groq's Hugging Face model card.
Text Prompt Example:
<|start_header_id|>system<|end_header_id|>
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{"name": <function-name>,"arguments": <args-dict>}
</tool_call>
Here are the available tools:
<tools> {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "properties": {
            "location": {
                "description": "The city and state, e.g. San Francisco, CA",
                "type": "string"
            },
            "unit": {
                "enum": [
                    "celsius",
                    "fahrenheit"
                ],
                "type": "string"
            }
        },
        "required": [
            "location"
        ],
        "type": "object"
    }
} </tools><|eot_id|><|start_header_id|>user<|end_header_id|>
What is the weather like in San Francisco?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
<tool_call>
{"id":"call_deok","name":"get_current_weather","arguments":{"location":"San Francisco","unit":"celsius"}}
</tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|>
<tool_response>
{"id":"call_deok","result":{"temperature":"72","unit":"celsius"}}
</tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|>
Addition function, with its JSON description generated by ChatGPT via a one-shot prompt:
{
    "name": "add_numbers",
    "description": "Calculate the sum of two numbers",
    "parameters": {
        "properties": {
            "a": {
                "description": "The first number to add",
                "type": "number"
            },
            "b": {
                "description": "The second number to add",
                "type": "number"
            }
        },
        "required": [
            "a",
            "b"
        ],
        "type": "object"
    }
}
Factorial function, likewise generated by ChatGPT via a one-shot prompt:
{
    "name": "factorial",
    "description": "Calculate the factorial of a given number",
    "parameters": {
        "properties": {
            "n": {
                "description": "The number for which the factorial is calculated",
                "type": "integer"
            }
        },
        "required": [
            "n"
        ],
        "type": "object"
    }
}
5. Call the Model via Ollama
Put the JSON function descriptions prepared in the previous step inside the <tools></tools> tags of the input prompt.
from ollama import Client
ollama_client = Client(host="http://localhost:11434")
model_name = "llama3-groq-tool-use"
user_query = "请问数字12的阶乘是多少?"  # "What is the factorial of 12?"
prompt = """
<tools>
{
    "name": "add_numbers",
    "description": "Calculate the sum of two numbers",
    "parameters": {
        "properties": {
            "a": {
                "description": "The first number to add",
                "type": "number"
            },
            "b": {
                "description": "The second number to add",
                "type": "number"
            }
        },
        "required": [
            "a",
            "b"
        ],
        "type": "object"
    }
}
{
    "name": "factorial",
    "description": "Calculate the factorial of a given number",
    "parameters": {
        "properties": {
            "n": {
                "description": "The number for which the factorial is calculated",
                "type": "integer"
            }
        },
        "required": [
            "n"
        ],
        "type": "object"
    }
}
</tools>
""" + "\n" + user_query
Run the model inference (zero-shot):
response = ollama_client.chat(model=model_name, messages=[
    {
        'role': 'user',
        'content': prompt,
    }], options={"temperature": 0.1})
Print the inference result:
output = response['message']['content']
print(output)
The printed output is shown below. Notice that it does not fully match expectations: the closing </tool_call> tag is missing, and the LLM went on to answer the question itself.
<tool_call>
{
    "name": "factorial",
    "parameters": {
        "n": 12
    }
}
12! = 479001600
6. Switch to One-Shot Inference
To get the model to generate content in the expected <tool_call>…</tool_call> form, we make a small adjustment to the messages list passed to ollama_client.chat(): we supply one example exchange (one-shot) before the real query.
user_query_example = """
<tools>
{
    "name": "add_numbers",
    "description": "Calculate the sum of two numbers",
    "parameters": {
        "properties": {
            "a": {
                "description": "The first number to add",
                "type": "number"
            },
            "b": {
                "description": "The second number to add",
                "type": "number"
            }
        },
        "required": [
            "a",
            "b"
        ],
        "type": "object"
    }
}
{
    "name": "factorial",
    "description": "Calculate the factorial of a given number",
    "parameters": {
        "properties": {
            "n": {
                "description": "The number for which the factorial is calculated",
                "type": "integer"
            }
        },
        "required": [
            "n"
        ],
        "type": "object"
    }
}
</tools>
What is the result of 3.121321 plus 9.7832198391321?
"""
response_example = """<tool_call>
{"name":"add_numbers","parameters":{"a":3.121321, "b": 9.7832198391321}}
</tool_call>"""
user_query = "请问23和78的和是多少?"  # "What is the sum of 23 and 78?"
response = ollama_client.chat(model=model_name, messages=[
    {
        'role': 'user',
        'content': user_query_example,
    },
    {
        'role': 'assistant',
        'content': response_example
    },
    {
        'role': 'user',
        'content': user_query
    }], options={"temperature": 0.1})
After inference completes, print the generated content:
output = response['message']['content']
print(output)
As you can see, providing a single example question-and-answer round (one-shot) yields output that matches our expectations exactly:
<tool_call>
{"name":"add_numbers","parameters":{"a":23, "b":78}}
</tool_call>
7. Call the Selected Function/Tool
This involves two concrete steps: first, extract the JSON content inside the <tool_call> tags; second, read the name field of the extracted JSON to decide which function to call, and pass the arguments in the parameters field to that function.
The implementation code:
import json
import re

# Note: without re.DOTALL, this pattern only matches single-line JSON,
# which is exactly what the one-shot output above produces
json_str = re.search(r'<tool_call>\n(.*?)\n</tool_call>', output).group(1)
# Parse the JSON string into a dict
data = json.loads(json_str)
function_name = data['name']  # 'add_numbers'
parameters = data['parameters']  # {'a': 23, 'b': 78}
# Dispatch dynamically: call the function whose name matches
if function_name in globals():
    result = globals()[function_name](**parameters)
The result variable now holds the return value of the function call.
8. Feed the Tool Result Back to the Model for a Second Inference
If the user's question only involves arithmetic, we could indeed hand the result value from the previous step straight back as the answer. More often, though, we want the LLM to use the tool's return value as a knowledge source and deliver it in a more natural, conversational reply. That calls for a second inference pass to generate the final response to the user.
# call_tool_result is the tool call's return value as a string
response_2 = ollama_client.chat(model=model_name, messages=[
    {
        'role': 'user',
        'content': user_query
    },
    # The tool-role content (the return value from the function/tool call)
    # goes right after user_query
    {
        'role': 'tool',
        'content': call_tool_result
    }], options={"temperature": 0.1})
final_answer = response_2['message']['content']
Print the final reply:
print(final_answer)
The sum of 23 and 78 is 101.
Q&A Examples
Example 1
user_query = "请问数字12的阶乘是多少?"  # "What is the factorial of 12?"
print(final_answer)
The factorial of 12 is 479001600.
Example 2
user_query = "请问89.893289103加上231.321321等于多少?"  # "What is 89.893289103 plus 231.321321?"
print(final_answer)
The sum of 89.893289103 and 231.321321 is approximately 321.214610103.
Complete Code
from ollama import Client
import json
import re

ollama_client = Client(host="http://localhost:11434")
model_name = "llama3-groq-tool-use"

# Addition: return the sum of two numbers
def add_numbers(a, b):
    return a + b

# Factorial: compute the factorial of a given number
def factorial(n):
    if n == 0 or n == 1:
        return 1
    else:
        result = 1
        for i in range(2, n + 1):
            result *= i
        return result

def call_tool(tool_call_str):
    json_str = re.search(r'<tool_call>\n(.*?)\n</tool_call>', tool_call_str).group(1)
    # Parse the JSON string into a dict
    data = json.loads(json_str)
    function_name = data['name']  # e.g. 'factorial'
    parameters = data['parameters']  # e.g. {'n': 12}
    # Dispatch dynamically and return the result as a string
    if function_name in globals():
        result = globals()[function_name](**parameters)
        return str(result)
user_query_example = """
<tools>
{
    "name": "add_numbers",
    "description": "Calculate the sum of two numbers",
    "parameters": {
        "properties": {
            "a": {
                "description": "The first number to add",
                "type": "number"
            },
            "b": {
                "description": "The second number to add",
                "type": "number"
            }
        },
        "required": [
            "a",
            "b"
        ],
        "type": "object"
    }
}
{
    "name": "factorial",
    "description": "Calculate the factorial of a given number",
    "parameters": {
        "properties": {
            "n": {
                "description": "The number for which the factorial is calculated",
                "type": "integer"
            }
        },
        "required": [
            "n"
        ],
        "type": "object"
    }
}
</tools>
What is the result of 3.121321 plus 9.7832198391321?
"""
response_example = """<tool_call>
{"name":"add_numbers","parameters":{"a":3.121321, "b": 9.7832198391321}}
</tool_call>"""
user_query = "请问数字12的阶乘是多少?"  # "What is the factorial of 12?"
response = ollama_client.chat(model=model_name, messages=[
    {
        'role': 'user',
        'content': user_query_example,
    },
    {
        'role': 'assistant',
        'content': response_example
    },
    {
        'role': 'user',
        'content': user_query
    }], options={"temperature": 0.1})
output = response['message']['content']
call_tool_result = call_tool(output)
response_2 = ollama_client.chat(model=model_name, messages=[
    {
        'role': 'user',
        'content': user_query
    }, {
        'role': 'tool',
        'content': call_tool_result
    }], options={"temperature": 0.1})
final_answer = response_2['message']['content']
print(final_answer)
Final Thoughts
This post focused on building the flow that lets an LLM select a function/tool, call it, and answer based on the return value, which is why two simple arithmetic functions served as the examples (many recent LLMs can already invoke internal functions for accurate computation, so the specific functions here carry little weight for real-world use). Feel free to take this flow as a template and wire up more interesting, practical tools to push the LLM's capabilities further!