How to Deploy DeepSeek Offline on Windows and Call It Through a Web API

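This topic has been getting a lot of attention lately, so I looked into it and am sharing what I learned.

Many companies have probably already integrated DeepSeek through the official Web API.

This article shows how to run DeepSeek offline, call it through a local Web API, and integrate it into your own program.

A sample program is provided at the end.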

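What Is Ollama

Ollama is an open-source tool for running large language models locally. It handles downloading, managing, and running models, and exposes them through a command line and a local HTTP API, so you can deploy and call a model without setting up an inference stack yourself.

It currently supports running models locally including, but not limited to, Llama 3.3, DeepSeek-R1, Phi-4, Mistral, and Gemma 2.

Project page: https://github.com/ollama/ollama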

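Installing Ollama

Open the Ollama download page: https://ollama.com/download

Download the Windows version.

Once the download finishes, run the installer.

The installer does not currently let you choose an install path; it installs to the C: drive automatically.

That said, software like this that was not designed specifically for Windows usually has few dependencies. Manually moving the installed files and then updating the environment variables should probably work.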

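Downloading the DeepSeek Offline Model

Open the DeepSeek model page in the Ollama library: deepseek-r1

Pick the model size that suits your machine. The 1.5b version is enough for a first try.

The hardware requirements below can serve as a rough guide.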

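1.5B:

CPU: at least 4 cores; a multi-core Intel/AMD processor is recommended.

Memory: 8 GB or more.

Disk: 3 GB or more; the model file is about 1.5-2 GB.

GPU: not required, CPU-only inference works; for GPU acceleration, 4 GB of VRAM is enough (for example, a GTX 1650).

7B:

CPU: 8 cores or more; a modern multi-core CPU is recommended.

Memory: 16 GB or more.

Disk: 8 GB or more; the model file is about 4-5 GB.

GPU: 8 GB of VRAM recommended (for example, an RTX 3070 or RTX 4060).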

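Once you have decided which version to install, copy the command shown on the right side of the page and run it in cmd.

After running it, you can see the model being downloaded.

The download seems to be served from servers in mainland China, so the speed depends mainly on your bandwidth.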

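Running the DeepSeek Offline Model

Once the download finishes, the DeepSeek offline model starts running.

You can now chat with it. Because it is an offline model, it cannot look anything up online.

To start the DeepSeek offline model manually later, open cmd and enter the following command:

ollama run deepseek-r1:1.5b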
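Before calling the model from code, it can help to confirm that it is actually installed. The sketch below is an illustration and not part of the original sample: it assumes the GET /api/tags endpoint described in the Ollama API documentation, which lists local models as a "models" array whose entries carry a "name" field. The class and method names (InstalledModels, ParseNames) are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class InstalledModels
{
    // Extracts the model names from the JSON body returned by GET /api/tags.
    // Assumed shape: { "models": [ { "name": "deepseek-r1:1.5b", ... }, ... ] }
    public static List<string> ParseNames(string tagsJson)
    {
        var names = new List<string>();
        using var doc = JsonDocument.Parse(tagsJson);

        if (doc.RootElement.TryGetProperty("models", out var models)
            && models.ValueKind == JsonValueKind.Array)
        {
            foreach (var model in models.EnumerateArray())
            {
                // Skip entries that do not have a string "name" field.
                if (model.TryGetProperty("name", out var name)
                    && name.ValueKind == JsonValueKind.String)
                {
                    names.Add(name.GetString()!);
                }
            }
        }

        return names;
    }
}
```

A caller could fetch the body with HttpClient.GetStringAsync("http://localhost:11434/api/tags") and check whether the returned list contains "deepseek-r1:1.5b".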

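Calling It Through the Web API

Once installed, Ollama exposes a Web API.

The request URL is: http://localhost:11434/api/generate

Requests are sent with POST, and both the request and the response are JSON, defined as follows: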

Request

{
  "model": "deepseek-r1:1.5b",
  "prompt": "Why is the sky blue?"
}

model: the model name; here we use deepseek-r1:1.5b

prompt: the prompt text
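In C#, the request body can be produced with System.Text.Json instead of being concatenated by hand. This is a minimal sketch: the GenerateRequest and RequestBody types are hypothetical names introduced for illustration, and only the "model" and "prompt" fields described above are included.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTO for illustration; it mirrors the two request fields shown above.
public sealed class GenerateRequest
{
    [JsonPropertyName("model")]
    public string Model { get; set; } = "";

    [JsonPropertyName("prompt")]
    public string Prompt { get; set; } = "";
}

public static class RequestBody
{
    // Serializes the request so that quotes and other special characters
    // in the prompt are escaped correctly.
    public static string Build(string model, string prompt)
    {
        var request = new GenerateRequest { Model = model, Prompt = prompt };
        return JsonSerializer.Serialize(request);
    }
}
```

Calling RequestBody.Build("deepseek-r1:1.5b", "Why is the sky blue?") yields a JSON string that can be sent as the POST body.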

Response

{
  "model": "deepseek-r1:1.5b",
  "created_at": "2024-08-01T08:52:19.385406455-07:00",
  "response": "The",
  "done": false
}

model: the model name

created_at: the creation timestamp

response: the response text (the conversation content we want to read)

done: whether the reply is complete

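Note: the Web API streams by default, so the reply is not produced in one piece. The server returns multiple JSON objects, each carrying a fragment of the reply.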
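To make the streaming format concrete, the sketch below reassembles a full reply from streamed objects. It assumes that each line of the response body is one JSON object of the shape shown above, and the StreamedReply class is a hypothetical helper, not part of Ollama or OllamaSharp. According to the Ollama API documentation, streaming can also be turned off by adding "stream": false to the request, in which case a single object is returned.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

public static class StreamedReply
{
    // Reads one JSON object per line and concatenates the "response"
    // fragments until an object with "done": true is seen.
    public static string Collect(TextReader reader)
    {
        var text = new StringBuilder();
        string? line;

        while ((line = reader.ReadLine()) != null)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;

            using var doc = JsonDocument.Parse(line);
            var root = doc.RootElement;

            if (root.TryGetProperty("response", out var fragment))
                text.Append(fragment.GetString());

            // Stop at the final object; anything after it is ignored.
            if (root.TryGetProperty("done", out var done) && done.GetBoolean())
                break;
        }

        return text.ToString();
    }
}
```

For a quick experiment, the reader can be a StringReader over a few sample lines; in a real call it would wrap the HTTP response stream.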

You can test the endpoint with an API testing tool such as Apifox.

Once the test passes, you can integrate the call into your code.

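Calling from C#

The main class involved is HttpClient. Because the request is a POST, I first used HttpClient.PostAsync(), but found that I could not get the output in real time; I had to wait until the whole reply had finished. (PostAsync buffers the entire response body before it returns.)

After reading the OllamaSharp source, I found that it calls SendAsync, passing an HttpCompletionOption, and then processes the returned stream. The core code is: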

protected virtual async Task<HttpResponseMessage> SendToOllamaAsync(HttpRequestMessage requestMessage, OllamaRequest? ollamaRequest, HttpCompletionOption completionOption, CancellationToken cancellationToken)
{
    requestMessage.ApplyCustomHeaders(DefaultRequestHeaders, ollamaRequest);

    var response = await _client.SendAsync(requestMessage, completionOption, cancellationToken).ConfigureAwait(false);

    await EnsureSuccessStatusCodeAsync(response).ConfigureAwait(false);

    return response;
}

Reading the stream:

private async IAsyncEnumerable<GenerateResponseStream?> ProcessStreamedCompletionResponseAsync(HttpResponseMessage response, [EnumeratorCancellation] CancellationToken cancellationToken)
{
    using var stream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);
    using var reader = new StreamReader(stream);

    while (!reader.EndOfStream && !cancellationToken.IsCancellationRequested)
    {
        var line = await reader.ReadLineAsync().ConfigureAwait(false) ?? "";
        var streamedResponse = JsonSerializer.Deserialize<GenerateResponseStream>(line, IncomingJsonSerializerOptions);

        yield return streamedResponse?.Done ?? false
            ? JsonSerializer.Deserialize<GenerateDoneResponseStream>(line, IncomingJsonSerializerOptions)!
            : streamedResponse;
    }
}
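For reference, the same two steps can be sketched with HttpClient alone. This is an illustration under stated assumptions rather than OllamaSharp's implementation: the OllamaStreamingSketch class and its method names are hypothetical, the endpoint and fields are the ones described earlier, and each response line is assumed to hold one JSON object. The key detail is HttpCompletionOption.ResponseHeadersRead, which makes SendAsync return as soon as the headers arrive instead of waiting for the whole body.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class OllamaStreamingSketch
{
    // Builds the POST request for /api/generate with the "model" and "prompt" fields.
    public static HttpRequestMessage BuildRequest(string baseUrl, string model, string prompt)
    {
        var body = JsonSerializer.Serialize(new { model, prompt });
        return new HttpRequestMessage(HttpMethod.Post, new Uri(new Uri(baseUrl), "/api/generate"))
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
    }

    // Sends the request and invokes onFragment for each "response" piece as it arrives.
    public static async Task StreamAsync(HttpClient client, string model, string prompt, Action<string> onFragment)
    {
        using var request = BuildRequest("http://localhost:11434", model, prompt);

        // Return once the headers are read so the body can be consumed incrementally.
        using var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();

        using var stream = await response.Content.ReadAsStreamAsync();
        using var reader = new StreamReader(stream);

        string? line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            if (line.Length == 0) continue;

            using var doc = JsonDocument.Parse(line);
            if (doc.RootElement.TryGetProperty("response", out var fragment))
                onFragment(fragment.GetString() ?? "");

            // The last streamed object carries "done": true.
            if (doc.RootElement.TryGetProperty("done", out var done) && done.GetBoolean())
                break;
        }
    }
}
```

A console program could call await OllamaStreamingSketch.StreamAsync(new HttpClient(), "deepseek-r1:1.5b", "Why is the sky blue?", Console.Write) to print the reply as it is generated, provided Ollama is running locally.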

So for the demo here I simply use the OllamaSharp package.

OllamaSharp is used as follows:

1. Install the OllamaSharp package from NuGet.

2. Use code like the following:

var uri = new Uri("http://localhost:11434");   // address of the local Ollama service
var ollama = new OllamaApiClient(uri);
ollama.SelectedModel = "deepseek-r1:1.5b";     // model to use

var requestText = "What is the weather like today?";  // the question to ask

await foreach (var stream in ollama.GenerateAsync(requestText))
{
    var response = stream?.Response; // one fragment of the reply

}

Demo in WPF

First we create a window containing a RichTextBox for the output, a TextBox for entering the question, and a Button for sending it.

XAML

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition/>
        <RowDefinition Height="40"/>
    </Grid.RowDefinitions>

    <RichTextBox x:Name="tbox_ai" ScrollViewer.VerticalScrollBarVisibility="Auto"></RichTextBox>

    <Grid Grid.Row="1">
        <TextBox Height="35" VerticalContentAlignment="Center" Margin="10,0,100,0" Name="tbox_request"></TextBox>
        <Button HorizontalAlignment="Right" Content="Send" Width="88" Height="28" Margin="0,0,5,0" Click="Button_Click" IsDefault="True"></Button>
    </Grid>
</Grid>

.cs

private async void Button_Click(object sender, RoutedEventArgs e)
{
    var uri = new Uri("http://localhost:11434");
    var ollama = new OllamaApiClient(uri);
    ollama.SelectedModel = "deepseek-r1:1.5b";

    var requestText = this.tbox_request.Text;
    this.tbox_request.Clear();

    // Note: newer OllamaSharp versions name this method GenerateAsync; older versions call it Generate.
    await foreach (var stream in ollama.GenerateAsync(requestText))
    {
        DisplayResponse(stream?.Response ?? "");

        // Short pause between fragments so the text appears gradually.
        await Task.Delay(100);
    }
}

private void DisplayResponse(string ollamaResponse)
{
    Application.Current.Dispatcher.Invoke(() =>
    {
        // Reuse the first paragraph, or create one if the document has none.
        if (this.tbox_ai.Document.Blocks.FirstBlock is not Paragraph paragraph)
        {
            paragraph = new Paragraph();
            this.tbox_ai.Document.Blocks.Add(paragraph);
        }

        // Append the new fragment to the end of the paragraph.
        paragraph.Inlines.Add(new Run(ollamaResponse));
    });
}

Result

Sample Code

Download

References

https://github.com/ollama/ollama/blob/main/docs/api.md