# Environment Setup
## System Requirements
+ **Operating system**: Linux (Ubuntu 20.04+ recommended) / Windows (WSL2 required)
+ **Python**: 3.8+
+ **GPU**: NVIDIA GPU (≥16 GB VRAM, RTX 3090/A100 recommended) with CUDA 11.8
+ **Disk space**: ≥50 GB (for model weights and dependencies)

## Install Dependencies
```shell
# Create a virtual environment
conda create -n deepseek python=3.10 -y
conda activate deepseek

# Install PyTorch with CUDA support
pip install torch==2.0.1+cu118 --index-url https://download.pytorch.org/whl/cu118

# Install the Hugging Face libraries
pip install transformers==4.35.0 accelerate sentencepiece
```
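Before downloading tens of gigabytes of weights, it is worth confirming that PyTorch can actually see the GPU. The snippet below is a minimal check that uses only standard PyTorch/transformers calls and makes no DeepSeek-specific assumptions:

```python
import torch
import transformers

# Confirm the installed versions and that PyTorch can reach the GPU
print("torch:", torch.__version__, "| transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    # Sanity-check the >=16 GB VRAM requirement listed above
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
```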
# Obtaining the Model Weights
## Download from Official Channels
1. Visit the [official DeepSeek open-source page](https://github.com/deepseek-ai) or the [Hugging Face Model Hub](https://huggingface.co/deepseek-ai)
2. Locate the target model (e.g. `deepseek-llm-7b-base`)
3. Download it with Git:
```shell
git lfs install
git clone https://huggingface.co/deepseek-ai/deepseek-llm-7b-base
```
## Alternative: Mirror Sites in China

If downloads from the official source are slow, you can use a domestic mirror site:

```python
from huggingface_hub import snapshot_download

snapshot_download(
    "deepseek-ai/deepseek-llm-7b-base",
    local_dir="./deepseek-model",
    revision="main",
    mirror="https://mirror.sjtu.edu.cn/huggingface"  # SJTU mirror; newer huggingface_hub releases configure this via the HF_ENDPOINT environment variable instead
)
```
# Model Loading and Inference
## Basic Inference Code
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "./deepseek-llm-7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",           # automatically split layers across GPU/CPU
    torch_dtype=torch.bfloat16
)
input_text = "中国的首都是哪里?"  # "What is the capital of China?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Quantized Inference (Lower VRAM Usage)
```python
from transformers import BitsAndBytesConfig  # 4-bit loading also requires the bitsandbytes package (pip install bitsandbytes)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quant_config,
    device_map="auto"
)
```
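As a quick sanity check that 4-bit loading actually shrank the model in memory, transformers exposes `get_memory_footprint()` on loaded models; this continues the session from the block above and gives only a rough figure:

```python
# Rough size of the quantized model's weights resident in memory
print(f"Model footprint: {model.get_memory_footprint() / 1024**3:.2f} GB")
```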
# Deploying as an API Service
## Create a REST Interface with FastAPI
```python
# api.py — assumes the tokenizer and model from the previous sections are loaded in this module
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
class QueryRequest(BaseModel):
    text: str
    max_length: int = 100
@app.post("/generate")
async def generate_text(request: QueryRequest):
    inputs = tokenizer(request.text, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=request.max_length,
        do_sample=True,      # sampling must be enabled for temperature to take effect
        temperature=0.7
    )
    return {"result": tokenizer.decode(outputs[0], skip_special_tokens=True)}
```
## Start the Service
```shell
uvicorn api:app --host 0.0.0.0 --port 8000
```
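Once uvicorn is running, the endpoint can be exercised from any HTTP client. A minimal sketch with the `requests` library, assuming the default host/port from the command above and the field names defined in `QueryRequest`:

```python
import requests

# POST to the /generate endpoint defined above; field names follow QueryRequest
resp = requests.post(
    "http://localhost:8000/generate",
    json={"text": "中国的首都是哪里?", "max_length": 100},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["result"])
```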
# Advanced Configuration
## Multi-GPU Parallelism
```python
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="balanced",  # spread the layers evenly across all visible GPUs
)
```

## Monitor VRAM Usage
```bash
# Install the monitoring tool
pip install nvitop

# View VRAM usage in real time
nvitop -m full
```
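If you would rather check memory from inside the inference process instead of a separate terminal, PyTorch's built-in counters give a per-GPU snapshot; this sketch is plain PyTorch and independent of nvitop:

```python
import torch

# Per-GPU snapshot of memory allocated/reserved by the current process
for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 1024**3
    reserved = torch.cuda.memory_reserved(i) / 1024**3
    print(f"GPU {i}: allocated {allocated:.2f} GB, reserved {reserved:.2f} GB")
```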
## Secure Access Control
Add authentication to the FastAPI service:
```python
from fastapi import Depends, HTTPException
from fastapi.security import APIKeyHeader
security = APIKeyHeader(name="X-API-Key")
@app.post("/generate")
async def secure_generate(
    request: QueryRequest,
    api_key: str = Depends(security)
):
    if api_key != "YOUR_SECRET_KEY":
        raise HTTPException(status_code=403, detail="Invalid API Key")
    # ...original generation logic...
```
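A client then only needs to send the matching header; this is the earlier `requests` example with the `X-API-Key` header added (the key value is a placeholder for whatever the server expects):

```python
import requests

# The header name must match APIKeyHeader(name="X-API-Key") on the server
resp = requests.post(
    "http://localhost:8000/generate",
    headers={"X-API-Key": "YOUR_SECRET_KEY"},  # placeholder key
    json={"text": "中国的首都是哪里?", "max_length": 100},
)
print(resp.status_code, resp.json())
```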
# Troubleshooting Common Issues
| **Symptom** | **Solution** |
| --- | --- |
| `CUDA out of memory` | Enable quantization (4-bit/8-bit) or use a GPU with more VRAM |
| Garbled Chinese output | Check that the correct tokenizer is loaded and force UTF-8 encoding |
| Slow inference | Enable the `flash_attention` optimization or use TensorRT acceleration |
| Model weights fail to load | Verify file integrity (compare SHA256 checksums) |
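For the last row, one way to check weight-file integrity is to hash the downloaded shards locally and compare the result against the checksums published on the model page. A standard-library sketch; the file path is only an example and should point at an actual shard in your download:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file in 1 MB chunks so multi-GB weight shards never need to fit in RAM
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Example path only; compare the printed digest with the published checksum
print(sha256_of("./deepseek-llm-7b-base/model.safetensors"))
```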