Error preconditions
The error occurred after starting the qwen2.5-32b-instruct model with vllm
GPU: GeForce RTX 4090 Laptop GPU
Host OS: Windows 11
Runtime environment: WSL2-Ubuntu22.04
Error output
INFO 10-22 22:29:31 engine.py:290] Added request chat-993cbe95e73d4a1db5d1e89e433f727a.
ERROR 10-22 22:29:32 client.py:250] RuntimeError('Engine loop has died')
ERROR 10-22 22:29:32 client.py:250] Traceback (most recent call last):
ERROR 10-22 22:29:32 client.py:250] File "/home/ai/miniconda3/lib/python3.10/site-packages/vllm/engine/multiprocessing/client.py", line 150, in run_heartbeat_loop
ERROR 10-22 22:29:32 client.py:250] await self._check_success(
ERROR 10-22 22:29:32 client.py:250] File "/home/ai/miniconda3/lib/python3.10/site-packages/vllm/engine/multiprocessing/client.py", line 314, in _check_success
ERROR 10-22 22:29:32 client.py:250] raise response
ERROR 10-22 22:29:32 client.py:250] RuntimeError: Engine loop has died
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/responses.py", line 259, in __call__
await wrap(partial(self.listen_for_disconnect, receive))
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/responses.py", line 255, in wrap
await func()
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/responses.py", line 232, in listen_for_disconnect
message = await receive()
File "/home/ai/miniconda3/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
await self.message_event.wait()
File "/home/ai/miniconda3/lib/python3.10/asyncio/locks.py", line 214, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f385017b9d0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ai/miniconda3/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/ai/miniconda3/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
return await self.app(scope, receive, send)
File "/home/ai/miniconda3/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/applications.py", line 113, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
raise exc
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
await app(scope, receive, sender)
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
await route.handle(scope, receive, send)
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
await self.app(scope, receive, send)
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
raise exc
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
await app(scope, receive, sender)
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
await response(scope, receive, send)
File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/responses.py", line 252, in __call__
async with anyio.create_task_group() as task_group:
File "/home/ai/miniconda3/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 763, in __aexit__
raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
Solution
The most likely cause is insufficient memory
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       6.9Gi       8.2Gi        80Mi       435Mi       8.2Gi
Swap:          4.0Gi       4.0Gi       0.0Ki
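The output shows 15 GiB of total memory, of which about 6.9 GiB is in use and about 8.2 GiB is available.
The swap space is 4 GiB in total and is completely used, with nothing left.
When swap is exhausted, system performance degrades badly, and a process that asks for more memory can be killed, which is consistent with the engine loop dying.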
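The same check can be scripted instead of read off `free -h` by eye. The sketch below is only an illustration: `swap_used_percent` is a hypothetical helper name, and it assumes the standard `SwapTotal:` and `SwapFree:` fields of `/proc/meminfo`.

```shell
#!/usr/bin/env bash
# swap_used_percent (hypothetical helper): reads /proc/meminfo-style text
# on stdin and prints the share of swap in use as an integer percentage,
# rounded down. Prints "no-swap" when no swap is configured.
swap_used_percent() {
    awk '
        /^SwapTotal:/ { total = $2 }    # value in kB
        /^SwapFree:/  { unused = $2 }   # value in kB
        END {
            if (total == 0) { print "no-swap" }
            else { printf "%d\n", (total - unused) * 100 / total }
        }
    '
}
```

On a live system it would be called as `swap_used_percent < /proc/meminfo`. A result close to 100, as in the `free -h` output above, suggests swap is the bottleneck.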
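To make the swap space the same size as physical memory (15 GiB), follow these steps:
- Create a new swap file:
  sudo fallocate -l 15G /swapfile
- Set the correct permissions:
  sudo chmod 600 /swapfile
- Format the file as swap space:
  sudo mkswap /swapfile
- Enable the swap file:
  sudo swapon /swapfile
- Confirm that the swap space is active:
  free -h
- To make the change permanent, open /etc/fstab with sudo vim /etc/fstab, add the following line, then save and quit with :wq
  /swapfile swap swap defaults 0 0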
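The manual steps for creating and enabling the swap file can be collected into one script. This is a rough sketch rather than a tested installer: `swap_setup_commands` and the `APPLY`, `SWAP_FILE` and `SWAP_SIZE` variables are names made up for this example. By default it changes nothing; the commands are only executed when `APPLY=1` is set.

```shell
#!/usr/bin/env bash
# swap_setup_commands FILE SIZE (hypothetical helper): prints the four
# commands from the manual steps, one per line, without running them.
swap_setup_commands() {
    # $1 = swap file path, $2 = size as understood by fallocate (e.g. 15G)
    printf 'sudo fallocate -l %s %s\n' "$2" "$1"
    printf 'sudo chmod 600 %s\n' "$1"
    printf 'sudo mkswap %s\n' "$1"
    printf 'sudo swapon %s\n' "$1"
}

# Nothing is executed unless APPLY=1 is set explicitly (assumed opt-in).
# bash -e stops at the first command that fails.
if [ "${APPLY:-0}" = "1" ]; then
    swap_setup_commands "${SWAP_FILE:-/swapfile}" "${SWAP_SIZE:-15G}" | bash -e
fi
```

Calling `swap_setup_commands /swapfile 15G` on its own prints the commands for review before anything is run with `sudo`.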
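With this, the swap space is set to 15 GiB, which was enough to resolve the error in this setup. Swap is much slower than RAM, so it prevents the crash but does not speed up inference.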
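If the /etc/fstab entry has no effect, you can put the commands from the first five steps into ~/.bashrc (once the swap file exists, only sudo swapon /swapfile needs to run again).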
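If the commands do go into ~/.bashrc, a guard avoids calling `swapon` on a file that is already active each time a shell opens. The snippet below is a sketch under assumptions: `swap_is_active` is a hypothetical helper that parses the `/proc/swaps` table, and `ENABLE_SWAP` is a made-up opt-in variable so that nothing runs by accident.

```shell
#!/usr/bin/env bash
# swap_is_active PATH (hypothetical helper): reads /proc/swaps-style text
# on stdin and succeeds (exit status 0) if PATH is listed as active swap.
swap_is_active() {
    awk -v path="$1" '
        NR > 1 && $1 == path { found = 1 }   # NR > 1 skips the header row
        END { exit !found }
    '
}

# Enable the existing swap file only when it is not active yet.
# ENABLE_SWAP is an assumed opt-in switch for this example.
if [ "${ENABLE_SWAP:-0}" = "1" ] && [ -f /swapfile ]; then
    if ! swap_is_active /swapfile < /proc/swaps; then
        sudo swapon /swapfile
    fi
fi
```

Note that `sudo` may prompt for a password whenever the guard decides to run `swapon`, which is one reason to prefer a mechanism that runs once at boot when one is available.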