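I. Required software
Everything in this article was done on Windows 10. The software and dependencies used are: CUDA 10.2, cuDNN 8.6.0, VS2017, OpenCV 4.0.0, Anaconda3, CMake 3.10.1, TensorRT 8, and PyTorch.
Before installing, work through the following checks: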
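1. GPU driver version
Use the NVIDIA website to confirm which driver version each CUDA release requires. Install the newest driver you can: a newer driver generally supports all earlier CUDA releases.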
2. Look up the GPU's compute capability
NVIDIA's official lookup page: https://developer.nvidia.com/cuda-gpus
3. Choosing a CUDA version
Three factors drive the choice of CUDA version:
- whether the release is stable and supported
- the Visual Studio version
- the PyTorch version
4. CUDA versions and Visual Studio versions
- CUDA 12.x: compatible with Visual Studio 2019 and 2022. Prefer the latest Visual Studio and CUDA Toolkit where possible, and check that your operating system version supports both.
- CUDA 11.x: best paired with a recent Visual Studio 2019 release, such as VS 2019 version 16.4, 16.5, 16.7, or 16.8.
- CUDA 11.0: compatible with VS 2017 Update 5.
- CUDA 10.2: compatible with VS 2017 Update 3.
- CUDA 10.1: can integrate with Visual Studio 2019. During installation, select the VS-related components such as CUDA Visual Studio Integration, and make sure the installed VS 2019 edition supports CUDA (Enterprise, Professional, or Community).
5. CUDA versions and PyTorch versions
Confirm the supported combinations on the PyTorch website: https://pytorch.org/
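6. Common PyTorch and Python version pairings
Some common PyTorch versions and the Python versions they support:
PyTorch version | Supported Python versions |
---|---|
1.0.x and earlier | Python 2.7, 3.5 (note: later releases dropped Python 2.7) |
1.1.x | Python 3.6 and later |
1.2.x | Python 3.6 and later |
1.3.x | Python 3.6 and later |
1.4.x | Python 3.5 to 3.8 (3.6 or later recommended for better compatibility) |
1.5.x | Python 3.5 to 3.8 |
1.6.x | Python 3.5 to 3.8 |
1.7.x | Python 3.6 to 3.9 |
1.8.x | Python 3.6 to 3.9 |
1.9.x | Python 3.6 to 3.9 |
1.10.x and later | Usually the most recent few Python versions (for example 3.6 to 3.9, depending on the latest Python at release time) |
The PyTorch website also lists this information.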
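7. cuDNN and TensorRT versions
When choosing a cuDNN version, let the TensorRT version decide first: pick a cuDNN release that your TensorRT release supports.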
1.1 Installing the GPU driver
Check which GPU model your computer has.
NVIDIA driver download page: https://www.nvidia.cn/Download/index.aspx?lang=cn . Search for your GPU there and download the matching driver.
After installing, run
nvidia-smi
to check that the installation succeeded.
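The same check can be scripted. The sketch below is illustrative only: `parse_smi_header` and `query_driver` are hypothetical helper names, and the regular expression assumes the usual nvidia-smi banner line containing "Driver Version:" and "CUDA Version:" (very old drivers may not print the CUDA field).

```python
import re
import shutil
import subprocess

# Assumed banner layout: "... Driver Version: 441.22   CUDA Version: 10.2 ..."
HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_smi_header(text):
    """Return (driver_version, highest_supported_cuda) or None if not found."""
    match = HEADER_RE.search(text)
    return (match.group(1), match.group(2)) if match else None


def query_driver():
    """Run nvidia-smi when it is on PATH; return None if it is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run([exe], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_smi_header(result.stdout)


if __name__ == "__main__":
    info = query_driver()
    if info is None:
        print("nvidia-smi not found, or its output had no version banner")
    else:
        print("driver %s, supports CUDA up to %s" % info)
```

The "CUDA Version" reported here is the highest CUDA release the driver supports, not the toolkit that is installed.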
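1.2 Installing CUDA
Download the CUDA installer from the NVIDIA website: https://developer.nvidia.com/cuda-downloads
Use the archive of previous releases to get an older version.
The downloaded file is cuda_10.2.89_441.22_win10.exe. Run it to install, keeping the default path so that the path configuration later is simpler.
You can leave the display driver component unchecked, since the driver is already installed.
After installation, set the environment variables.
Right-click This PC, then open Properties -> Advanced system settings -> Environment Variables. The system variables now include two new entries, CUDA_PATH and CUDA_PATH_V10_2. (The toolkit's default install location is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2; the samples are installed to C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.2.)
Next, add the following five system variables: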
CUDA_SDK_PATH = C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.2
CUDA_LIB_PATH = %CUDA_PATH%\lib\x64
CUDA_BIN_PATH = %CUDA_PATH%\bin
CUDA_SDK_BIN_PATH = %CUDA_SDK_PATH%\bin\win64
CUDA_SDK_LIB_PATH = %CUDA_SDK_PATH%\common\lib\x64
In the system variables, double-click Path and append the following entries:
%CUDA_LIB_PATH%;%CUDA_BIN_PATH%;%CUDA_SDK_LIB_PATH%;%CUDA_SDK_BIN_PATH%;
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib\x64
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\extras\CUPTI\lib64
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.2\bin\win64
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.2\common\lib\x64
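Because several of these variables refer to each other, it can help to see what the Path entries resolve to. The following sketch expands the `%NAME%` references by hand and reports which folders exist. The helper names `expand` and `path_entries` are hypothetical, and the `CUDA_PATH` value is an assumption based on the installer's default location.

```python
import os
import re

SYSTEM_VARS = {
    # Assumed default toolkit location; the installer sets CUDA_PATH itself.
    "CUDA_PATH": r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2",
    "CUDA_SDK_PATH": r"C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.2",
    "CUDA_LIB_PATH": r"%CUDA_PATH%\lib\x64",
    "CUDA_BIN_PATH": r"%CUDA_PATH%\bin",
    "CUDA_SDK_BIN_PATH": r"%CUDA_SDK_PATH%\bin\win64",
    "CUDA_SDK_LIB_PATH": r"%CUDA_SDK_PATH%\common\lib\x64",
}
PATH_ADDITION = "%CUDA_LIB_PATH%;%CUDA_BIN_PATH%;%CUDA_SDK_LIB_PATH%;%CUDA_SDK_BIN_PATH%;"

_REF = re.compile(r"%([^%;]+)%")


def expand(value, variables, max_passes=10):
    """Substitute %NAME% references until nothing changes; unknown names are kept."""
    for _ in range(max_passes):
        expanded = _REF.sub(lambda m: variables.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


def path_entries(path_value, variables):
    """Split a Path-style string on ';' and expand every non-empty entry."""
    return [expand(part, variables) for part in path_value.split(";") if part]


if __name__ == "__main__":
    for entry in path_entries(PATH_ADDITION, SYSTEM_VARS):
        status = "ok" if os.path.isdir(entry) else "missing"
        print(f"{status:8}{entry}")
```

An entry reported as "missing" usually points to a typo in a variable or to a component that was not installed (for example the samples).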
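Testing the installation
Finally, test whether CUDA is configured correctly. Open CMD and run:
nvcc -V
If the CUDA version information is printed, the configuration succeeded.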
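To compare the reported release against the version you meant to install, the last line of the output can be parsed. This is a minimal sketch: `parse_nvcc_release` is a hypothetical helper, and the pattern assumes the familiar "release X.Y, VX.Y.Z" wording of the final line.

```python
import re
import shutil
import subprocess

# Assumed wording of the last output line: "Cuda compilation tools, release 10.2, V10.2.89"
RELEASE_RE = re.compile(r"release\s+(\d+\.\d+),\s*V([\d.]+)")


def parse_nvcc_release(text):
    """Return (release, full_version) from nvcc -V output, or None if absent."""
    match = RELEASE_RE.search(text)
    return (match.group(1), match.group(2)) if match else None


def installed_cuda_release():
    """Run nvcc -V when nvcc is on PATH; return None otherwise."""
    exe = shutil.which("nvcc")
    if exe is None:
        return None
    result = subprocess.run([exe, "-V"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    found = installed_cuda_release()
    expected = "10.2"  # the release used in this article
    if found is None:
        print("nvcc not found on PATH")
    elif found[0] != expected:
        print(f"nvcc reports {found[0]}, expected {expected}")
    else:
        print(f"CUDA {found[1]} is active")
```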
1.3 Installing cuDNN
cuDNN home page: https://developer.nvidia.com/cudnn
Latest version: https://developer.nvidia.com/cudnn-downloads
Previous versions: https://developer.nvidia.com/cudnn-archive
The downloaded file is cudnn-10.2-windows10-x64-v8.6.0.163.zip.
Extract the archive, then copy the files in the bin, include, and lib folders under its cuda directory into the matching folders under C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2.
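The copy step can also be scripted. The sketch below merges each cuDNN sub-folder into the matching CUDA folder. `install_cudnn` and `demo` are hypothetical names, the file names in `demo` are placeholders, and the demo runs against a temporary directory rather than a real CUDA install. Writing into C:\Program Files normally requires an elevated prompt.

```python
import shutil
import tempfile
from pathlib import Path


def install_cudnn(cudnn_root, cuda_root, subdirs=("bin", "include", "lib")):
    """Merge each cuDNN sub-folder into the matching CUDA folder.

    Returns the copied files as paths relative to cudnn_root.
    """
    cudnn_root, cuda_root = Path(cudnn_root), Path(cuda_root)
    copied = []
    for name in subdirs:
        src = cudnn_root / name
        if not src.is_dir():
            raise FileNotFoundError(f"missing folder in cuDNN archive: {src}")
        # dirs_exist_ok (Python 3.8+) merges into folders that already exist
        shutil.copytree(src, cuda_root / name, dirs_exist_ok=True)
        copied.extend(p.relative_to(cudnn_root) for p in src.rglob("*") if p.is_file())
    return sorted(copied)


def demo():
    """Exercise install_cudnn on a throw-away layout (no real CUDA needed)."""
    with tempfile.TemporaryDirectory() as tmp:
        cudnn = Path(tmp) / "cuda"   # stands in for the extracted archive
        cuda = Path(tmp) / "v10.2"   # stands in for the CUDA install folder
        for rel in ("bin/cudnn64_8.dll", "include/cudnn.h", "lib/x64/cudnn.lib"):
            target = cudnn / rel
            target.parent.mkdir(parents=True)
            target.write_text("placeholder")
        (cuda / "bin").mkdir(parents=True)  # simulate a pre-existing CUDA folder
        copied = install_cudnn(cudnn, cuda)
        assert all((cuda / rel).is_file() for rel in copied)
        return [rel.as_posix() for rel in copied]


if __name__ == "__main__":
    print(demo())
```

For a real install, the call would look like `install_cudnn(r"D:\Downloads\cuda", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2")`, where the first path is wherever you extracted the archive (an assumed location here).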
1.4 Installing OpenCV
OpenCV website: https://opencv.org
After downloading, double-click opencv-4.0.0-vc14_vc15.exe to extract it to a directory of your choice, for example D:\Program Files (x86)\opencv. Then append D:\Program Files (x86)\opencv\build\x64\vc15\bin to the system Path variable to finish the installation.
1.5 Installing Anaconda3
Official downloads
Current version: https://www.anaconda.com/download/
Previous versions: https://repo.anaconda.com/archive/
Mirror downloads
See "Anaconda介绍及发行版本说明" (Anaconda overview and release versions): https://blog.csdn.net/a8039974/article/details/142677775?spm=1001.2014.3001.5501
Install the Anaconda release that ships Python 3.8.
1.6 Installing PyTorch
Official downloads
Current version: https://pytorch.org/
Previous versions: https://pytorch.org/get-started/previous-versions/
This guide uses PyTorch 1.10.1, installed with either of the following commands:
# CUDA 10.2 (conda)
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=10.2 -c pytorch
# CUDA 10.2 (pip)
pip install torch==1.10.1+cu102 torchvision==0.11.2+cu102 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu102/torch_stable.html
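Once either command finishes, a quick check from Python can confirm that the build sees the GPU. The sketch below is illustrative: `torch_cuda_report` is a hypothetical helper name, and it returns None when torch cannot be found in the active environment.

```python
import importlib.util


def torch_cuda_report():
    """Describe the PyTorch/CUDA setup, or return None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    report = {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
        report["cudnn"] = torch.backends.cudnn.version()
    return report


if __name__ == "__main__":
    report = torch_cuda_report()
    if report is None:
        print("torch is not installed in this environment")
    else:
        for key, value in report.items():
            print(f"{key}: {value}")
```

If `cuda_available` is False while `built_for_cuda` shows 10.2, the usual suspects are the driver version or a CPU-only package pulled in from a different channel.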
1.7 Installing CMake
Official download page: https://cmake.org/download/
For CMake versions and usage, see "深入浅出之CMake工具及CMakefile文件" (an introduction to CMake and CMake files): https://blog.csdn.net/a8039974/article/details/142820552?spm=1001.2014.3001.5501
1.8 Installing Visual Studio
Download it from the official website.
This guide installs VS2017.
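1.9 Installing TensorRT
Official download
https://developer.nvidia.com/tensorrt/download
Choose TensorRT 8.
Environment configuration
Extract the archive to get the TensorRT-8.5.1.7 folder, then add the absolute path of its lib folder to the environment variables, for example D:\TensorRT-8.5.1.7\lib.
To use TensorRT's Python interface, also install the pycuda package.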
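A short script can confirm both parts of this step: that the TensorRT lib folder is reachable through PATH, and that the Python packages are importable. This is a sketch under assumptions: `find_on_path` and `missing_modules` are hypothetical helpers, and `nvinfer.dll` is used as the marker file for the TensorRT runtime on Windows.

```python
import importlib.util
import os
from pathlib import Path


def find_on_path(filename, path_value=None):
    """Return every PATH directory that contains filename."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        if entry and (Path(entry) / filename).is_file():
            hits.append(entry)
    return hits


def missing_modules(names=("tensorrt", "pycuda")):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    dirs = find_on_path("nvinfer.dll")
    print("nvinfer.dll found in:", dirs if dirs else "nowhere on PATH")
    print("missing Python modules:", missing_modules() or "none")
```

More than one hit for `nvinfer.dll` suggests that several TensorRT versions are on PATH, and the first one listed is the one that gets loaded.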
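II. Downloading and installing the YOLO project
For environment setup, see "YOLO环境搭建" (YOLO environment setup): https://blog.csdn.net/a8039974/article/details/142678258?spm=1001.2014.3001.5501
For the YOLO project framework, see "深入浅出之Ultralytics框架" (an introduction to the Ultralytics framework): https://blog.csdn.net/a8039974/article/details/142765290?spm=1001.2014.3001.5501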
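III. Accelerated deployment with TensorRT
(1) Download tensorrtx
tensorrtx is on GitHub: https://github.com/wang-xinyu/tensorrtx . After downloading, extract the archive.
(2) Download dirent.h
Dirent is a C/C++ programming interface that lets programs retrieve information about files and directories on Linux/UNIX. This project provides a Linux-compatible Dirent interface for Microsoft Windows. GitHub: https://github.com/tronkko/dirent
After downloading, copy dirent.h into the include folder under tensorrt.
(3) Modify CMakeLists.txt ⭐
The official CMakeLists.txt targets Linux, so it must be modified before the project will build on Windows. The modified file is shown below.
It can be copied as-is, apart from the paths, which must be changed to match your machine.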
cmake_minimum_required(VERSION 3.10)
project(yolov8 LANGUAGES CXX CUDA)
add_definitions(-std=c++11)
add_definitions(-DAPI_EXPORTS)
option(CUDA_USE_STATIC_CUDA_RUNTIME OFF)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_BUILD_TYPE Debug)
# setup CUDA
# if(POLICY CMP0146)
# cmake_policy(SET CMP0146 OLD)
# endif()
find_package(CUDA REQUIRED)
message(STATUS " libraries: ${CUDA_LIBRARIES}")
message(STATUS " include path: ${CUDA_INCLUDE_DIRS}")
if(CUDA_FOUND)
list(APPEND CUDA_NVCC_FLAGS "-std=c++11")
endif(CUDA_FOUND)
include_directories(${CUDA_INCLUDE_DIRS})
####
enable_language(CUDA) # add this line, then no need to setup cuda path in vs
####
#include_directories(${PROJECT_SOURCE_DIR}\\include)
#include_directories(${TRT_DIR}\\include)
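# cuda (these paths contain spaces, so they must be quoted; on Windows the libraries are in lib/x64)
include_directories("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2/include")
link_directories("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2/lib/x64")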
#tensorrt
include_directories(F:/MrAIPlatform/tensorrt/TensorRT-8.5.1.7/include)
link_directories(F:/MrAIPlatform/tensorrt/TensorRT-8.5.1.7/lib)
set(CMAKE_PREFIX_PATH F:/MrAIPlatform/depends/opencv4.0)
#find_package(OpenCV REQUIRED)
find_package(OpenCV)
include_directories(${OpenCV_INCLUDE_DIRS})
include_directories(${PROJECT_SOURCE_DIR}/include)
include_directories(${PROJECT_SOURCE_DIR}/plugin)
# MESSAGE(STATUS "operation system is ${CMAKE_SYSTEM}")
# IF (CMAKE_SYSTEM_NAME MATCHES "Linux")
# MESSAGE(STATUS "current platform: Linux ")
# set(CUDA_COMPILER_PATH "/usr/local/cuda/bin/nvcc")
# set(TENSORRT_PATH "/home/benol/Package/TensorRT-8.6.1.6")
# include_directories(/usr/local/cuda/include)
# link_directories(/usr/local/cuda/lib64)
# link_directories(/usr/local/cuda/lib)
# ELSEIF (CMAKE_SYSTEM_NAME MATCHES "Windows")
# MESSAGE(STATUS "current platform: Windows")
# set(CUDA_COMPILER_PATH "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2/bin/nvcc.exe")
# set(TENSORRT_PATH "F:\\MrAIPlatform\\tensorrt\\TensorRT-8.5.1.7")
# set(OpenCV_DIR "F:\\MrAIPlatform\\depends\\opencv4.0")
# include_directories(${PROJECT_SOURCE_DIR}/windows)
# find_package(CUDA REQUIRED)
# # cuda
# include_directories(C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2/include)
# link_directories(C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2/lib64)
# ELSE (CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64")
# MESSAGE(STATUS "other platform: ${CMAKE_SYSTEM_PROCESSOR}")
# include_directories(/usr/local/cuda/targets/aarch64-linux/include)
# link_directories(/usr/local/cuda/targets/aarch64-linux/lib)
# ENDIF (CMAKE_SYSTEM_NAME MATCHES "Linux")
# include and link dirs of cuda and tensorrt, you need adapt them if yours are different
# if (CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64")
# message("embed_platform on")
# include_directories(/usr/local/cuda/targets/aarch64-linux/include)
# link_directories(/usr/local/cuda/targets/aarch64-linux/lib)
# else()
# message("embed_platform off")
# # cuda
# include_directories(/usr/local/cuda/include)
# link_directories(/usr/local/cuda/lib64)
# # tensorrt
# include_directories(/home/lindsay/TensorRT-8.4.1.5/include)
# link_directories(/home/lindsay/TensorRT-8.4.1.5/lib)
# # include_directories(/home/lindsay/TensorRT-7.2.3.4/include)
# # link_directories(/home/lindsay/TensorRT-7.2.3.4/lib)
# endif()
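# tensorrt (TENSORRT_PATH is only set inside the commented-out block above, so define it before use)
set(TENSORRT_PATH F:/MrAIPlatform/tensorrt/TensorRT-8.5.1.7)
include_directories(${TENSORRT_PATH}/include)
link_directories(${TENSORRT_PATH}/lib)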
find_package(OpenCV)
include_directories(${OpenCV_INCLUDE_DIRS})
add_library(myplugins SHARED ${PROJECT_SOURCE_DIR}/plugin/yololayer.cu)
target_link_libraries(myplugins nvinfer cudart)
file(GLOB_RECURSE SRCS ${PROJECT_SOURCE_DIR}/src/*.cpp ${PROJECT_SOURCE_DIR}/src/*.cu)
add_executable(yolov8_det ${PROJECT_SOURCE_DIR}/yolov8_det.cpp ${SRCS})
target_link_libraries(yolov8_det nvinfer)
target_link_libraries(yolov8_det cudart)
target_link_libraries(yolov8_det myplugins)
target_link_libraries(yolov8_det ${OpenCV_LIBS})
add_executable(yolov8_seg ${PROJECT_SOURCE_DIR}/yolov8_seg.cpp ${SRCS})
target_link_libraries(yolov8_seg nvinfer cudart myplugins ${OpenCV_LIBS})
add_executable(yolov8_pose ${PROJECT_SOURCE_DIR}/yolov8_pose.cpp ${SRCS})
target_link_libraries(yolov8_pose nvinfer cudart myplugins ${OpenCV_LIBS})
add_executable(yolov8_cls ${PROJECT_SOURCE_DIR}/yolov8_cls.cpp ${SRCS})
target_link_libraries(yolov8_cls nvinfer cudart myplugins ${OpenCV_LIBS})
add_executable(yolov8_5u_det ${PROJECT_SOURCE_DIR}/yolov8_5u_det.cpp ${SRCS})
target_link_libraries(yolov8_5u_det nvinfer cudart myplugins ${OpenCV_LIBS})
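(4) Build tensorrtx
Create a build folder, then open cmake-gui.
Set the source path and the build path -> click Configure and choose the environment -> click Finish and wait for "Configuring done" -> click Generate and wait for "Generating done" -> click Open Project.
Once the project is open, build the solution with the Release x64 configuration. The build succeeded if it finishes without errors.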
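The same configure and build steps can be driven from the command line instead of cmake-gui. The sketch below only assembles the commands. `cmake_commands` and `run_build` are hypothetical helpers, and the generator name "Visual Studio 15 2017 Win64" is an assumption matching VS2017 with a 64-bit target.

```python
import subprocess
from pathlib import Path


def cmake_commands(generator="Visual Studio 15 2017 Win64", config="Release"):
    """Return the configure and build commands, meant to run inside the build folder."""
    configure = ["cmake", "-G", generator, ".."]
    build = ["cmake", "--build", ".", "--config", config]
    return [configure, build]


def run_build(source_dir, dry_run=True):
    """Create <source_dir>/build and run both commands there (skipped when dry_run)."""
    commands = cmake_commands()
    if dry_run:
        return commands
    build_dir = Path(source_dir) / "build"
    build_dir.mkdir(exist_ok=True)
    for cmd in commands:
        subprocess.run(cmd, cwd=build_dir, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_build(".", dry_run=True):
        print(" ".join(f'"{part}"' if " " in part else part for part in cmd))
```

With a Visual Studio generator the configuration is chosen at build time through `--config`, which is why Release is passed to the build command rather than to the configure step.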
(5) Using the tensorrtx acceleration commands
- generate .wts from pytorch with .pt, or download .wts from model zoo
git clone -b v7.0 https://github.com/ultralytics/yolov5.git
git clone -b yolov5-v7.0 https://github.com/wang-xinyu/tensorrtx.git
cd yolov5/
wget https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt
cp [PATH-TO-TENSORRTX]/yolov5/gen_wts.py .
python gen_wts.py -w yolov5s.pt -o yolov5s.wts
# A file 'yolov5s.wts' will be generated.
- build tensorrtx/yolov5 and run
cd [PATH-TO-TENSORRTX]/yolov5/
# Update kNumClass in src/config.h if your model is trained on custom dataset
mkdir build
cd build
cp [PATH-TO-ultralytics-yolov5]/yolov5s.wts .
cmake ..
make
./yolov5_det -s [.wts] [.engine] [n/s/m/l/x/n6/s6/m6/l6/x6 or c/c6 gd gw] // serialize model to plan file
./yolov5_det -d [.engine] [image folder] // deserialize and run inference, the images in [image folder] will be processed.
# For example yolov5s
./yolov5_det -s yolov5s.wts yolov5s.engine s
./yolov5_det -d yolov5s.engine ../images
# For example Custom model with depth_multiple=0.17, width_multiple=0.25 in yolov5.yaml
./yolov5_det -s yolov5_custom.wts yolov5.engine c 0.17 0.25
./yolov5_det -d yolov5.engine ../images
- Check the images generated, _zidane.jpg and _bus.jpg
- Optional, load and run the tensorrt model in Python
// Install python-tensorrt, pycuda, etc.
// Ensure the yolov5s.engine and libmyplugins.so have been built
python yolov5_det_trt.py
// Another version of python script, which is using CUDA Python instead of pycuda.
python yolov5_det_trt_cuda_python.py
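Converting ONNX to a WTS file is different from converting it to an Engine file.
- A WTS file is a weights file: it contains all the parameters of the network but not its structure. TensorRT can load pre-trained weights from a WTS file.
- An Engine file is TensorRT's serialized model: it contains both the structure and the weights of the network, and TensorRT can run inference from it directly.
So if you only need to load pre-trained weights, convert ONNX to a WTS file; if you need to run inference, convert ONNX to an Engine file.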
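To make the "weights only, no structure" point concrete, here is a small sketch of the WTS text layout. It assumes the layout written by tensorrtx's gen_wts.py: a first line with the tensor count, then one line per tensor with its name, element count, and each float32 value as big-endian hex. The helper names and the example tensor names are hypothetical.

```python
import struct


def dumps_wts(weights):
    """Serialize {name: [float, ...]} into the assumed .wts text layout."""
    lines = [str(len(weights))]
    for name, values in weights.items():
        hex_words = " ".join(struct.pack(">f", float(v)).hex() for v in values)
        lines.append(f"{name} {len(values)} {hex_words}")
    return "\n".join(lines) + "\n"


def loads_wts(text):
    """Parse the text layout back into {name: [float, ...]}."""
    lines = text.strip().splitlines()
    count = int(lines[0])
    weights = {}
    for line in lines[1 : count + 1]:
        name, size, *words = line.split()
        values = [struct.unpack(">f", bytes.fromhex(word))[0] for word in words]
        if len(values) != int(size):
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        weights[name] = values
    return weights


if __name__ == "__main__":
    # Hypothetical tensor names; real files use the names from the model's state dict.
    example = {"model.0.conv.weight": [1.0, -0.5, 0.25], "model.0.conv.bias": [0.0]}
    text = dumps_wts(example)
    print(text, end="")
    print(loads_wts(text) == example)
```

Nothing in the file says how the tensors connect, which is why tensorrtx rebuilds the network layer by layer in C++ before an Engine file can be serialized.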
References: