
TensorRT Environment Setup Made Simple

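I. Required Software

Everything in this article was done on Windows 10. The software and dependencies used are: CUDA 10.2, cuDNN 8.6.0, VS2017, OpenCV 4.0.0, Anaconda3, CMake 3.10.1, TensorRT 8, and PyTorch.

Before installing, check the following:

1. GPU driver version

Use the CUDA Toolkit release notes on the NVIDIA site

https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

to confirm which driver versions each CUDA version requires. Install the newest driver available: a newer driver generally supports all earlier CUDA versions.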
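As a rough illustration of this check, the sketch below compares an installed driver version against a required minimum. The value 441.22 is taken from the CUDA 10.2 installer filename used in this article (cuda_10.2.89_441.22_win10.exe) and is only assumed to be the relevant driver version; the release notes remain the authoritative source. The function names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '441.22' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(installed, required):
    """Return True when the installed driver version is >= the required one."""
    return parse_version(installed) >= parse_version(required)


# Assumption: 441.22 comes from the CUDA 10.2 installer filename;
# check the release notes for the real minimum driver version.
CUDA_10_2_DRIVER = "441.22"

if __name__ == "__main__":
    # "456.71" is just an example of an installed driver version.
    print(driver_is_new_enough("456.71", CUDA_10_2_DRIVER))
```

Comparing tuples of integers avoids the trap of comparing version strings character by character.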

2. GPU compute capability

NVIDIA's official lookup page: https://developer.nvidia.com/cuda-gpus
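One place the compute capability shows up later is in the architecture flags passed to nvcc. The sketch below, with a hypothetical helper name, turns a compute capability read from that page into the corresponding -gencode option. The value 7.5 is only an example; use the one listed for your own GPU.

```python
def gencode_flag(major, minor):
    """Build the nvcc -gencode option for a compute capability such as 7.5."""
    arch = f"{major}{minor}"
    return f"-gencode arch=compute_{arch},code=sm_{arch}"


if __name__ == "__main__":
    # Example value only: replace 7 and 5 with your GPU's compute capability.
    print(gencode_flag(7, 5))
```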

 

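3. Choosing a CUDA version

Three factors matter most when choosing a CUDA version:

  • Whether the version is stable and well supported
  • The Visual Studio version you use
  • The PyTorch version you use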

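4. CUDA versions and Visual Studio versions

  • CUDA 12.x: compatible with Visual Studio 2019 and 2022. Prefer the latest Visual Studio and CUDA Toolkit for the best development experience and performance support, and make sure your operating system version supports the CUDA and Visual Studio versions you install.
  • CUDA 11.x: best paired with newer releases of Visual Studio 2019, such as VS 2019 versions 16.4, 16.5, 16.7, and 16.8.
  • CUDA 11.0: compatible with VS 2017 Update 5.
  • CUDA 10.2: compatible with VS 2017 Update 3.
  • CUDA 10.1: can be integrated with Visual Studio 2019. During installation, select the Visual Studio components such as CUDA Visual Studio Integration, and make sure the installed VS 2019 edition supports CUDA (Enterprise, Professional, or Community).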

5. CUDA versions and PyTorch versions

Check the PyTorch website, https://pytorch.org/, to confirm which CUDA versions each PyTorch release supports.


                        

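6. Common PyTorch and Python version pairings

The table below lists common PyTorch versions and the Python versions they support:

PyTorch version      Supported Python versions
1.0.x and earlier    Python 2.7, 3.5 (Python 2.7 is no longer supported in later versions)
1.1.x                Python 3.6 and later
1.2.x                Python 3.6 and later
1.3.x                Python 3.6 and later
1.4.x                Python 3.5 to 3.8 (3.6 or later recommended for better compatibility)
1.5.x                Python 3.5 to 3.8
1.6.x                Python 3.5 to 3.8
1.7.x                Python 3.6 to 3.9
1.8.x                Python 3.6 to 3.9
1.9.x                Python 3.6 to 3.9
1.10.x and later     Usually the most recent few Python versions (for example 3.6 to 3.9, depending on the newest Python at release time)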

You can also look this up on the PyTorch website.
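The pairings above can be turned into a small lookup, sketched below. The ranges are copied from the rows of the table that give both a lower and an upper bound; they are this article's values, not an official compatibility matrix, and the function name is hypothetical.

```python
import sys

# (lowest, highest) supported Python version per PyTorch series,
# copied from the table in this article (not an official matrix).
SUPPORTED_PYTHON = {
    "1.4": ((3, 5), (3, 8)),
    "1.5": ((3, 5), (3, 8)),
    "1.6": ((3, 5), (3, 8)),
    "1.7": ((3, 6), (3, 9)),
    "1.8": ((3, 6), (3, 9)),
    "1.9": ((3, 6), (3, 9)),
    "1.10": ((3, 6), (3, 9)),
}


def python_supported(torch_series, python_version):
    """Return True if python_version (major, minor) is within the listed range."""
    lowest, highest = SUPPORTED_PYTHON[torch_series]
    return lowest <= tuple(python_version) <= highest


if __name__ == "__main__":
    # Check the interpreter running this script against PyTorch 1.10.
    print(python_supported("1.10", sys.version_info[:2]))
```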

7. cuDNN and TensorRT versions

Choose the cuDNN version based on the TensorRT version you plan to use.

1.1 Installing the GPU driver

Check which GPU model your computer has.

NVIDIA driver download page: https://www.nvidia.cn/Download/index.aspx?lang=cn . Search for your GPU model there to download the matching NVIDIA driver.

After installation, run

nvidia-smi

to check that the driver was installed successfully.
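The same check can be scripted. The sketch below calls nvidia-smi with its --query-gpu and --format options and parses the CSV rows; the helper names are hypothetical, and the sample row in the usage note is illustrative rather than real output from any machine.

```python
import shutil
import subprocess


def parse_gpu_query(text):
    """Parse 'name, driver_version' CSV rows into a list of (name, driver) tuples."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the GPU name is harmless.
        name, driver = (field.strip() for field in line.rsplit(",", 1))
        rows.append((name, driver))
    return rows


def query_gpus():
    """Return (name, driver) rows, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_query(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    print(query_gpus() or "nvidia-smi not found or no GPU reported")
```

For instance, parse_gpu_query("NVIDIA GeForce RTX 2080, 456.71") yields one tuple holding the GPU name and the driver version.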

 

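1.2 Installing CUDA

Download the CUDA installer from the NVIDIA site: https://developer.nvidia.com/cuda-downloads

Older releases are available from the archive of previous versions.

The downloaded file is cuda_10.2.89_441.22_win10.exe. Run it to install, keeping the default path to make the later path configuration easier.

The graphics driver component can be left unchecked, since the driver was already installed.

After installation, set the environment variables.

Right-click This PC, then open Properties -> Advanced system settings -> Environment Variables. The system variables now include two new entries, CUDA_PATH and CUDA_PATH_V10_2. (The CUDA Toolkit installs to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2 by default; the CUDA Samples are placed in C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.2.)

Next, add the following five system variables: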

CUDA_SDK_PATH = C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.2
CUDA_LIB_PATH = %CUDA_PATH%\lib\x64
CUDA_BIN_PATH = %CUDA_PATH%\bin
CUDA_SDK_BIN_PATH = %CUDA_SDK_PATH%\bin\win64
CUDA_SDK_LIB_PATH = %CUDA_SDK_PATH%\common\lib\x64

Double-click the Path variable under system variables and append the following paths:

%CUDA_LIB_PATH%;%CUDA_BIN_PATH%;%CUDA_SDK_LIB_PATH%;%CUDA_SDK_BIN_PATH%;
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib\x64
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\extras\CUPTI\lib64
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.2\bin\win64
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.2\common\lib\x64
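To see what the %NAME% references resolve to, the sketch below expands them the way Windows would, using a plain dictionary instead of the real environment. The helper is hypothetical, the lookup is case-sensitive (unlike Windows), and CUDA_PATH is assumed to point at the default toolkit location.

```python
import re

_PERCENT_VAR = re.compile(r"%([^%]+)%")


def expand_percent_vars(value, env, max_passes=10):
    """Expand Windows-style %NAME% references using the mapping `env`.

    Unknown names are left untouched. Expansion repeats (up to max_passes)
    so that variables defined in terms of other variables resolve fully.
    """
    for _ in range(max_passes):
        expanded = _PERCENT_VAR.sub(lambda m: env.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


# Values copied from the variables listed in this section.
# Assumption: CUDA_PATH points at the default toolkit install location.
ENV = {
    "CUDA_PATH": r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2",
    "CUDA_SDK_PATH": r"C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.2",
    "CUDA_LIB_PATH": r"%CUDA_PATH%\lib\x64",
    "CUDA_BIN_PATH": r"%CUDA_PATH%\bin",
    "CUDA_SDK_BIN_PATH": r"%CUDA_SDK_PATH%\bin\win64",
    "CUDA_SDK_LIB_PATH": r"%CUDA_SDK_PATH%\common\lib\x64",
}

if __name__ == "__main__":
    for name in ("CUDA_LIB_PATH", "CUDA_BIN_PATH", "CUDA_SDK_LIB_PATH", "CUDA_SDK_BIN_PATH"):
        print(name, "=", expand_percent_vars(f"%{name}%", ENV))
```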

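Testing the installation

Finally, check that CUDA is configured correctly. Open CMD and run:

nvcc -V

If the CUDA version information is printed, the configuration succeeded.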
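If you want to check the reported version from a script, a minimal sketch is to search the nvcc -V output for its "release X.Y" field. The sample line below follows the format nvcc prints for CUDA 10.2; the function name is hypothetical.

```python
import re


def parse_nvcc_release(output):
    """Extract the 'release X.Y' version from `nvcc -V` output, or None."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


# Sample line in the format nvcc prints for CUDA 10.2.
SAMPLE = "Cuda compilation tools, release 10.2, V10.2.89"

if __name__ == "__main__":
    print(parse_nvcc_release(SAMPLE))
```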


1.3 Installing cuDNN

cuDNN official page: https://developer.nvidia.com/cudnn

Latest version downloads: https://developer.nvidia.com/cudnn-downloads

Archived versions: https://developer.nvidia.com/cudnn-archive

The downloaded file is cudnn-10.2-windows10-x64-v8.6.0.163.zip.

Extract the archive, then copy the files in the bin, include, and lib folders under its cuda directory into the matching folders under C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2.
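The copy step can also be scripted. The sketch below merges the three folders into the CUDA directory with shutil.copytree; it needs Python 3.8 or later for dirs_exist_ok, the function name is hypothetical, and writing under C:\Program Files usually requires an administrator prompt.

```python
import shutil
from pathlib import Path


def install_cudnn(extracted_cuda_dir, cuda_root):
    """Copy cuDNN's bin, include and lib folders into the CUDA Toolkit folder.

    Returns the names of the folders that were found and copied.
    """
    copied = []
    for name in ("bin", "include", "lib"):
        source = Path(extracted_cuda_dir) / name
        if source.is_dir():
            # dirs_exist_ok (Python 3.8+) merges into the existing folder
            # instead of failing because it is already there.
            shutil.copytree(source, Path(cuda_root) / name, dirs_exist_ok=True)
            copied.append(name)
    return copied
```

A call might look like install_cudnn(r"D:\Downloads\cuda", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2"), where the first path is a placeholder for wherever you extracted the archive.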

1.4 Installing OpenCV

OpenCV official site:

https://opencv.org

After downloading, double-click opencv-4.0.0-vc14_vc15.exe to extract it to a directory of your choice, for example D:\Program Files (x86)\opencv. Then append D:\Program Files (x86)\opencv\build\x64\vc15\bin to the system Path variable. That completes the installation.


1.5 Installing Anaconda3

Official downloads

Current version: https://www.anaconda.com/download/
Archived versions: https://repo.anaconda.com/archive/

Mirror downloads

Tsinghua mirror: https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/

For details, see "Anaconda介绍及发行版本说明" (an introduction to Anaconda and its releases): https://blog.csdn.net/a8039974/article/details/142677775?spm=1001.2014.3001.5501

Install the Anaconda release that corresponds to Python 3.8.

1.6 Installing PyTorch

Official downloads

Current version: https://pytorch.org/

Previous versions: https://pytorch.org/get-started/previous-versions/

This article uses PyTorch 1.10.1. Install it with either of the following commands:

# CUDA 10.2 (conda)
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=10.2 -c pytorch

# CUDA 10.2 (pip)
pip install torch==1.10.1+cu102 torchvision==0.11.2+cu102 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu102/torch_stable.html
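After installing, it is worth confirming which build ended up in the environment. The sketch below, with hypothetical helper names, splits a version string such as 1.10.1+cu102 into its release number and build tag, and reports whether PyTorch can see a CUDA device. It returns a message instead of failing when PyTorch is not installed.

```python
def split_torch_version(version):
    """Split '1.10.1+cu102' into ('1.10.1', 'cu102'); the tag is '' if absent."""
    base, _, tag = str(version).partition("+")
    return base, tag


def report_torch():
    """Return a short description of the installed PyTorch build, if any."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    base, tag = split_torch_version(torch.__version__)
    cuda_ok = torch.cuda.is_available()
    return f"torch {base} ({tag or 'no build tag'}), CUDA available: {cuda_ok}"


if __name__ == "__main__":
    print(report_torch())
```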

1.7 Installing CMake

Official download page: https://cmake.org/download/

For CMake versions and usage, see "深入浅出之CMake工具及CMakefile文件" (CMake and CMakeLists files made simple): https://blog.csdn.net/a8039974/article/details/142820552?spm=1001.2014.3001.5501

1.8 Installing Visual Studio

Official downloads

Current version: https://visualstudio.microsoft.com/zh-hans/downloads/

Older versions: https://visualstudio.microsoft.com/zh-hans/vs/older-downloads/

This article uses VS2017.

1.9 Installing TensorRT

Official download

https://developer.nvidia.com/tensorrt/download

Choose TensorRT 8.

Environment configuration

Extract the archive to get the TensorRT-8.5.1.7 folder, then add the absolute path of its lib folder to the Path environment variable, for example D:\TensorRT-8.5.1.7\lib.

To use TensorRT's Python interface, you also need to install the pycuda package.
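A quick way to confirm that the Python side is ready is to check whether the two packages can be found for import. The sketch below uses importlib from the standard library; the helper name is hypothetical, and tensorrt and pycuda are assumed to be the import names of the two packages.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["tensorrt", "pycuda"])
    print("missing:", ", ".join(missing) if missing else "none")
```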


                        
II. Downloading and Installing the YOLO Project

For environment setup, see "YOLO环境搭建" (YOLO environment setup): https://blog.csdn.net/a8039974/article/details/142678258?spm=1001.2014.3001.5501

For the YOLO project framework, see "深入浅出之Ultralytics框架" (the Ultralytics framework made simple): https://blog.csdn.net/a8039974/article/details/142765290?spm=1001.2014.3001.5501

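III. Accelerated Deployment with TensorRT

(1) Download tensorrtx
tensorrtx is available on GitHub: https://github.com/wang-xinyu/tensorrtx . After downloading, extract the archive.

(2) Download dirent.h
Dirent is a C/C++ programming interface that lets programs retrieve information about files and directories on Linux/UNIX. This project provides a Linux-compatible Dirent interface for Microsoft Windows. GitHub: https://github.com/tronkko/dirent

After downloading, copy dirent.h into the include folder under the TensorRT directory.

(3) Modify CMakeLists.txt ⭐
The CMakeLists.txt shipped with the project targets Linux, so it must be modified before the project can be built on Windows. The modified file is shown below.

It can be copied and pasted as is, except that the paths must be changed to match your own machine.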

cmake_minimum_required(VERSION 3.10)

project(yolov8 LANGUAGES CXX CUDA)

add_definitions(-std=c++11)
add_definitions(-DAPI_EXPORTS)
option(CUDA_USE_STATIC_CUDA_RUNTIME "Link against the static CUDA runtime" OFF)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_BUILD_TYPE Debug)

# setup CUDA
# if(POLICY CMP0146)
  # cmake_policy(SET CMP0146 OLD) 
# endif()
find_package(CUDA REQUIRED)
message(STATUS "    libraries: ${CUDA_LIBRARIES}")
message(STATUS "    include path: ${CUDA_INCLUDE_DIRS}")
if(CUDA_FOUND)  
  list(APPEND CUDA_NVCC_FLAGS "-std=c++11")
endif(CUDA_FOUND) 
include_directories(${CUDA_INCLUDE_DIRS})
 
####
enable_language(CUDA)  # add this line, then no need to setup cuda path in vs
####
#include_directories(${PROJECT_SOURCE_DIR}\\include)
#include_directories(${TRT_DIR}\\include)
# cuda
# Paths containing spaces must be quoted; on Windows the CUDA libraries are in lib/x64
include_directories("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2/include")
link_directories("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2/lib/x64")
#tensorrt
include_directories(F:/MrAIPlatform/tensorrt/TensorRT-8.5.1.7/include)
link_directories(F:/MrAIPlatform/tensorrt/TensorRT-8.5.1.7/lib)

set(CMAKE_PREFIX_PATH F:/MrAIPlatform/depends/opencv4.0)
#find_package(OpenCV REQUIRED)
find_package(OpenCV)
include_directories(${OpenCV_INCLUDE_DIRS})

include_directories(${PROJECT_SOURCE_DIR}/include)
include_directories(${PROJECT_SOURCE_DIR}/plugin)

# MESSAGE(STATUS "operation system is ${CMAKE_SYSTEM}") 
# IF (CMAKE_SYSTEM_NAME MATCHES "Linux")
    # MESSAGE(STATUS "current platform: Linux ")
    # set(CUDA_COMPILER_PATH "/usr/local/cuda/bin/nvcc")
    # set(TENSORRT_PATH "/home/benol/Package/TensorRT-8.6.1.6")
    # include_directories(/usr/local/cuda/include)
    # link_directories(/usr/local/cuda/lib64)
    # link_directories(/usr/local/cuda/lib)
# ELSEIF (CMAKE_SYSTEM_NAME MATCHES "Windows")
    # MESSAGE(STATUS "current platform: Windows")
    # set(CUDA_COMPILER_PATH "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2/bin/nvcc.exe")
    # set(TENSORRT_PATH "F:\\MrAIPlatform\\tensorrt\\TensorRT-8.5.1.7")
    # set(OpenCV_DIR "F:\\MrAIPlatform\\depends\\opencv4.0")
    # include_directories(${PROJECT_SOURCE_DIR}/windows)
    # find_package(CUDA REQUIRED)
    # # cuda
    # include_directories(C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2/include)
    # link_directories(C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2/lib64)
# ELSE (CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64")
    # MESSAGE(STATUS "other platform: ${CMAKE_SYSTEM_PROCESSOR}")
    # include_directories(/usr/local/cuda/targets/aarch64-linux/include)
    # link_directories(/usr/local/cuda/targets/aarch64-linux/lib)
# ENDIF (CMAKE_SYSTEM_NAME MATCHES "Linux")
# include and link dirs of cuda and tensorrt, you need adapt them if yours are different
# if (CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64")
  # message("embed_platform on")
  # include_directories(/usr/local/cuda/targets/aarch64-linux/include)
  # link_directories(/usr/local/cuda/targets/aarch64-linux/lib)
# else()
  # message("embed_platform off")
  # # cuda
  # include_directories(/usr/local/cuda/include)
  # link_directories(/usr/local/cuda/lib64)

  # # tensorrt
  # include_directories(/home/lindsay/TensorRT-8.4.1.5/include)
  # link_directories(/home/lindsay/TensorRT-8.4.1.5/lib)
  # #  include_directories(/home/lindsay/TensorRT-7.2.3.4/include)
  # #  link_directories(/home/lindsay/TensorRT-7.2.3.4/lib)


# endif()

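# tensorrt (TENSORRT_PATH is only set inside the commented-out block, so define it here)
set(TENSORRT_PATH F:/MrAIPlatform/tensorrt/TensorRT-8.5.1.7)
include_directories(${TENSORRT_PATH}/include)
link_directories(${TENSORRT_PATH}/lib)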

find_package(OpenCV)
include_directories(${OpenCV_INCLUDE_DIRS})

add_library(myplugins SHARED ${PROJECT_SOURCE_DIR}/plugin/yololayer.cu)
target_link_libraries(myplugins nvinfer cudart)



file(GLOB_RECURSE SRCS ${PROJECT_SOURCE_DIR}/src/*.cpp ${PROJECT_SOURCE_DIR}/src/*.cu)
add_executable(yolov8_det ${PROJECT_SOURCE_DIR}/yolov8_det.cpp ${SRCS})

target_link_libraries(yolov8_det nvinfer)
target_link_libraries(yolov8_det cudart)
target_link_libraries(yolov8_det myplugins)
target_link_libraries(yolov8_det ${OpenCV_LIBS})

add_executable(yolov8_seg ${PROJECT_SOURCE_DIR}/yolov8_seg.cpp ${SRCS})
target_link_libraries(yolov8_seg nvinfer cudart myplugins ${OpenCV_LIBS})


add_executable(yolov8_pose ${PROJECT_SOURCE_DIR}/yolov8_pose.cpp ${SRCS})
target_link_libraries(yolov8_pose nvinfer cudart myplugins ${OpenCV_LIBS})

add_executable(yolov8_cls ${PROJECT_SOURCE_DIR}/yolov8_cls.cpp ${SRCS})
target_link_libraries(yolov8_cls nvinfer cudart myplugins ${OpenCV_LIBS})

add_executable(yolov8_5u_det ${PROJECT_SOURCE_DIR}/yolov8_5u_det.cpp ${SRCS})
target_link_libraries(yolov8_5u_det nvinfer cudart myplugins ${OpenCV_LIBS})


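(4) Building and running tensorrtx
Create a build folder, then open cmake-gui.

Set the source path and the build path -> click Configure and choose the environment -> click Finish and wait for "Configuring done" -> click Generate and wait for "Generating done" -> click Open Project.

Once the project opens, build the solution with the Release x64 configuration. The build has succeeded when the solution is generated without errors.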
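The same configure, generate, and build sequence can be driven from a script instead of cmake-gui. The sketch below is one possible equivalent, not the procedure used in this article: the generator name "Visual Studio 15 2017" is assumed to match VS2017, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def cmake_commands(generator="Visual Studio 15 2017", platform="x64", config="Release"):
    """Return the configure and build command lines, to be run inside build/."""
    configure = ["cmake", "..", "-G", generator, "-A", platform]
    build = ["cmake", "--build", ".", "--config", config]
    return configure, build


def build_project(source_dir):
    """Create build/ under source_dir, then run the configure and build steps."""
    build_dir = Path(source_dir) / "build"
    build_dir.mkdir(exist_ok=True)
    for command in cmake_commands():
        # check=True stops at the first failing step.
        subprocess.run(command, cwd=build_dir, check=True)
```

Calling build_project with the folder that contains CMakeLists.txt runs both steps; nothing is executed until that call is made.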

(5) Using the tensorrtx commands

  • generate .wts from pytorch with .pt, or download .wts from model zoo
git clone -b v7.0 https://github.com/ultralytics/yolov5.git
git clone -b yolov5-v7.0 https://github.com/wang-xinyu/tensorrtx.git
cd yolov5/
wget https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt
cp [PATH-TO-TENSORRTX]/yolov5/gen_wts.py .
python gen_wts.py -w yolov5s.pt -o yolov5s.wts
# A file 'yolov5s.wts' will be generated.
  • build tensorrtx/yolov5 and run
cd [PATH-TO-TENSORRTX]/yolov5/
# Update kNumClass in src/config.h if your model is trained on custom dataset
mkdir build
cd build
cp [PATH-TO-ultralytics-yolov5]/yolov5s.wts . 
cmake ..
make

./yolov5_det -s [.wts] [.engine] [n/s/m/l/x/n6/s6/m6/l6/x6 or c/c6 gd gw]  # serialize model to plan file
./yolov5_det -d [.engine] [image folder]  # deserialize and run inference, the images in [image folder] will be processed.

# For example yolov5s
./yolov5_det -s yolov5s.wts yolov5s.engine s
./yolov5_det -d yolov5s.engine ../images

# For example Custom model with depth_multiple=0.17, width_multiple=0.25 in yolov5.yaml
./yolov5_det -s yolov5_custom.wts yolov5.engine c 0.17 0.25
./yolov5_det -d yolov5.engine ../images
  • Check the images generated, _zidane.jpg and _bus.jpg

  • Optional, load and run the tensorrt model in Python

# Install python-tensorrt, pycuda, etc.
# Ensure the yolov5s.engine and libmyplugins.so have been built
python yolov5_det_trt.py

# Another version of python script, which is using CUDA Python instead of pycuda.
python yolov5_det_trt_cuda_python.py

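Converting ONNX to a WTS file and converting it to an Engine file serve different purposes.

- A WTS file is a weights file. It contains all of the network's parameters but not the network structure. In TensorRT it is used to load pretrained weights.
- An Engine file is TensorRT's serialized model. It contains both the network structure and the weights, and is what TensorRT loads to run inference.

So if you only need to load pretrained weights, convert ONNX to a WTS file; if you need to run inference, convert it to an Engine file.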
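To make the "weights only" idea concrete, here is a minimal sketch of a WTS-style text format. The layout is an assumption modeled on what tensorrtx's gen_wts.py is understood to write: a first line with the number of tensors, then one line per tensor with its name, its element count, and each float32 value as big-endian hex. Check gen_wts.py in the repository before relying on it. Plain Python lists stand in for tensors, and both function names are hypothetical.

```python
import struct


def dump_wts(weights):
    """Serialize {name: [floats]} to WTS-style text (layout is an assumption)."""
    lines = [str(len(weights))]
    for name, values in weights.items():
        # Each value is stored as the hex form of its big-endian float32 bytes.
        hex_values = " ".join(struct.pack(">f", float(v)).hex() for v in values)
        lines.append(f"{name} {len(values)} {hex_values}".rstrip())
    return "\n".join(lines) + "\n"


def load_wts(text):
    """Parse WTS-style text back into {name: [floats]}."""
    lines = text.strip().splitlines()
    count = int(lines[0])
    weights = {}
    for line in lines[1:count + 1]:
        name, size, *hex_values = line.split()
        if int(size) != len(hex_values):
            raise ValueError(f"size mismatch for {name}")
        weights[name] = [struct.unpack(">f", bytes.fromhex(h))[0] for h in hex_values]
    return weights
```

Note that nothing in the text says how the tensors connect to one another, which is why the network structure has to be supplied separately when the weights are loaded.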
