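1. Detection code
The code comes from the official Datawhale baseline: https://github.com/datawhalechina/team-learning-cv/tree/master/DefectDetection
The baseline uses YOLOv5. I only have a single 1080Ti, so I started with yolov5s, training for 50 epochs with the image size set to 512x512.
This part mainly follows https://blog.csdn.net/qq_26751117/article/details/113853150
- Data processing: the main task is converting the data format provided by the competition organizers into the format YOLO expects. First run convertTrainLabel.py, then run process_data_yolo.py; the output is stored in the process_data folder. Note that process_data_yolo has to be run twice, once as-is and once with every val field changed to train, to produce the validation and training data files respectively.
- Pretrained weights: I tried training without pretrained weights and the results were poor, probably because the dataset is small, so transfer learning is still needed. I therefore downloaded yolov5s.pt and loaded it. Since the model is small, a relatively large batch size is possible; I used 16. As shown in the figure in the article referenced above, the weight-loading code needs a small modification.
- Running: after fixing a few spots that raised errors, the yolov5s model ran.
2. Docker submission
My Dockerfile: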
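To make the data-processing step more concrete, here is a minimal sketch of the core conversion into YOLO label format. It assumes the competition annotations give each box as pixel coordinates [x_min, y_min, x_max, y_max] together with the image width and height; the function names to_yolo_bbox and to_yolo_line are hypothetical and are not taken from convertTrainLabel.py or process_data_yolo.py.

```python
def to_yolo_bbox(bbox, img_w, img_h):
    """Convert a pixel box [x_min, y_min, x_max, y_max] to normalized (cx, cy, w, h)."""
    x_min, y_min, x_max, y_max = bbox
    cx = (x_min + x_max) / 2.0 / img_w  # box center x, as a fraction of image width
    cy = (y_min + y_max) / 2.0 / img_h  # box center y, as a fraction of image height
    w = (x_max - x_min) / img_w         # box width, as a fraction of image width
    h = (y_max - y_min) / img_h         # box height, as a fraction of image height
    return cx, cy, w, h


def to_yolo_line(class_id, bbox, img_w, img_h):
    """Format one object as a YOLO label line: 'class cx cy w h'."""
    cx, cy, w, h = to_yolo_bbox(bbox, img_w, img_h)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```

YOLO label files hold one such line per object, with zero-based class indices, so category IDs that start at 1 in the source annotations would need to be shifted before calling to_yolo_line.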
# Base Images
FROM registry.cn-shanghai.aliyuncs.com/tcc-public/pytorch:1.4-cuda10.1-py3
ADD . /workspace
WORKDIR /workspace
RUN pip install -r requirements.txt
CMD ["sh", "run.sh"]
Start the build:
(torch16) pdluser@pdluser-System-Product-Name:~/project/tianchi_demo$ sudo docker build -t registry.cn-shenzhen.aliyuncs.com/nine_percent/tianchi_submit:1.0 .
[sudo] password for pdluser:
Sending build context to Docker daemon 6.778GB
Step 1/5 : FROM registry.cn-shanghai.aliyuncs.com/tcc-public/pytorch:1.4-cuda10.1-py3
---> 76c152fbfd03
Step 2/5 : ADD . /workspace
---> 10ca596f6d20
Step 3/5 : WORKDIR /workspace
---> Running in 37a88d04d2a9
Removing intermediate container 37a88d04d2a9
---> 7f7982fbfaba
Step 4/5 : RUN pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple --ignore-installed PyYAML
---> Running in 877004f83473
Looking in indexes: https://mirrors.aliyun.com/pypi/simple
Downloading https://mirrors.aliyun.com/pypi/packages/ec/d6/a82d191ec058314b2b7cbee5635150f754ba1c6ffc05387bc9a57efe48b8/cryptacular-1.5.5.tar.gz
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing wheel metadata: started
Preparing wheel metadata: finished with status 'done'
Collecting zope.sqlalchemy
Downloading https://mirrors.aliyun.com/pypi/packages/fa/83/459decec1dd2c14d60f9a360fff989c128abe545a1554a1da64b054a55d4/zope.sqlalchemy-1.3-py2.py3-none-any.whl
Collecting velruse>=1.0.3
Downloading https://mirrors.aliyun.com/pypi/packages/8f/0b/d47ea894587f3155f8c4520aa74d57c856189d0bbe27e831881d655a3386/PasteDeploy-2.1.1-py2.py3-none-any.whl
Building wheels for collected packages: cryptacular
Building wheel for cryptacular (PEP 517): started
Building wheel for cryptacular (PEP 517): finished with status 'done'
Created wheel for cryptacular: filename=cryptacular-1.5.5-cp37-abi3-manylinux2010_x86_64.whl size=52452 sha256=93037b68313c3d86df4c8cab9d0cc0866d1579cb7399410c7903b56eb2ff0067
Stored in directory: /root/.cache/pip/wheels/dd/c7/11/721f100da8477396b1f8fcfa2d23c801d5bac07d0e2d82dc0d
Successfully built cryptacular
Building wheels for collected packages: apex, velruse, pbkdf2, anykeystore
Building wheel for apex (setup.py): started
Building wheel for apex (setup.py): finished with status 'done'
Created wheel for apex: filename=apex-0.9.10.dev0-cp37-none-any.whl size=46468 sha256=c68745de219dd6169195cfec426e528cd5f5f932bd3cb7ddbc22817a9827cfea
Stored in directory: /root/.cache/pip/wheels/b8/f0/7a/2fc4cf8a70bfc0981f7009a2146685d06ee220398c0b780acf
Building wheel for velruse (setup.py): started
Building wheel for velruse (setup.py): finished with status 'done'
Created wheel for velruse: filename=velruse-1.1.1-cp37-none-any.whl size=50923 sha256=c300b70b745467b6b075bec09d6b2a11ab3524f6de31605431a62308613648e3
Stored in directory:
Successfully built apex velruse pbkdf2 anykeystore
Installing collected packages: PyYAML, Cython, numpy, opencv-python, typing-extensions, torch, pyparsing, kiwisolver, six, cycler, pillow
Removing intermediate container 877004f83473
---> 5c40d92c4bc1
Step 5/5 : CMD ["sh", "run.sh"]
---> Running in 41c2daf77fbc
Removing intermediate container 41c2daf77fbc
---> 603e3fe4452c
Successfully built 603e3fe4452c
Successfully tagged registry.cn-shenzhen.aliyuncs.com/nine_percent/tianchi_submit:1.0
After the image is built, enter it.
First look up its ID:
pdluser@pdluser-System-Product-Name:~$ sudo docker images
[sudo] password for pdluser:
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.cn-shenzhen.aliyuncs.com/nine_percent/tianchi_submit 1.0 b773b4e52e7a 4 minutes ago 11.2GB
<none> <none> f99df53cc33c 23 hours ago 7.92GB
registry.cn-shanghai.aliyuncs.com/tcc-public/pytorch 1.4-cuda10.1-py3 76c152fbfd03 13 months ago 7.56GB
registry.cn-shanghai.aliyuncs.com/tcc-public/python 3 a4cc999cf2aa 21 months ago 929MB
Start a container from the first image, using the ID prefix b7:
(torch16) pdluser@pdluser-System-Product-Name:~/project/tianchi_demo$ sudo docker run -it b7 /bin/bash
root@2a128d20af63:/workspace#
Run run.sh here; if the test succeeds, the image is ready to submit.
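Before running run.sh inside the container, a quick import check can catch missing dependencies early. The sketch below is only an illustration: check_imports is a hypothetical helper, and the module list passed to it is an assumption that should be adapted to whatever requirements.txt actually installs.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {module name: error message} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            failures[name] = str(exc)
    return failures
```

Inside the container, something like check_imports(["cv2", "torch", "yaml", "numpy"]) returns an empty dict when everything loads. A missing system library typically surfaces here as an ImportError message for the affected package, even though pip reported a successful install.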
Next, push the image to the registry:
$ sudo docker login --username=[username] registry.cn-shenzhen.aliyuncs.com
$ sudo docker tag [ImageId] registry.cn-shenzhen.aliyuncs.com/nine_percent/tianchi_submit:[image version]
$ sudo docker push registry.cn-shenzhen.aliyuncs.com/nine_percent/tianchi_submit:[image version]
3. Problems encountered
During the build, the following error appeared: ERROR: Double requirement given: PyYAML>=5.3 (from -r requirements.txt (line 10)) (already in PyYAML, name='PyYAML')
Removing the version constraint on PyYAML in requirements.txt made the error go away.
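pip reports "Double requirement given" when the same package is requested more than once with different specifiers. As a rough aid, the sketch below scans the lines of a requirements file for package names that appear more than once. The function name find_duplicate_requirements is hypothetical, and the parsing is deliberately simplified (it ignores URL-style requirements and lines starting with "-"). It also cannot see a package that is additionally named on the pip command line, which may be where the second PyYAML came from in this build.

```python
import re


def find_duplicate_requirements(lines):
    """Return {normalized package name: [line numbers]} for names listed more than once."""
    seen = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options like -r / -i
            continue
        # The name is everything before the first specifier, extras, or marker character.
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        key = re.sub(r"[-_.]+", "-", name).lower()  # PEP 503 name normalization
        seen.setdefault(key, []).append(lineno)
    return {name: nums for name, nums in seen.items() if len(nums) > 1}
```

It can be run on a real file with find_duplicate_requirements(open("requirements.txt").read().splitlines()); any name in the result is a candidate for the conflict.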
Using OpenCV also raised an error:
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/conda/lib/python3.7/site-packages/cv2/__init__.py", line 5, in <module>
from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
The fix is to add the following to the Dockerfile:
RUN apt update
RUN apt install libgl1-mesa-glx
RUN apt-get install -y libglib2.0-0
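However, these commands then prompt for interactive input during the build.
Changing the Dockerfile as follows avoids the interaction: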
RUN DEBIAN_FRONTEND=noninteractive apt update -y
RUN DEBIAN_FRONTEND=noninteractive apt install libgl1-mesa-glx -y
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y libglib2.0-0
Final version of the Dockerfile:
# Base Images
FROM registry.cn-shanghai.aliyuncs.com/tcc-public/pytorch:1.4-cuda10.1-py3
#registry.cn-shanghai.aliyuncs.com/tcc-public/pytorch:1.4-cuda10.1-py3
ADD . /
WORKDIR /
RUN pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
RUN DEBIAN_FRONTEND=noninteractive apt update -y
RUN DEBIAN_FRONTEND=noninteractive apt install libgl1-mesa-glx -y
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y libglib2.0-0
CMD ["sh", "run.sh"]
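The first submission failed, and then the next one did too: two errors in a row.
After adjusting where the files are stored and the corresponding commands, the submission finally succeeded. Hooray!
Docker is a great tool, but it has a learning curve for beginners. I have summarized the points I used most often here: https://blog.csdn.net/DD_PP_JJ/article/details/113902874, which may be a useful reference.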