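First, the mmpose pipeline used here is a top-down keypoint detection approach, so an object detector has to run in front of it. Here we use OpenMMLab's official mmdetection repository.
(I) Installing MMPose and configuring the environment
1: Download the mmpose source code from OpenMMLab's GitHub page: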
https://github.com/open-mmlab/mmpose
Download the mmdetection source code as well:
https://github.com/open-mmlab/mmdetection
2. After downloading the code, create a new conda virtual environment (I named mine openmmlab) and install the following dependencies in it:
(1) Install PyTorch
pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
(2) Install MMCV and the other OpenMMLab packages with MIM
pip install -U openmim
mim install mmengine
mim install 'mmcv>=2.0.0rc3'
mim install "mmdet>=3.0.0rc6"
mim install "mmpose>=1.1.0"
(3) Install the other utility packages
pip install opencv-python pillow matplotlib seaborn tqdm pycocotools -i https://pypi.tuna.tsinghua.edu.cn/simple
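3. Create the checkpoint, outputs, and data folders in both repositories
import os
# checkpoint folder: stores the pretrained model weight files
os.mkdir('checkpoint')
# outputs folder: stores the prediction results
os.mkdir('outputs')
# data folder: stores the image and video material
os.mkdir('data')
os.mkdir('data/test')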
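Note that os.mkdir raises FileExistsError when a folder already exists, so running the step a second time fails. A minimal sketch of an idempotent variant that can be pointed at each repository root in turn is shown below; the function name make_workspace is hypothetical and not part of either repository.

```python
from pathlib import Path


def make_workspace(repo_root):
    """Create checkpoint/, outputs/, data/ and data/test/ under repo_root."""
    root = Path(repo_root)
    created = []
    for sub in ("checkpoint", "outputs", "data", "data/test"):
        path = root / sub
        # parents=True creates missing parents; exist_ok=True makes reruns safe
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

It could be called once per repository, for example make_workspace('mmpose') and make_workspace('mmdetection'), assuming those are the local checkout paths.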
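4. Verify the installation
In the openmmlab environment, create a Python file with the following content. If it runs without errors, the installation succeeded.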
import torch, torchvision
import mmcv
import mmpose
from mmcv.ops import get_compiling_cuda_version, get_compiler_version
# Check PyTorch
print('PyTorch version', torch.__version__)
print('CUDA available', torch.cuda.is_available())
# Check mmcv
print('MMCV version', mmcv.__version__)
print('CUDA version', get_compiling_cuda_version())
print('Compiler version', get_compiler_version())
# Check mmpose
print('mmpose version', mmpose.__version__)
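If one of the imports fails, it can help to see which packages are actually installed without importing them. The following is a small sketch using only the standard library; the helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment
            versions[name] = None
    return versions
```

For this tutorial it could be called as installed_versions(['torch', 'mmengine', 'mmcv', 'mmdet', 'mmpose']); any entry that comes back as None points at the install step to repeat.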
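(II) Prepare your own dataset in COCO format
You can annotate the bounding boxes and keypoints with labelme, convert the annotations to COCO format with a conversion script, and place the dataset in the data folders created in mmpose and mmdetection.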
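The conversion script itself is not shown here. As a rough sketch of what such a conversion involves, the function below turns labelme-style records into one COCO keypoint dictionary. It rests on several assumptions: each record has the fields imagePath, imageHeight, imageWidth, and shapes (with label, shape_type, and points), every rectangle belongs to a single class, and a point belongs to the rectangle that contains it. The function name labelme_to_coco is hypothetical.

```python
def labelme_to_coco(records, class_name, keypoint_names):
    """Convert labelme-style dicts (rectangles + points) to a COCO keypoint dict."""
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{
            "id": 1,
            "name": class_name,
            "keypoints": list(keypoint_names),
            "skeleton": [],
        }],
    }
    ann_id = 1
    for image_id, rec in enumerate(records, start=1):
        coco["images"].append({
            "id": image_id,
            "file_name": rec["imagePath"],
            "height": rec["imageHeight"],
            "width": rec["imageWidth"],
        })
        boxes = [s for s in rec["shapes"] if s["shape_type"] == "rectangle"]
        points = [s for s in rec["shapes"] if s["shape_type"] == "point"]
        for box in boxes:
            # labelme stores two opposite corners in arbitrary order
            (xa, ya), (xb, yb) = box["points"]
            x0, x1 = min(xa, xb), max(xa, xb)
            y0, y1 = min(ya, yb), max(ya, yb)
            keypoints, labeled = [], 0
            for name in keypoint_names:
                # First point with this label that lies inside the box, if any
                hit = next(
                    (p["points"][0] for p in points
                     if p["label"] == name
                     and x0 <= p["points"][0][0] <= x1
                     and y0 <= p["points"][0][1] <= y1),
                    None)
                if hit is None:
                    keypoints += [0, 0, 0]            # v=0: not labeled
                else:
                    keypoints += [hit[0], hit[1], 2]  # v=2: labeled and visible
                    labeled += 1
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": 1,
                "bbox": [x0, y0, x1 - x0, y1 - y0],   # COCO uses x, y, width, height
                "area": (x1 - x0) * (y1 - y0),
                "iscrowd": 0,
                "keypoints": keypoints,
                "num_keypoints": labeled,
            })
            ann_id += 1
    return coco
```

The result can be written with json.dump to files such as train_coco.json and val_coco.json, which are the annotation file names the configs below expect.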
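(III) Download the config files
In mmpose, the dataset path, the pretrained model, the training hyperparameters, the model weights, and so on are all specified in the config file. The config file therefore has to be added under the data folder, in both the mmpose and mmdetection directories.
(1) Download address of the Faster R-CNN object detection config file: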
https://zihao-openmmlab.obs.myhuaweicloud.com/20220610-mmpose/triangle_dataset/faster_r_cnn_triangle.py
The config file is as follows:
# Dataset type and path
dataset_type = 'CocoDataset'
data_root = 'data/Triangle_215_Keypoint_coco/'
metainfo = {'classes': ('sjb_rect',)}
NUM_CLASSES = len(metainfo['classes'])
# Pretrained model weights
load_from = 'https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'
# Training hyperparameters
MAX_EPOCHS = 50
TRAIN_BATCH_SIZE = 1
VAL_BATCH_SIZE = 1
VAL_INTERVAL = 5 # run validation every this many epochs
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=MAX_EPOCHS, val_interval=VAL_INTERVAL)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
# Pipeline
backend_args = None
train_pipeline = [
dict(type='LoadImageFromFile', backend_args=backend_args),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='Resize', scale=(1333, 800), keep_ratio=True),
dict(type='RandomFlip', prob=0.5),
dict(type='PackDetInputs')
]
test_pipeline = [
dict(type='LoadImageFromFile', backend_args=backend_args),
dict(type='Resize', scale=(1333, 800), keep_ratio=True),
# If you don't have a gt annotation, delete the pipeline
dict(type='LoadAnnotations', with_bbox=True),
dict(
type='PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]
# DataLoader
train_dataloader = dict(
batch_size=TRAIN_BATCH_SIZE,
num_workers=2,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
batch_sampler=dict(type='AspectRatioBatchSampler'),
dataset=dict(
type=dataset_type,
data_root=data_root,
metainfo=metainfo,
ann_file='train_coco.json',
data_prefix=dict(img='images/'),
filter_cfg=dict(filter_empty_gt=True, min_size=32),
pipeline=train_pipeline,
backend_args=backend_args))
val_dataloader = dict(
batch_size=VAL_BATCH_SIZE,
num_workers=2,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
metainfo=metainfo,
data_root=data_root,
ann_file='val_coco.json',
data_prefix=dict(img='images/'),
test_mode=True,
pipeline=test_pipeline,
backend_args=backend_args))
test_dataloader = val_dataloader
# Evaluator: evaluation metrics on the test set
# val_evaluator = dict(type='CocoMetric',ann_file=data_root + 'val_coco.json',metric='bbox',format_only=False, backend_args=backend_args)
# val_evaluator = dict(type='VOCMetric', metric='mAP', eval_mode='11points')
val_evaluator = [
dict(type='CocoMetric',ann_file=data_root + 'val_coco.json',metric='bbox',format_only=False, backend_args=backend_args),
dict(type='VOCMetric', metric='mAP', eval_mode='11points')
]
test_evaluator = val_evaluator
# Model architecture
model = dict(
type='FasterRCNN',
data_preprocessor=dict(
type='DetDataPreprocessor',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
bgr_to_rgb=True,
pad_size_divisor=32),
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
roi_head=dict(
type='StandardRoIHead',
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=NUM_CLASSES,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
# model training and testing settings
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=-1,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=2000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False)),
test_cfg=dict(
rpn=dict(
nms_pre=1000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100)
# soft-nms is also supported for rcnn testing
# e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05)
))
# Learning rate schedule
param_scheduler = [
dict(
type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
dict(
type='MultiStepLR',
begin=0,
end=12,
by_epoch=True,
milestones=[8, 11],
gamma=0.1)
]
# Optimizer
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001))
# Scaling LR automatically
auto_scale_lr = dict(enable=False, base_batch_size=16)
default_scope = 'mmdet'
# Hook
default_hooks = dict(
timer=dict(type='IterTimerHook'),
logger=dict(type='LoggerHook', interval=1),
param_scheduler=dict(type='ParamSchedulerHook'),
checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=2, save_best='coco/bbox_mAP'), # auto coco/bbox_mAP_50 coco/bbox_mAP_75 coco/bbox_mAP_s
sampler_seed=dict(type='DistSamplerSeedHook'),
visualization=dict(type='DetVisualizationHook'))
env_cfg = dict(
cudnn_benchmark=False,
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
dist_cfg=dict(backend='nccl'),
)
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer')
log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True)
log_level = 'INFO'
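# load_from is already set to the pretrained weights near the top of this file; reassigning it to None here would override that setting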
resume = False
If you train on your own dataset, the following three settings need to be changed:
# Dataset type and path
dataset_type = 'CocoDataset'
data_root = 'data/Triangle_215_Keypoint_coco/'
metainfo = {'classes': ('sjb_rect',)}
NUM_CLASSES = len(metainfo['classes'])
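dataset_type is the dataset format. We use a COCO-format dataset here, so dataset_type is CocoDataset.
data_root is the dataset path; fill in the path to your own dataset.
metainfo holds the class names of the boxes; fill in the label names used during annotation.
Finally, adjust the batch size in the hyperparameters according to your GPU memory.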
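Since the class names in metainfo have to match the category names in the annotation file, one way to avoid typos is to read them from the COCO JSON directly. The sketch below assumes a standard COCO file with a top-level categories list; the helper name classes_from_coco is hypothetical, and ordering by category id is a choice made here for determinism.

```python
import json


def classes_from_coco(ann_path):
    """Return the category names of a COCO annotation file, ordered by category id."""
    with open(ann_path, "r", encoding="utf-8") as f:
        coco = json.load(f)
    categories = sorted(coco["categories"], key=lambda c: c["id"])
    return tuple(c["name"] for c in categories)
```

The returned tuple has the same shape as the value used in the config, for example ('sjb_rect',) for the triangle dataset, so it can be compared against metainfo['classes'] before training.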
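(IV) After completing the steps above, we can start training. The mmdetection object detector has to be trained first. Running from the command line is recommended; this simply executes the tools/train.py training script. The training commands are as follows:
(1) Training the Faster R-CNN model (slower inference, not recommended)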
python tools/train.py data/faster_r_cnn_triangle.py
(2) Training the RTMDet model (faster and fairly accurate inference, recommended)
python tools/train.py data/rtmdet_tiny_triangle.py
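(3) After training finishes, a work_dirs folder is generated, containing the training logs, the model .pth files, and so on. Finally, test the trained detector with the tools/test.py script, which takes the config file and the trained .pth file. The test command is as follows: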
python tools/test.py data/rtmdet_tiny_triangle.py \
work_dirs/rtmdet_tiny_triangle/epoch_200.pth
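The checkpoint name in the command depends on how many epochs were trained and which files were kept. A small sketch for locating the latest epoch_N.pth file in a work directory follows; the helper name latest_epoch_checkpoint is hypothetical. It compares epoch numbers numerically, because a plain string sort would rank epoch_50.pth after epoch_200.pth.

```python
import re
from pathlib import Path


def latest_epoch_checkpoint(work_dir):
    """Return the epoch_N.pth path with the largest N, or None if there is none."""
    pattern = re.compile(r"epoch_(\d+)\.pth")
    best = None
    for path in Path(work_dir).glob("epoch_*.pth"):
        match = pattern.fullmatch(path.name)
        if match is None:
            continue
        epoch = int(match.group(1))
        # Keep the checkpoint with the highest epoch number seen so far
        if best is None or epoch > best[0]:
            best = (epoch, path)
    return None if best is None else best[1]
```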
The following script can be used to visualize the training log
import json
import random

import matplotlib.pyplot as plt
import pandas as pd
from tqdm import tqdm

plt.rcParams['axes.unicode_minus'] = False  # render minus signs correctly
# Path to the log file
log_path = 'work_dirs/faster_r_cnn_triangle/20230511_234855/vis_data/scalars.json'
with open(log_path, "r") as f:
    json_list = f.readlines()
print(len(json_list))
print(json.loads(json_list[4]))
# Each line is one JSON record; lines containing coco/bbox_mAP are evaluation records
train_records = []
test_records = []
for each in tqdm(json_list):
    if 'coco/bbox_mAP' in each:
        test_records.append(json.loads(each))
    else:
        train_records.append(json.loads(each))
# DataFrame.append was removed in pandas 2.0, so the frames are built from lists
df_train = pd.DataFrame(train_records)
df_test = pd.DataFrame(test_records)
print(df_train)
print(df_test)
# Export the training log tables
df_train.to_csv('训练日志-训练集.csv', index=False)
df_test.to_csv('训练日志-测试集.csv', index=False)
# Configure a Chinese font for Matplotlib
# # Windows
# plt.rcParams['font.sans-serif']=['SimHei'] # render Chinese labels correctly
# plt.rcParams['axes.unicode_minus']=False # render minus signs correctly
# Linux
import matplotlib
import matplotlib.pyplot as plt
matplotlib.rc("font",family='SimHei') # Chinese font
# Check that the Chinese font is set up correctly
plt.plot([1,2,3], [100,500,300])
plt.title('matplotlib中文字体测试', fontsize=25)
plt.xlabel('X轴', fontsize=15)
plt.ylabel('Y轴', fontsize=15)
plt.show()
# Plotting helper
random.seed(124)
colors = ['b', 'g', 'r', 'c', 'm', 'y', 'k', 'tab:blue', 'tab:orange', 'tab:green', 'tab:red', 'tab:purple', 'tab:brown', 'tab:pink', 'tab:gray', 'tab:olive', 'tab:cyan', 'black', 'indianred', 'brown', 'firebrick', 'maroon', 'darkred', 'red', 'sienna', 'chocolate', 'yellow', 'olivedrab', 'yellowgreen', 'darkolivegreen', 'forestgreen', 'limegreen', 'darkgreen', 'green', 'lime', 'seagreen', 'mediumseagreen', 'darkslategray', 'darkslategrey', 'teal', 'darkcyan', 'dodgerblue', 'navy', 'darkblue', 'mediumblue', 'blue', 'slateblue', 'darkslateblue', 'mediumslateblue', 'mediumpurple', 'rebeccapurple', 'blueviolet', 'indigo', 'darkorchid', 'darkviolet', 'mediumorchid', 'purple', 'darkmagenta', 'fuchsia', 'magenta', 'orchid', 'mediumvioletred', 'deeppink', 'hotpink']
markers = [".",",","o","v","^","<",">","1","2","3","4","8","s","p","P","*","h","H","+","x","X","D","d","|","_",0,1,2,3,4,5,6,7,8,9,10,11]
linestyle = ['--', '-.', '-']
def get_line_arg():
    '''
    Randomly pick a line style for plotting
    '''
    line_arg = {}
    line_arg['color'] = random.choice(colors)
    # line_arg['marker'] = random.choice(markers)
    line_arg['linestyle'] = random.choice(linestyle)
    line_arg['linewidth'] = random.randint(1, 4)
    # line_arg['markersize'] = random.randint(3, 5)
    return line_arg
# Training loss curves
print(df_train.columns)
metrics = ['loss', 'loss_bbox', 'loss_cls', 'loss_rpn_cls', 'loss_rpn_bbox']
plt.figure(figsize=(16, 8))
x = df_train['step']
for y in metrics:
    plt.plot(x, df_train[y], label=y, **get_line_arg())
plt.tick_params(labelsize=20)
plt.xlabel('step', fontsize=20)
plt.ylabel('loss', fontsize=20)
plt.title('训练集损失函数', fontsize=25)
plt.legend(fontsize=20)  # add the legend before saving so that it appears in the PDF
plt.savefig('训练集损失函数.pdf', dpi=120, bbox_inches='tight')
plt.show()
# Training accuracy curve
metrics = ['acc']
plt.figure(figsize=(16, 8))
x = df_train['step']
for y in metrics:
    plt.plot(x, df_train[y], label=y, **get_line_arg())
plt.tick_params(labelsize=20)
plt.xlabel('step', fontsize=20)
plt.ylabel('acc', fontsize=20)
plt.title('训练集准确率', fontsize=25)
plt.legend(fontsize=20)
plt.savefig('训练集准确率.pdf', dpi=120, bbox_inches='tight')
plt.show()
# Test set evaluation metrics: MS COCO Metric
print(df_test.columns)
metrics = ['coco/bbox_mAP', 'coco/bbox_mAP_50', 'coco/bbox_mAP_75', 'coco/bbox_mAP_s', 'coco/bbox_mAP_m', 'coco/bbox_mAP_l']
plt.figure(figsize=(16, 8))
x = df_test['step']
for y in metrics:
    plt.plot(x, df_test[y], label=y, **get_line_arg())
plt.tick_params(labelsize=20)
# plt.ylim([0, 100])
plt.xlabel('Epoch', fontsize=20)
plt.ylabel('metric', fontsize=20)
plt.title('测试集评估指标', fontsize=25)
plt.legend(fontsize=20)
plt.savefig('测试集分类评估指标.pdf', dpi=120, bbox_inches='tight')
plt.show()
# Test set evaluation metrics: PASCAL VOC Metric
metrics = ['pascal_voc/mAP', 'pascal_voc/AP50']
plt.figure(figsize=(16, 8))
x = df_test['step']
for y in metrics:
    plt.plot(x, df_test[y], label=y, **get_line_arg())
plt.tick_params(labelsize=20)
# plt.ylim([0, 100])
plt.xlabel('Epoch', fontsize=20)
plt.ylabel('metric', fontsize=20)
plt.title('测试集评估指标', fontsize=25)
plt.legend(fontsize=20)
# Saved under a separate name so that it does not overwrite the COCO metric figure
plt.savefig('测试集评估指标-PASCAL_VOC.pdf', dpi=120, bbox_inches='tight')
plt.show()
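The core of the log handling is the split of scalars.json into training and evaluation records. A minimal standard-library sketch of that split is given below. It parses each line with json.loads and tests for the metric key on the parsed record rather than on the raw text; the function name split_log_records is hypothetical.

```python
import json


def split_log_records(lines, val_key="coco/bbox_mAP"):
    """Split JSON-lines log text into (training records, evaluation records)."""
    train, val = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Evaluation records are the ones that carry the metric key
        (val if val_key in record else train).append(record)
    return train, val
```

Both returned lists can be passed straight to pd.DataFrame, which builds the table in one step instead of appending row by row.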
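(V) After the steps above, the trained model should be slimmed down. After this conversion, the .pth weight file shrinks by roughly half or more, without affecting inference results or inference speed. This is done with the tools/model_converters/publish_model.py script, which takes the trained .pth file and the output path. The script appends a hash to the output file name, which is why the checkpoint names used for prediction later carry a suffix such as -76d9dde3. The commands are as follows:
(1) Slimming command for the Faster R-CNN model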
python tools/model_converters/publish_model.py \
work_dirs/faster_r_cnn_triangle/epoch_50.pth \
checkpoint/faster_r_cnn_triangle_epoch_50_202305120846.pth
(2) Slimming command for the RTMDet-tiny model
python tools/model_converters/publish_model.py \
work_dirs/rtmdet_tiny_triangle/epoch_200.pth \
checkpoint/rtmdet_tiny_triangle_epoch_200_202305120847.pth
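To illustrate why the slimmed file is smaller, the sketch below mimics the idea on a plain dictionary: training-only entries such as the optimizer state are dropped, and a short hash of the saved bytes is appended to the file name. This is an illustration under assumptions, not the script's actual code: pickle stands in for torch.save, the kept keys (meta and state_dict) are an assumption about what the script retains, and both function names are hypothetical.

```python
import hashlib
import pickle


def slim_checkpoint(checkpoint, keep_keys=("meta", "state_dict")):
    """Return a copy of a checkpoint dict that retains only keep_keys."""
    return {k: v for k, v in checkpoint.items() if k in keep_keys}


def hashed_name(out_file, payload):
    """Append the first 8 hex digits of the payload's SHA-256 to out_file."""
    stem = out_file[:-4] if out_file.endswith(".pth") else out_file
    digest = hashlib.sha256(payload).hexdigest()
    return f"{stem}-{digest[:8]}.pth"
```

The optimizer state usually holds extra tensors for every model parameter, so removing it accounts for most of the size reduction.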
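(VI) Download the mmpose config file and place it in the data directory of mmpose. The download address is:
https://zihao-openmmlab.obs.cn-east-3.myhuaweicloud.com/20220610-mmpose/triangle_dataset/rtmpose-s_triangle_8xb256-420e_coco-256x192.py
After downloading, the first three settings, the batch size, and the dataset metadata likewise need to be modified. The mmpose config file is as follows: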
_base_ = ['mmpose::_base_/default_runtime.py']
# Dataset type and path
dataset_type = 'CocoDataset'
data_mode = 'topdown'
data_root = 'data/Triangle_215_Keypoint_coco/'
# Metadata of the triangle ruler keypoint detection dataset
dataset_info = {
'dataset_name':'Triangle_215_Keypoint_coco',
'classes':'sjb_rect',
'paper_info':{
'author':'Tongji Zihao',
'title':'Triangle Keypoints Detection',
'container':'OpenMMLab',
'year':'2023',
'homepage':'https://space.bilibili.com/1900783'
},
'keypoint_info':{
0:{'name':'angle_30','id':0,'color':[255,0,0],'type': '','swap': ''},
1:{'name':'angle_60','id':1,'color':[0,255,0],'type': '','swap': ''},
2:{'name':'angle_90','id':2,'color':[0,0,255],'type': '','swap': ''}
},
'skeleton_info': {
0: {'link':('angle_30','angle_60'),'id': 0,'color': [100,150,200]},
1: {'link':('angle_60','angle_90'),'id': 1,'color': [200,100,150]},
2: {'link':('angle_90','angle_30'),'id': 2,'color': [150,120,100]}
},
'joint_weights':[1.0, 1.0, 1.0], # relative importance of the three keypoints; adjust as needed
'sigmas':[0.026,0.025,0.025] # measures how much different annotators deviate when labeling; keep the defaults
}
NUM_KEYPOINTS = len(dataset_info['keypoint_info'])
# Training hyperparameters
max_epochs = 300 # total number of training epochs
val_interval = 10 # run validation (and update the best checkpoint) every this many epochs
train_cfg = {'max_epochs': max_epochs, 'val_interval': val_interval}
train_batch_size = 16
val_batch_size = 8
stage2_num_epochs = 0
base_lr = 4e-3
randomness = dict(seed=21)
# Optimizer
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
paramwise_cfg=dict(
norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))
# Learning rate schedule
param_scheduler = [
dict(
type='LinearLR', start_factor=1.0e-5, by_epoch=False, begin=0, end=20),
dict(
# use cosine lr over the second half of training (from max_epochs // 2 to max_epochs)
type='CosineAnnealingLR',
eta_min=base_lr * 0.05,
begin=max_epochs // 2,
end=max_epochs,
T_max=max_epochs // 2,
by_epoch=True,
convert_to_iter_based=True),
]
# automatically scaling LR based on the actual training batch size
auto_scale_lr = dict(base_batch_size=1024)
# codec settings
codec = dict(
type='SimCCLabel',
input_size=(256, 256), # important parameter
sigma=(12, 12), # important parameter
simcc_split_ratio=2.0,
normalize=False,
use_dark=False)
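# Parameter combinations for different input image sizes
# input_size=(256, 256),
# sigma=(12, 12)
# in_featuremap_size=(8, 8)
# input_size can be changed to 256, 384, 512, or 1024; scale all three parameters proportionally
# sigma is the standard deviation of the 1-D Gaussian of each keypoint: a larger value is easier to learn but lowers the accuracy ceiling, a smaller value is stricter. For high-precision scenarios such as human bodies and faces it can be reduced (the original RTMPose paper uses 5.66). When changing input_size, scale sigma and in_featuremap_size proportionally.
# Configs for the different models: https://github.com/open-mmlab/mmpose/tree/dev-1.x/projects/rtmpose/rtmpose/body_2d_keypoint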
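## Model: labeled RTMPose-S in the source. Note that the settings below (deepen_factor=0.67, widen_factor=0.75, in_channels=768, cspnext-m checkpoint) are identical to the commented-out RTMPose-M block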
model = dict(
type='TopdownPoseEstimator',
data_preprocessor=dict(
type='PoseDataPreprocessor',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
bgr_to_rgb=True),
backbone=dict(
_scope_='mmdet',
type='CSPNeXt',
arch='P5',
expand_ratio=0.5,
deepen_factor=0.67,
widen_factor=0.75,
out_indices=(4, ),
channel_attention=True,
norm_cfg=dict(type='SyncBN'),
act_cfg=dict(type='SiLU'),
init_cfg=dict(
type='Pretrained',
prefix='backbone.',
checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' +
'rtmdet/cspnext_rsb_pretrain/cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth', # noqa E501
)),
head=dict(
type='RTMCCHead',
in_channels=768,
out_channels=NUM_KEYPOINTS,
input_size=codec['input_size'],
in_featuremap_size=(8, 8), # important parameter
simcc_split_ratio=codec['simcc_split_ratio'],
final_layer_kernel_size=7,
gau_cfg=dict(
hidden_dims=256,
s=128,
expansion_factor=2,
dropout_rate=0.,
drop_path=0.,
act_fn='SiLU',
use_rel_bias=False,
pos_enc=False),
loss=dict(
type='KLDiscretLoss',
use_target_weight=True,
beta=10.,
label_softmax=True),
decoder=codec),
test_cfg=dict(flip_test=True))
## Model: RTMPose-M
# model = dict(
# type='TopdownPoseEstimator',
# data_preprocessor=dict(
# type='PoseDataPreprocessor',
# mean=[123.675, 116.28, 103.53],
# std=[58.395, 57.12, 57.375],
# bgr_to_rgb=True),
# backbone=dict(
# _scope_='mmdet',
# type='CSPNeXt',
# arch='P5',
# expand_ratio=0.5,
# deepen_factor=0.67,
# widen_factor=0.75,
# out_indices=(4, ),
# channel_attention=True,
# norm_cfg=dict(type='SyncBN'),
# act_cfg=dict(type='SiLU'),
# init_cfg=dict(
# type='Pretrained',
# prefix='backbone.',
# checkpoint='https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth'
# )),
# head=dict(
# type='RTMCCHead',
# in_channels=768,
# out_channels=NUM_KEYPOINTS,
# input_size=codec['input_size'],
# in_featuremap_size=(8, 8),
# simcc_split_ratio=codec['simcc_split_ratio'],
# final_layer_kernel_size=7,
# gau_cfg=dict(
# hidden_dims=256,
# s=128,
# expansion_factor=2,
# dropout_rate=0.,
# drop_path=0.,
# act_fn='SiLU',
# use_rel_bias=False,
# pos_enc=False),
# loss=dict(
# type='KLDiscretLoss',
# use_target_weight=True,
# beta=10.,
# label_softmax=True),
# decoder=codec),
# test_cfg=dict(flip_test=True))
## Model: RTMPose-L
# model = dict(
# type='TopdownPoseEstimator',
# data_preprocessor=dict(
# type='PoseDataPreprocessor',
# mean=[123.675, 116.28, 103.53],
# std=[58.395, 57.12, 57.375],
# bgr_to_rgb=True),
# backbone=dict(
# _scope_='mmdet',
# type='CSPNeXt',
# arch='P5',
# expand_ratio=0.5,
# deepen_factor=1.,
# widen_factor=1.,
# out_indices=(4, ),
# channel_attention=True,
# norm_cfg=dict(type='SyncBN'),
# act_cfg=dict(type='SiLU'),
# init_cfg=dict(
# type='Pretrained',
# prefix='backbone.',
# checkpoint='https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-l_8xb256-rsb-a1-600e_in1k-6a760974.pth'
# )),
# head=dict(
# type='RTMCCHead',
# in_channels=1024,
# out_channels=NUM_KEYPOINTS,
# input_size=codec['input_size'],
# in_featuremap_size=(8, 8),
# simcc_split_ratio=codec['simcc_split_ratio'],
# final_layer_kernel_size=7,
# gau_cfg=dict(
# hidden_dims=256,
# s=128,
# expansion_factor=2,
# dropout_rate=0.,
# drop_path=0.,
# act_fn='SiLU',
# use_rel_bias=False,
# pos_enc=False),
# loss=dict(
# type='KLDiscretLoss',
# use_target_weight=True,
# beta=10.,
# label_softmax=True),
# decoder=codec),
# test_cfg=dict(flip_test=True))
backend_args = dict(backend='local')
# backend_args = dict(
# backend='petrel',
# path_mapping=dict({
# f'{data_root}': 's3://openmmlab/datasets/detection/coco/',
# f'{data_root}': 's3://openmmlab/datasets/detection/coco/'
# }))
# pipelines
train_pipeline = [
dict(type='LoadImage', backend_args=backend_args),
dict(type='GetBBoxCenterScale'),
dict(type='RandomFlip', direction='horizontal'),
# dict(type='RandomHalfBody'),
dict(
type='RandomBBoxTransform', scale_factor=[0.8, 1.2], rotate_factor=30),
dict(type='TopdownAffine', input_size=codec['input_size']),
dict(type='mmdet.YOLOXHSVRandomAug'),
dict(
type='Albumentation',
transforms=[
dict(type='ChannelShuffle', p=0.5),
dict(type='CLAHE', p=0.5),
# dict(type='Downscale', scale_min=0.7, scale_max=0.9, p=0.2),
dict(type='ColorJitter', p=0.5),
dict(
type='CoarseDropout',
max_holes=4,
max_height=0.3,
max_width=0.3,
min_holes=1,
min_height=0.2,
min_width=0.2,
p=0.5),
]),
dict(type='GenerateTarget', encoder=codec),
dict(type='PackPoseInputs')
]
# train_pipeline = [
# dict(type='LoadImage', backend_args=backend_args),
# dict(type='GetBBoxCenterScale'),
# dict(type='TopdownAffine', input_size=codec['input_size']),
# dict(type='GenerateTarget', encoder=codec),
# dict(type='PackPoseInputs')
# ]
val_pipeline = [
dict(type='LoadImage', backend_args=backend_args),
dict(type='GetBBoxCenterScale'),
dict(type='TopdownAffine', input_size=codec['input_size']),
dict(type='PackPoseInputs')
]
train_pipeline_stage2 = [
dict(type='LoadImage', backend_args=backend_args),
dict(type='GetBBoxCenterScale'),
dict(type='RandomFlip', direction='horizontal'),
dict(type='RandomHalfBody'),
dict(
type='RandomBBoxTransform',
shift_factor=0.,
scale_factor=[0.75, 1.25],
rotate_factor=60),
dict(type='TopdownAffine', input_size=codec['input_size']),
dict(type='mmdet.YOLOXHSVRandomAug'),
dict(
type='Albumentation',
transforms=[
dict(type='Blur', p=0.1),
dict(type='MedianBlur', p=0.1),
dict(
type='CoarseDropout',
max_holes=1,
max_height=0.4,
max_width=0.4,
min_holes=1,
min_height=0.2,
min_width=0.2,
p=0.5),
]),
dict(type='GenerateTarget', encoder=codec),
dict(type='PackPoseInputs')
]
# data loaders
train_dataloader = dict(
batch_size=train_batch_size,
num_workers=1,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type=dataset_type,
data_root=data_root,
metainfo=dataset_info,
data_mode=data_mode,
ann_file='train_coco.json',
data_prefix=dict(img='images/'),
pipeline=train_pipeline,
))
val_dataloader = dict(
batch_size=val_batch_size,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
# dataset=dict(
# type=dataset_type,
# data_root=data_root,
# metainfo=dataset_info,
# data_mode=data_mode,
# ann_file='val_coco.json',
# # bbox_file=f'{data_root}person_detection_results/'
# # 'COCO_val2017_detections_AP_H_56_person.json',
# data_prefix=dict(img='images/'),
# test_mode=True,
# pipeline=val_pipeline,
# )
dataset=dict(
type=dataset_type,
data_root=data_root,
metainfo=dataset_info,
data_mode=data_mode,
ann_file='val_coco.json',
data_prefix=dict(img='images/'),
pipeline=val_pipeline,
))
test_dataloader = val_dataloader
# hooks
# default_hooks = dict(
# checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1),
# logger=dict(interval=1),
# )
# default_hooks = {
# 'checkpoint': {'save_best': 'coco/AP','rule': 'greater','max_keep_ckpts': 1},
# 'logger': {'interval': 1}
# }
default_hooks = {
'checkpoint': {'save_best': 'PCK','rule': 'greater','max_keep_ckpts': 2},
'logger': {'interval': 1}
}
custom_hooks = [
dict(
type='EMAHook',
ema_type='ExpMomentumEMA',
momentum=0.0002,
update_buffers=True,
priority=49),
dict(
type='mmdet.PipelineSwitchHook',
switch_epoch=max_epochs - stage2_num_epochs,
switch_pipeline=train_pipeline_stage2)
]
# evaluators
# val_evaluator = dict(type='CocoMetric', ann_file=data_root + 'val_coco.json')
# val_evaluator = dict(type='PCKAccuracy')
# val_evaluator = [dict(type='CocoMetric', ann_file=data_root + 'val_coco.json'), dict(type='PCKAccuracy')]
# val_evaluator = [
# dict(type='CocoMetric', ann_file=data_root + 'val_coco.json'),
# dict(type='PCKAccuracy'),
# dict(type='AUC'),
# dict(type='NME', norm_mode='keypoint_distance')
# ]
val_evaluator = [
dict(type='CocoMetric', ann_file=data_root + 'val_coco.json'),
dict(type='PCKAccuracy'),
dict(type='AUC'),
dict(type='NME', norm_mode='keypoint_distance', keypoint_indices=[1, 2])
]
test_evaluator = val_evaluator
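The config comments state that input_size, sigma, and in_featuremap_size have to be scaled together. A small sketch of that rule follows, taking the 256 / 12 / 8 combination from the config as the baseline. The helper name scaled_codec_params is hypothetical, and the simcc_bins value (input size times simcc_split_ratio) is included only for orientation.

```python
def scaled_codec_params(input_side, base_side=256, base_sigma=12.0,
                        base_featmap=8, simcc_split_ratio=2.0):
    """Scale sigma and in_featuremap_size proportionally with a square input size."""
    if (input_side * base_featmap) % base_side != 0:
        raise ValueError("input size does not map to an integer feature-map size")
    scale = input_side / base_side
    featmap = input_side * base_featmap // base_side
    sigma = base_sigma * scale
    return {
        "input_size": (input_side, input_side),
        "sigma": (sigma, sigma),
        "in_featuremap_size": (featmap, featmap),
        # Number of 1-D classification bins per axis for the SimCC representation
        "simcc_bins": int(input_side * simcc_split_ratio),
    }
```

For example, scaled_codec_params(512) doubles all three values relative to the 256 baseline. The results then have to be copied into both the codec dict and the head's in_featuremap_size.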
(VII) Training the RTMPose keypoint detector
Training is again run from the command line, with the config file specified:
python tools/train.py data/rtmpose-s_triangle_8xb256-420e_coco-256x192.py
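After training, the accuracy can be evaluated on the test set. The command takes the config file and the trained .pth file:
python tools/test.py data/rtmpose-s_triangle_8xb256-420e_coco-256x192.py \
work_dirs/rtmpose-s_triangle_8xb256-420e_coco-256x192/epoch_300.pth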
(VIII) Script for visualizing the RTMPose training log
import json

import matplotlib.pyplot as plt
import pandas as pd
from tqdm import tqdm

plt.rcParams['axes.unicode_minus'] = False  # render minus signs correctly
# Load the training log
# Path to the log file
log_path = 'work_dirs/rtmpose-s_triangle_8xb256-420e_coco-256x192/20230512_091723/vis_data/scalars.json'
with open(log_path, "r") as f:
    json_list = f.readlines()
print(len(json_list))
print(json.loads(json_list[4]))
# Each line is one JSON record; lines containing coco/AP are evaluation records
train_records = []
test_records = []
for each in tqdm(json_list):
    if 'coco/AP' in each:
        test_records.append(json.loads(each))
    else:
        train_records.append(json.loads(each))
# DataFrame.append was removed in pandas 2.0, so the frames are built from lists
df_train = pd.DataFrame(train_records)
df_test = pd.DataFrame(test_records)
print(df_train)
print(df_test)
# Export the training log tables
df_train.to_csv('训练日志-训练集.csv', index=False)
df_test.to_csv('训练日志-测试集.csv', index=False)
# Configure a Chinese font for Matplotlib
# # Windows
# plt.rcParams['font.sans-serif']=['SimHei'] # render Chinese labels correctly
# plt.rcParams['axes.unicode_minus']=False # render minus signs correctly
# Linux
import matplotlib
matplotlib.rc("font",family='SimHei') # Chinese font
# Check the Chinese font
plt.plot([1,2,3], [100,500,300])
plt.title('matplotlib中文字体测试', fontsize=25)
plt.xlabel('X轴', fontsize=15)
plt.ylabel('Y轴', fontsize=15)
plt.show()
# Plotting helper
import random
random.seed(124)
colors = ['b', 'g', 'r', 'c', 'm', 'y', 'k', 'tab:blue', 'tab:orange', 'tab:green', 'tab:red', 'tab:purple', 'tab:brown', 'tab:pink', 'tab:gray', 'tab:olive', 'tab:cyan', 'black', 'indianred', 'brown', 'firebrick', 'maroon', 'darkred', 'red', 'sienna', 'chocolate', 'yellow', 'olivedrab', 'yellowgreen', 'darkolivegreen', 'forestgreen', 'limegreen', 'darkgreen', 'green', 'lime', 'seagreen', 'mediumseagreen', 'darkslategray', 'darkslategrey', 'teal', 'darkcyan', 'dodgerblue', 'navy', 'darkblue', 'mediumblue', 'blue', 'slateblue', 'darkslateblue', 'mediumslateblue', 'mediumpurple', 'rebeccapurple', 'blueviolet', 'indigo', 'darkorchid', 'darkviolet', 'mediumorchid', 'purple', 'darkmagenta', 'fuchsia', 'magenta', 'orchid', 'mediumvioletred', 'deeppink', 'hotpink']
markers = [".",",","o","v","^","<",">","1","2","3","4","8","s","p","P","*","h","H","+","x","X","D","d","|","_",0,1,2,3,4,5,6,7,8,9,10,11]
linestyle = ['--', '-.', '-']
def get_line_arg():
    '''
    Randomly pick a line style for plotting
    '''
    line_arg = {}
    line_arg['color'] = random.choice(colors)
    # line_arg['marker'] = random.choice(markers)
    line_arg['linestyle'] = random.choice(linestyle)
    line_arg['linewidth'] = random.randint(1, 4)
    # line_arg['markersize'] = random.randint(3, 5)
    return line_arg
# Training loss curves
print(df_train.columns)
metrics = ['loss', 'loss_kpt']
plt.figure(figsize=(16, 8))
x = df_train['step']
for y in metrics:
    plt.plot(x, df_train[y], label=y, **get_line_arg())
plt.tick_params(labelsize=20)
plt.xlabel('step', fontsize=20)
plt.ylabel('loss', fontsize=20)
plt.title('训练集损失函数', fontsize=25)
plt.legend(fontsize=20)  # add the legend before saving so that it appears in the PDF
plt.savefig('训练集损失函数.pdf', dpi=120, bbox_inches='tight')
plt.show()
# Training accuracy curve
metrics = ['acc_pose']
plt.figure(figsize=(16, 8))
x = df_train['step']
for y in metrics:
    plt.plot(x, df_train[y], label=y, **get_line_arg())
plt.tick_params(labelsize=20)
plt.xlabel('step', fontsize=20)
plt.ylabel('acc_pose', fontsize=20)
plt.title('训练集准确率', fontsize=25)
plt.legend(fontsize=20)
plt.savefig('训练集准确率.pdf', dpi=120, bbox_inches='tight')
plt.show()
# Test set evaluation metrics: MS COCO Metric
print(df_test.columns)
metrics = ['coco/AP', 'coco/AP .5', 'coco/AP .75', 'coco/AP (M)', 'coco/AP (L)', 'coco/AR', 'coco/AR .5', 'coco/AR .75', 'coco/AR (M)', 'coco/AR (L)', 'PCK', 'AUC']
plt.figure(figsize=(16, 8))
x = df_test['step']
for y in metrics:
    plt.plot(x, df_test[y], label=y, **get_line_arg())
plt.tick_params(labelsize=20)
# plt.ylim([0, 100])
plt.xlabel('Epoch', fontsize=20)
plt.ylabel('metric', fontsize=20)
plt.title('测试集评估指标', fontsize=25)
plt.legend(fontsize=20)
plt.savefig('测试集分类评估指标.pdf', dpi=120, bbox_inches='tight')
plt.show()
# Test set evaluation metrics: NME
metrics = ['NME']
plt.figure(figsize=(16, 8))
x = df_test['step']
for y in metrics:
    plt.plot(x, df_test[y], label=y, **get_line_arg())
plt.tick_params(labelsize=20)
# plt.ylim([0, 100])
plt.xlabel('Epoch', fontsize=20)
plt.ylabel('NME', fontsize=20)
plt.title('测试集评估指标', fontsize=25)
plt.legend(fontsize=20)
# Saved under a separate name so that it does not overwrite the COCO metric figure
plt.savefig('测试集评估指标-NME.pdf', dpi=120, bbox_inches='tight')
plt.show()
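Besides plotting, the evaluation records can be used to find which validation step scored best on a given metric. A small sketch follows; the helper name best_record is hypothetical, and it expects a list of parsed log dictionaries. For NME a lower value is better, so the direction is a parameter.

```python
def best_record(records, metric, higher_is_better=True):
    """Return the record with the best value of metric, or None if no record has it."""
    candidates = [r for r in records if metric in r]
    if not candidates:
        return None
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda r: r[metric])
```

With the DataFrame from the script, the input could be obtained with df_test.to_dict('records').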
(IX) Slimming the mmpose model weight file
The conversion uses the tools/misc/publish_model.py script, which takes the trained .pth file and the output path. The command is as follows:
python tools/misc/publish_model.py \
work_dirs/rtmpose-s_triangle_8xb256-420e_coco-256x192/epoch_300.pth \
checkpoint/rtmpose_s_triangle_300.pth
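(X) Run prediction with the slimmed models. There are again two branches: one based on Faster R-CNN and one based on RTMDet.
1: Prediction with the Faster R-CNN detector. The command is as follows: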
python demo/topdown_demo_with_mmdet.py \
data/faster_r_cnn_triangle.py \
checkpoint/faster_r_cnn_triangle_epoch_50_202305120846-76d9dde3.pth \
data/rtmpose-s_triangle_8xb256-420e_coco-256x192.py \
checkpoint/rtmpose_s_triangle_300-34bfaeb2_20230512.pth \
--input data/test_triangle/triangle_7.jpg \
--output-root outputs/G2_Fasterrcnn-RTMPose \
--device cuda:0 \
--bbox-thr 0.5 \
--kpt-thr 0.5 \
--nms-thr 0.3 \
--radius 36 \
--thickness 30 \
--draw-bbox \
--draw-heatmap \
--show-kpt-idx
2: Prediction with the RTMDet detector. The command is as follows:
python demo/topdown_demo_with_mmdet.py \
data/rtmdet_tiny_triangle.py \
checkpoint/rtmdet_tiny_triangle_epoch_200_202305120847-3cd02a8f.pth \
data/rtmpose-s_triangle_8xb256-420e_coco-256x192.py \
checkpoint/rtmpose_s_triangle_300-34bfaeb2_20230512.pth \
--input data/test_triangle/triangle_4.jpg \
--output-root outputs/G2_RTMDet-RTMPose \
--device cuda:0 \
--bbox-thr 0.5 \
--kpt-thr 0.5 \
--nms-thr 0.3 \
--radius 36 \
--thickness 30 \
--draw-bbox \
--draw-heatmap
Video input works the same way. The following command is based on RTMDet:
python demo/topdown_demo_with_mmdet.py \
data/rtmdet_tiny_triangle.py \
checkpoint/rtmdet_tiny_triangle_epoch_200_202305120847-3cd02a8f.pth \
data/rtmpose-s_triangle_8xb256-420e_coco-256x192.py \
checkpoint/rtmpose_s_triangle_300-34bfaeb2_20230512.pth \
--input data/test_triangle/triangle_9.mp4 \
--output-root outputs/G2_Video \
--device cuda:0 \
--bbox-thr 0.5 \
--kpt-thr 0.5 \
--nms-thr 0.3 \
--radius 16 \
--thickness 10 \
--draw-bbox \
--draw-heatmap \
--show-kpt-idx
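When many images or videos have to be processed, it can be convenient to assemble the same command in Python. The sketch below only builds the argument list, using the script path and flags from the commands above; the function name build_demo_command and its default values are illustrative.

```python
def build_demo_command(det_config, det_checkpoint, pose_config, pose_checkpoint,
                       input_path, output_root, device="cuda:0",
                       bbox_thr=0.5, kpt_thr=0.5, nms_thr=0.3,
                       radius=36, thickness=30):
    """Build the argument list for demo/topdown_demo_with_mmdet.py."""
    return [
        "python", "demo/topdown_demo_with_mmdet.py",
        det_config, det_checkpoint,      # detector config and weights
        pose_config, pose_checkpoint,    # pose estimator config and weights
        "--input", input_path,
        "--output-root", output_root,
        "--device", device,
        "--bbox-thr", str(bbox_thr),
        "--kpt-thr", str(kpt_thr),
        "--nms-thr", str(nms_thr),
        "--radius", str(radius),
        "--thickness", str(thickness),
        "--draw-bbox", "--draw-heatmap", "--show-kpt-idx",
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the mmpose repository root, once per input file.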