Resuming RT-DETR training (resume)
1. Background

After training RT-DETR for 100 epochs, the loss still looked like it had room to converge, so I wanted to train for another 100 epochs. Restarting from scratch and running epochs 1 through 200 would take twice as long, so this post shows how to resume RT-DETR training from epoch 101 and run through epoch 200.
2. Modify the code

1. Modify ultralytics/yolo/cfg/default.yaml

Set the resume parameter:
resume: True
Add a ckpt_path parameter:

# path to the checkpoint to resume from
ckpt_path: runs/detect/train_100_epochs/weights/last.pt
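To confirm the two keys are picked up, here is a quick sanity check (a minimal sketch, assuming PyYAML is installed; the file path matches the one above):

import yaml  # PyYAML, assumed to be installed

with open('ultralytics/yolo/cfg/default.yaml') as f:
    cfg = yaml.safe_load(f)

# expect: True runs/detect/train_100_epochs/weights/last.pt
print(cfg['resume'], cfg['ckpt_path'])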
2. Modify ultralytics/vit/rtdetr/model.py

In the train function, change how overrides['resume'] is set:
def train(self, **kwargs):
    """
    Trains the model on a given dataset.

    Args:
        **kwargs (Any): Any number of arguments representing the training configuration.
    """
    overrides = dict(task='detect', mode='train')
    overrides.update(kwargs)
    overrides['deterministic'] = False
    if not overrides.get('data'):
        raise AttributeError("Dataset required but missing, i.e. pass 'data=coco128.yaml'")
    if overrides.get('resume'):
        # the only change needed: point resume at the checkpoint path from the config
        overrides['resume'] = overrides.get('ckpt_path')
    self.task = overrides.get('task') or self.task
    self.trainer = RTDETRTrainer(overrides=overrides)
    if not overrides.get('resume'):  # manually set model only if not resuming
        self.trainer.model = self.trainer.get_model(weights=self.model if self.ckpt else None, cfg=self.model.yaml)
    self.model = self.trainer.model
    self.trainer.train()
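With this patch in place, resuming can also be launched from the Python API instead of the CLI. A minimal sketch; the import path and the dataset file below are assumptions and may differ across releases. Note that ckpt_path can be passed as a keyword only because step 1 added it to default.yaml, so the config validation accepts it:

# import path is an assumption for this version of ultralytics
from ultralytics import RTDETR

model = RTDETR('runs/detect/train_100_epochs/weights/last.pt')
# resume=True takes the branch patched above; ckpt_path is the key added in step 1
model.train(data='coco128.yaml',  # stand-in for your own dataset config
            resume=True,
            ckpt_path='runs/detect/train_100_epochs/weights/last.pt')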
3. Modify the checkpoint parameters

Two values in last.pt, the checkpoint from the final epoch of the original run, need to be changed. The first is the current epoch: a completed run stores epoch = -1, and it must be set to the 0-based index of the epoch to resume from, i.e. 100 - 1 = 99. The second is the total epoch count: change epochs = 100 to the new target of 200.
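Since the snippet below overwrites last.pt in place, it is worth keeping a copy first (a small standard-library sketch; the backup filename is arbitrary):

import shutil

# keep an untouched copy of the original checkpoint
shutil.copy('runs/detect/train_100_epochs/weights/last.pt',
            'runs/detect/train_100_epochs/weights/last_100ep_backup.pt')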
Then patch the two values:

import torch

ckpt = torch.load('runs/detect/train_100_epochs/weights/last.pt')
# epoch to resume from (0-based)
ckpt['epoch'] = 99
# new target number of epochs
ckpt['train_args']['epochs'] = 200
# write the modified checkpoint back in place
torch.save(ckpt, 'runs/detect/train_100_epochs/weights/last.pt')
4. Resume training

Launch training again with the modified config:
yolo cfg=ultralytics/yolo/cfg/default.yaml