Resuming RT-DETR Training

1. Background

After training for 100 epochs, the model still seemed to have room to converge, so I wanted to train for another 100 epochs. Restarting from scratch and training epochs 1 through 200 would take twice the time. This post shows how to continue RT-DETR training from epoch 101 to 200.

2. Code changes

1. Edit the ultralytics/yolo/cfg/default.yaml file

Set the resume parameter:

resume: True 

Add a ckpt_path parameter:

# path to the checkpoint file
ckpt_path: runs/detect/train_100_epochs/weights/last.pt
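Before launching training, it is worth sanity-checking that the two keys are consistent. A minimal sketch (check_resume_config is a hypothetical helper, not part of ultralytics; it assumes the YAML has already been loaded into a dict):

```python
# Hypothetical helper (not part of ultralytics): verify the resume settings
# added to default.yaml before launching training.
def check_resume_config(cfg: dict) -> str:
    if not cfg.get('resume'):
        raise ValueError("resume must be True to continue training")
    ckpt_path = cfg.get('ckpt_path')
    if not ckpt_path:
        raise ValueError("ckpt_path must point at the last.pt checkpoint")
    return ckpt_path

# Example, mirroring the two keys set in default.yaml above:
cfg = {'resume': True, 'ckpt_path': 'runs/detect/train_100_epochs/weights/last.pt'}
print(check_resume_config(cfg))  # prints the checkpoint path to be resumed
```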

2. Edit the ultralytics/vit/rtdetr/model.py file

Modify overrides['resume'] in the train function:

def train(self, **kwargs):
    """
    Trains the model on a given dataset.

    Args:
        **kwargs (Any): Any number of arguments representing the training configuration.
    """
    overrides = dict(task='detect', mode='train')
    overrides.update(kwargs)
    overrides['deterministic'] = False
    if not overrides.get('data'):
        raise AttributeError("Dataset required but missing, i.e. pass 'data=coco128.yaml'")
    if overrides.get('resume'):
        # the only change needed: replace the boolean with the checkpoint path
        overrides['resume'] = overrides.get('ckpt_path')
    self.task = overrides.get('task') or self.task
    self.trainer = RTDETRTrainer(overrides=overrides)
    if not overrides.get('resume'):  # manually set model only if not resuming
        self.trainer.model = self.trainer.get_model(weights=self.model if self.ckpt else None, cfg=self.model.yaml)
        self.model = self.trainer.model
    self.trainer.train()
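The effect of that one-line change can be shown in isolation. A minimal sketch (apply_resume_override is a hypothetical stand-in for the modified branch, not ultralytics code):

```python
# Hypothetical stand-in for the modified branch in train(): when resume is
# truthy, the boolean is replaced by the checkpoint path from ckpt_path, so
# the trainer receives a concrete weights file to resume from.
def apply_resume_override(overrides: dict) -> dict:
    if overrides.get('resume'):
        overrides['resume'] = overrides.get('ckpt_path')
    return overrides

overrides = {'resume': True, 'ckpt_path': 'runs/detect/train_100_epochs/weights/last.pt'}
apply_resume_override(overrides)
print(overrides['resume'])  # now the path rather than True
```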

3. Edit the checkpoint parameters

Two values in last.pt, the checkpoint saved after the final epoch of the original run, need to be changed. The first is the current epoch: a finished run stores epoch = -1, and it must be set to the zero-based index of the epoch to resume from, i.e. 100 - 1 = 99. The second is the total number of epochs: change epochs = 100 to the new target of 200.


import torch

ckpt = torch.load('runs/detect/train_100_epochs/weights/last.pt')
# starting epoch (zero-based)
ckpt['epoch'] = 99
# target number of epochs
ckpt['train_args']['epochs'] = 200
# save the patched checkpoint
torch.save(ckpt, 'runs/detect/train_100_epochs/weights/last.pt')
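The patch above can be wrapped in a small helper. The sketch below uses pickle in place of torch.save/torch.load so it runs without PyTorch installed; a real last.pt is a torch checkpoint, but the two keys being edited are the same:

```python
import os
import pickle
import tempfile

# Patch a checkpoint so training resumes at `start_epoch` (1-based) and runs
# until `target_epochs`. pickle stands in here for torch.load/torch.save.
def patch_checkpoint(path: str, start_epoch: int, target_epochs: int) -> dict:
    with open(path, 'rb') as f:
        ckpt = pickle.load(f)
    ckpt['epoch'] = start_epoch - 1            # resume at epoch 100 -> store 99
    ckpt['train_args']['epochs'] = target_epochs
    with open(path, 'wb') as f:
        pickle.dump(ckpt, f)
    return ckpt

# Example with a dummy checkpoint (a finished run stores epoch = -1):
path = os.path.join(tempfile.mkdtemp(), 'last.pt')
with open(path, 'wb') as f:
    pickle.dump({'epoch': -1, 'train_args': {'epochs': 100}}, f)
ckpt = patch_checkpoint(path, 100, 200)
print(ckpt['epoch'], ckpt['train_args']['epochs'])  # 99 200
```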

4. Continue training

Launch training again with the modified configuration file; the run will resume from epoch 101:


yolo cfg=ultralytics/yolo/cfg/default.yaml
