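💡💡💡 All code in this column has been tested and runs successfully 💡💡💡
Column directory: "Getting Started with YOLOv5 + Improvements" column introduction and directory | The column currently has 50+ articles covering improvements to detection heads, loss functions, backbones, necks, NMS, and more
This article introduces ASF-YOLO, a one-stage framework based on attentional scale sequence fusion that combines spatial and scale features for accurate and fast cell instance segmentation. Building on the YOLO segmentation framework, it uses a Scale Sequence Feature Fusion (SSFF) module to strengthen the network's multi-scale information extraction and a Triple Feature Encoder (TFE) module to fuse feature maps of different scales and add detail. A Channel and Position Attention Mechanism (CPAM) then integrates the SSFF and TFE modules, focusing on informative channels and on small objects tied to spatial position to improve detection and segmentation performance. After covering the main principles, the article walks step by step through adding and modifying the module code. The complete modified code is provided at the end so it can be run directly, and beginners should be able to follow along. Readers who want more can try the advanced section.
*Note: because ASF-YOLO supports instance segmentation, this article uses yolov5-7.0 as the base version. Other articles in the column still use YOLOv5-6.1 unless stated otherwise.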
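1. Principle
Official code: see the official ASF-YOLO code repository
ASF-YOLO is an improved model based on the YOLO (You Only Look Once) framework, designed specifically for cell instance segmentation. By combining spatial and scale features, it aims at more accurate and faster cell instance segmentation. Its main ideas are as follows:
1. Basic structure of the YOLO framework
ASF-YOLO builds on the YOLO framework, which has three main parts: the backbone, the neck, and the head. The backbone extracts image features at different granularities, the neck fuses multi-scale features, and the head predicts bounding boxes and generates segmentation masks.
2. Scale Sequence Feature Fusion (SSFF) module
ASF-YOLO introduces the SSFF module, which combines global semantic information from different scales by normalizing the multi-scale feature maps, upsampling them to a common size, and applying a 3D convolution across them. This helps with objects of varying size, orientation, and aspect ratio and improves segmentation performance.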
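The data flow described above can be sketched without any framework. The snippet below is a simplified, single-channel illustration and not the official implementation: it upsamples two coarser maps to the finest resolution with nearest-neighbor copying, stacks the three maps along a new scale axis, and takes the maximum across that axis. The learned 1x1 and 3D convolutions, the normalization, and the activation are left out, and the helper names `upsample_nearest` and `max_over_scales` are made up for this example.

```python
def upsample_nearest(grid, out_h, out_w):
    """Nearest-neighbor resize of a 2D list to (out_h, out_w)."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[(i * in_h) // out_h][(j * in_w) // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def max_over_scales(maps):
    """Element-wise maximum over a stack of equally sized 2D maps."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[max(m[i][j] for m in maps) for j in range(w)] for i in range(h)]


# Toy single-channel pyramid: P3 is 4x4, P4 is 2x2, P5 is 1x1.
p3 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
p4 = [[20, 0], [0, 20]]
p5 = [[10]]

# Bring every level to the P3 size, then stack along a "scale" axis.
h, w = len(p3), len(p3[0])
scale_stack = [p3, upsample_nearest(p4, h, w), upsample_nearest(p5, h, w)]

# Pool across the scale axis so one map remains.
fused = max_over_scales(scale_stack)
for row in fused:
    print(row)
```

In the real module each level has many channels and the stack passes through a 3D convolution before pooling, so treat this only as a picture of the resize, stack, and pool steps.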
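3. Triple Feature Encoder (TFE) module
The TFE module fuses feature maps of three scales (large, medium, and small) to capture the fine detail of small objects. By integrating these detailed features into each feature branch, it strengthens small-object detection in dense cell images.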
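As a rough picture of the resizing involved, the sketch below works on single-channel toy maps: the large map is downsampled by adding a max-pooled and an average-pooled version, the small map is upsampled by nearest-neighbor copying, and the three results are kept side by side, which stands in for channel concatenation. This follows the `Zoom_cat` code given later in the article, but the helper names are hypothetical and all convolutions are omitted.

```python
def pool_max_plus_avg(grid, k):
    """Downsample a 2D list by k, summing max pooling and average pooling."""
    out = []
    for i in range(0, len(grid), k):
        row = []
        for j in range(0, len(grid[0]), k):
            window = [grid[a][b] for a in range(i, i + k) for b in range(j, j + k)]
            row.append(max(window) + sum(window) / len(window))
        out.append(row)
    return out


def upsample_nearest(grid, k):
    """Upsample a 2D list by an integer factor k with nearest-neighbor copying."""
    return [
        [grid[i // k][j // k] for j in range(len(grid[0]) * k)]
        for i in range(len(grid) * k)
    ]


large = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]  # 4x4
medium = [[1, 1], [1, 1]]  # 2x2, the target resolution
small = [[7]]  # 1x1

# "Concatenate along channels": keep the three resized maps side by side.
fused = [pool_max_plus_avg(large, 2), medium, upsample_nearest(small, 2)]
for channel in fused:
    print(channel)
```

All three entries of `fused` end up at the medium resolution, which is why the fused tensor has three times the channel count of a single input.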
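4. Channel and Position Attention Mechanism (CPAM)
The CPAM module integrates the feature information from the SSFF and TFE modules. By adaptively adjusting the attention paid to relevant channels and spatial positions, it improves detection and segmentation accuracy for small objects.
5. Loss function and post-processing
During training, ASF-YOLO uses the EIoU (Efficient Intersection over Union) loss, which captures the positional relationship of small objects better than the conventional CIoU (Complete IoU) loss. In post-processing it applies soft non-maximum suppression (Soft-NMS), which further helps with densely overlapping cells.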
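To make the two ideas concrete, here is a minimal plain-Python sketch. It assumes the commonly cited EIoU definition (1 - IoU plus normalized center, width, and height penalties) and the Gaussian variant of Soft-NMS. The function names, the `sigma` default, and the score threshold are choices made for this example, and the actual ASF-YOLO training code may differ in detail.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss: 1 - IoU plus center, width, and height penalties."""
    cw = max(pred[2], target[2]) - min(pred[0], target[0])  # enclosing box width
    ch = max(pred[3], target[3]) - min(pred[1], target[1])  # enclosing box height
    dx = (pred[0] + pred[2] - target[0] - target[2]) / 2  # center offset in x
    dy = (pred[1] + pred[3] - target[1] - target[3]) / 2  # center offset in y
    dw = (pred[2] - pred[0]) - (target[2] - target[0])  # width difference
    dh = (pred[3] - pred[1]) - (target[3] - target[1])  # height difference
    return (1 - iou(pred, target)
            + (dx ** 2 + dy ** 2) / (cw ** 2 + ch ** 2 + eps)
            + dw ** 2 / (cw ** 2 + eps)
            + dh ** 2 / (ch ** 2 + eps))


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of deleting them."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_thresh:
            break  # every box still remaining scores lower than this one
        keep.append(best)
        for i in remaining:
            scores[i] *= math.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
    return keep, scores


boxes = [(0, 0, 2, 2), (0, 0, 2, 2), (10, 10, 12, 12)]
keep, decayed = soft_nms(boxes, [0.9, 0.8, 0.7])
print(keep, decayed)
print(eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

In the toy call, the duplicate box is not discarded as hard NMS would do. Its score is only reduced, which is the behavior that helps when real cells overlap heavily.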
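Conclusion
By introducing the SSFF, TFE, and CPAM modules, ASF-YOLO improves the YOLO framework's performance on cell instance segmentation, particularly for small, dense, and overlapping objects. These changes make it a promising option for medical image analysis and cell biology.
2. ASF-YOLO code implementation
2.1 Adding ASF-YOLO to YOLOv5
Key step 1: add the following code to yolov5/models/common.py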
import torch.nn.functional as F  # torch, nn, math, and Conv are already available in common.py


class DownSample(nn.Module):
    # Downsample by k with max pooling + average pooling, then a 1x1 Conv
    def __init__(self, c1, c2, k=2):
        super(DownSample, self).__init__()
        self.maxpool = nn.MaxPool2d(kernel_size=k, stride=k)
        self.avgpool = nn.AvgPool2d(kernel_size=k, stride=k)
        self.cv = Conv(c1, c2, 1, 1)

    def forward(self, x):
        x1 = self.maxpool(x)
        x2 = self.avgpool(x)
        x = x1 + x2
        x = self.cv(x)
        return x


class Zoom_cat(nn.Module):
    # TFE: resize large and small maps to the medium size, then concatenate on channels
    def __init__(self, in_dim):
        super().__init__()
        # self.conv_l_post_down = Conv(in_dim, 2*in_dim, 3, 1, 1)

    def forward(self, x):
        """l, m, s are the large, medium, and small scales; all are merged at the size of m."""
        l, m, s = x[0], x[1], x[2]
        tgt_size = m.shape[2:]
        l = F.adaptive_max_pool2d(l, tgt_size) + F.adaptive_avg_pool2d(l, tgt_size)
        # l = self.conv_l_post_down(l)
        # m = self.conv_m(m)
        # s = self.conv_s_pre_up(s)
        s = F.interpolate(s, m.shape[2:], mode='nearest')
        # s = self.conv_s_post_up(s)
        lms = torch.cat([l, m, s], dim=1)
        return lms


class ScalSeq(nn.Module):
    # SSFF: stack P3, P4, P5 along a scale axis, apply a 3D conv, pool over the scale axis
    def __init__(self, channel):
        super(ScalSeq, self).__init__()
        self.conv1 = Conv(512, channel, 1)   # expects P4 with 512 channels (yolov5l width)
        self.conv2 = Conv(1024, channel, 1)  # expects P5 with 1024 channels (yolov5l width)
        self.conv3d = nn.Conv3d(channel, channel, kernel_size=(1, 1, 1))
        self.bn = nn.BatchNorm3d(channel)
        self.act = nn.LeakyReLU(0.1)
        self.pool_3d = nn.MaxPool3d(kernel_size=(3, 1, 1))

    def forward(self, x):
        p3, p4, p5 = x[0], x[1], x[2]
        p4_2 = self.conv1(p4)
        p4_2 = F.interpolate(p4_2, p3.size()[2:], mode='nearest')
        p5_2 = self.conv2(p5)
        p5_2 = F.interpolate(p5_2, p3.size()[2:], mode='nearest')
        p3_3d = torch.unsqueeze(p3, -3)  # (B, C, H, W) -> (B, C, 1, H, W)
        p4_3d = torch.unsqueeze(p4_2, -3)
        p5_3d = torch.unsqueeze(p5_2, -3)
        combine = torch.cat([p3_3d, p4_3d, p5_3d], dim=2)  # (B, C, 3, H, W)
        conv_3d = self.conv3d(combine)
        bn = self.bn(conv_3d)
        act = self.act(bn)
        x = self.pool_3d(act)  # max over the 3 scales -> (B, C, 1, H, W)
        x = torch.squeeze(x, 2)
        return x


class Add(nn.Module):
    # Element-wise sum of two feature maps
    def __init__(self, ch=256):
        super().__init__()

    def forward(self, x):
        input1, input2 = x[0], x[1]
        x = input1 + input2
        return x


class channel_att(nn.Module):
    # Channel attention: global average pooling followed by a 1D conv across channels
    def __init__(self, channel, b=1, gamma=2):
        super(channel_att, self).__init__()
        kernel_size = int(abs((math.log(channel, 2) + b) / gamma))
        kernel_size = kernel_size if kernel_size % 2 else kernel_size + 1  # force an odd size
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=kernel_size, padding=(kernel_size - 1) // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.avg_pool(x)
        y = y.squeeze(-1)
        y = y.transpose(-1, -2)
        y = self.conv(y).transpose(-1, -2).unsqueeze(-1)
        y = self.sigmoid(y)
        return x * y.expand_as(x)


class local_att(nn.Module):
    # Position attention: separate weights along the height and width directions
    def __init__(self, channel, reduction=16):
        super(local_att, self).__init__()
        self.conv_1x1 = nn.Conv2d(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, bias=False)
        self.relu = nn.ReLU()
        self.bn = nn.BatchNorm2d(channel // reduction)
        self.F_h = nn.Conv2d(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, bias=False)
        self.F_w = nn.Conv2d(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, bias=False)
        self.sigmoid_h = nn.Sigmoid()
        self.sigmoid_w = nn.Sigmoid()

    def forward(self, x):
        _, _, h, w = x.size()
        x_h = torch.mean(x, dim=3, keepdim=True).permute(0, 1, 3, 2)
        x_w = torch.mean(x, dim=2, keepdim=True)
        x_cat_conv_relu = self.relu(self.bn(self.conv_1x1(torch.cat((x_h, x_w), 3))))
        x_cat_conv_split_h, x_cat_conv_split_w = x_cat_conv_relu.split([h, w], 3)
        s_h = self.sigmoid_h(self.F_h(x_cat_conv_split_h.permute(0, 1, 3, 2)))
        s_w = self.sigmoid_w(self.F_w(x_cat_conv_split_w))
        out = x * s_h.expand_as(x) * s_w.expand_as(x)
        return out


class attention_model(nn.Module):
    # CPAM: channel attention on the first input, add the second input, then position attention
    def __init__(self, ch=256):
        super().__init__()
        self.channel_att = channel_att(ch)
        self.local_att = local_att(ch)

    def forward(self, x):
        input1, input2 = x[0], x[1]
        input1 = self.channel_att(input1)
        x = input1 + input2
        x = self.local_att(x)
        return x
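The kernel-size rule in `channel_att` above is easy to check by hand. The short sketch below reproduces just that arithmetic in plain Python. The function name `eca_kernel_size` is made up for this example, and only channel counts whose result does not sit on a rounding boundary are shown.

```python
import math


def eca_kernel_size(channels, b=1, gamma=2):
    """Mirror the kernel-size rule used in channel_att.__init__."""
    k = int(abs((math.log(channels, 2) + b) / gamma))
    return k if k % 2 else k + 1  # bump even sizes to the next odd size


for c in (64, 256, 1024):
    print(c, eca_kernel_size(c))
```

For the 256-channel `attention_model` used in the yaml files of this article, the rule gives a 1D kernel of size 5, so the channel attention branch contributes only five learnable weights.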
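2.2 Adding the yaml file
Key step 2: create a new file yolov5_ASF.yaml in the models directory of the YOLOv5 project (yolov5-7.0 for this article) and copy the following code into it
- Seg (segmentation)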
# ASF-YOLO based on YOLOv5 by Ultralytics, GPL-3.0 license

# Parameters
nc: 80  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# ASF-YOLO backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# ASF-YOLO head
head:
  [[-1, 1, Conv, [512, 1, 1]],  # 10
   [4, 1, Conv, [512, 1, 1]],  # 11
   [[-1, 6, -2], 1, Zoom_cat, [512]],  # 12 cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],  # 14
   [2, 1, Conv, [256, 1, 1]],  # 15
   [[-1, 4, -2], 1, Zoom_cat, [256]],  # 16 cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],  # 18
   [[-1, 14], 1, Concat, [1]],  # 19 cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],  # 21
   [[-1, 10], 1, Concat, [1]],  # 22 cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[4, 6, 8], 1, ScalSeq, [256]],  # 24 args: [channel]
   [[17, -1], 1, attention_model, [256]],  # 25

   [[-1, 20, 23], 1, Segment, [nc, anchors, 32, 256]],  # Segment(P3, P4, P5)
  ]
- OD (object detection)
# ASF-YOLO based on YOLOv5 by Ultralytics, GPL-3.0 license

# Parameters
nc: 80  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# ASF-YOLO backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# ASF-YOLO head
head:
  [[-1, 1, Conv, [512, 1, 1]],  # 10
   [4, 1, Conv, [512, 1, 1]],  # 11
   [[-1, 6, -2], 1, Zoom_cat, [512]],  # 12 cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],  # 14
   [2, 1, Conv, [256, 1, 1]],  # 15
   [[-1, 4, -2], 1, Zoom_cat, [256]],  # 16 cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],  # 18
   [[-1, 14], 1, Concat, [1]],  # 19 cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],  # 21
   [[-1, 10], 1, Concat, [1]],  # 22 cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[4, 6, 8], 1, ScalSeq, [256]],  # 24 args: [channel]
   [[17, -1], 1, attention_model, [256]],  # 25

   [[-1, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
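Note: this article adds the modules on top of yolov5l. To apply them to yolov5n/s/m/x, set the corresponding depth_multiple and width_multiple. Be aware that ScalSeq as written hard-codes 512 and 1024 input channels, and the registration code in section 2.3 uses args[0] without width scaling, so for widths other than 1.0 these channel numbers will likely need to be adjusted as well.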
# YOLOv5n
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.25 # layer channel multiple
# YOLOv5s
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.50 # layer channel multiple
# YOLOv5l
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# YOLOv5m
depth_multiple: 0.67 # model depth multiple
width_multiple: 0.75 # layer channel multiple
# YOLOv5x
depth_multiple: 1.33 # model depth multiple
width_multiple: 1.25 # layer channel multiple
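The effect of the two multipliers can be worked out by hand. The sketch below assumes the rounding rules used in YOLOv5's `parse_model` as I understand them: repeat counts above 1 are scaled, rounded, and kept at a minimum of 1, and output channels are scaled and rounded up to a multiple of 8. The helper `scale_layer` is a made-up name for this example.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(n, c2, depth_multiple, width_multiple):
    """Return the scaled repeat count and output channels for one layer."""
    n_scaled = max(round(n * depth_multiple), 1) if n > 1 else n
    c2_scaled = make_divisible(c2 * width_multiple, 8)
    return n_scaled, c2_scaled


# (depth_multiple, width_multiple) for each variant listed above
variants = {
    "n": (0.33, 0.25),
    "s": (0.33, 0.50),
    "m": (0.67, 0.75),
    "l": (1.0, 1.0),
    "x": (1.33, 1.25),
}

# Backbone layer 6 in the yaml files: [-1, 9, C3, [512]]
scaled = {name: scale_layer(9, 512, gd, gw) for name, (gd, gw) in variants.items()}
for name, (n, c2) in scaled.items():
    print(f"YOLOv5{name}: C3 repeated {n} times with {c2} output channels")
```

This scaling only applies to the module types that `parse_model` handles itself. The custom modules registered in section 2.3 take `args[0]` as written in the yaml, which is worth keeping in mind when switching variants.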
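2.3 Registering the modules
Key step 3: register the new modules in the parse_model function in yolo.py by adding the following branches to its existing if/elif chain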
elif m is ScalSeq:
    c2 = args[0]
elif m is Add:
    c2 = args[0]
elif m is Zoom_cat:
    c2 = 3 * args[0]
elif m is attention_model:
    c2 = args[0]
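What these branches record is the number of output channels that the next layer will see. The stand-alone sketch below mirrors that bookkeeping outside of `parse_model`. The function `registered_out_channels` is hypothetical and takes module names as strings purely so it can run on its own.

```python
def registered_out_channels(module_name, args):
    """Output channels recorded for the newly registered modules."""
    if module_name == "Zoom_cat":
        return 3 * args[0]  # three maps concatenated along the channel axis
    if module_name in ("ScalSeq", "Add", "attention_model"):
        return args[0]  # channel count is passed through unchanged
    raise ValueError(f"unregistered module: {module_name}")


print(registered_out_channels("Zoom_cat", [512]))
print(registered_out_channels("Zoom_cat", [256]))
print(registered_out_channels("ScalSeq", [256]))
```

The two `Zoom_cat` values, 1536 and 768, are the input channel counts that the C3 layers at indices 13 and 17 report in the model summary printed in section 2.4.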
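2.4 Running the program
In train.py, set the cfg argument to the path of yolov5_ASF.yaml.
An absolute path is recommended so the file is always found.
If you get this error:
YOLO max_pool3d_with_indices_backward_cuda does not have a deterministic implementation
add the following around line 316 of train.py, wrapping the existing backward call:
torch.use_deterministic_algorithms(False)
scaler.scale(loss).backward()
torch.use_deterministic_algorithms(True)
🚀 Run the program. If output like the following appears, the modules were added successfully 🚀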
from n params module arguments
0 -1 1 7040 models.common.Conv [3, 64, 6, 2, 2]
1 -1 1 73984 models.common.Conv [64, 128, 3, 2]
2 -1 3 156928 models.common.C3 [128, 128, 3]
3 -1 1 295424 models.common.Conv [128, 256, 3, 2]
4 -1 6 1118208 models.common.C3 [256, 256, 6]
5 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
6 -1 9 6433792 models.common.C3 [512, 512, 9]
7 -1 1 4720640 models.common.Conv [512, 1024, 3, 2]
8 -1 3 9971712 models.common.C3 [1024, 1024, 3]
9 -1 1 2624512 models.common.SPPF [1024, 1024, 5]
10 -1 1 525312 models.common.Conv [1024, 512, 1, 1]
11 4 1 132096 models.common.Conv [256, 512, 1, 1]
12 [-1, 6, -2] 1 0 models.common.Zoom_cat [512]
13 -1 3 3019776 models.common.C3 [1536, 512, 3, False]
14 -1 1 131584 models.common.Conv [512, 256, 1, 1]
15 2 1 33280 models.common.Conv [128, 256, 1, 1]
16 [-1, 4, -2] 1 0 models.common.Zoom_cat [256]
17 -1 3 756224 models.common.C3 [768, 256, 3, False]
18 -1 1 590336 models.common.Conv [256, 256, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 3 2495488 models.common.C3 [512, 512, 3, False]
21 -1 1 2360320 models.common.Conv [512, 512, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 3 9971712 models.common.C3 [1024, 1024, 3, False]
24 [4, 6, 8] 1 460544 models.common.ScalSeq [256]
25 [17, -1] 1 12325 models.common.attention_model [256]
26 [-1, 20, 23] 1 457725 models.yolo.Detect [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [256, 512, 1024]]
YOLOv5-ASF summary: 396 layers, 47529634 parameters, 47529634 gradients, 117.9 GFLOPs
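The parameter counts reported for the two new modules (rows 24 and 25 above) can be reproduced by hand, which is a quick way to confirm the modules were built as intended. The sketch below assumes that YOLOv5's `Conv` is a bias-free 2D convolution followed by batch normalization, which is consistent with row 14 (512*256 + 2*256 = 131584). The helper names are made up for this example.

```python
def conv_params(c_in, c_out, k=1, bias=False):
    """Weights of a k x k convolution, plus an optional bias per output channel."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def bn_params(c):
    """Learnable scale and shift of a batch normalization layer."""
    return 2 * c


def yolo_conv_params(c_in, c_out, k=1):
    """YOLOv5 Conv block: bias-free convolution followed by batch normalization."""
    return conv_params(c_in, c_out, k) + bn_params(c_out)


# attention_model(256): channel_att (Conv1d with kernel 5, no bias) + local_att (reduction 16)
c, r = 256, 16
channel_att_total = 5
local_att_total = conv_params(c, c // r) + bn_params(c // r) + 2 * conv_params(c // r, c)
attention_total = channel_att_total + local_att_total

# ScalSeq(256): Conv(512->256) + Conv(1024->256) + Conv3d(256->256, 1x1x1, with bias) + BatchNorm3d
scalseq_total = (yolo_conv_params(512, 256) + yolo_conv_params(1024, 256)
                 + conv_params(256, 256, bias=True) + bn_params(256))

print(attention_total, scalseq_total)
```

Both totals match the log (12325 and 460544). A mismatch here would suggest a different channel setting or a module that was edited.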
3. Complete code
https://pan.baidu.com/s/1FTaIh8oaRM9BeXdcGb5bMQ?pwd=xk5j
Extraction code: xk5j
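4. GFLOPs
For how GFLOPs are calculated, see: 100 Interview Questions for Algorithm Engineers | Convolution Basics: Convolution
GFLOPs before the improvement
GFLOPs after the improvement
I do not have a GPU available at the moment and will fill in these numbers once I do. Readers who need them can measure them on their own setup.
5. Going further
The modules can be combined with a different loss function or convolution module for further improvement
YOLOv5 Improvements | Loss Functions | EIoU, SIoU, WIoU, DIoU, FocuSIoU, and other loss functions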
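6. Summary
ASF-YOLO is an improved model based on the YOLO (You Only Look Once) framework, designed specifically for cell instance segmentation. It uses a CSPDarknet53 backbone for multi-scale feature extraction, then fuses multi-scale and detail features through the Scale Sequence Feature Fusion (SSFF) module and the Triple Feature Encoder (TFE) module. It also introduces the Channel and Position Attention Mechanism (CPAM), which adaptively adjusts the attention paid to relevant channels and spatial positions to strengthen detection and segmentation of small objects. For detection and segmentation, ASF-YOLO optimizes bounding-box localization with the EIoU (Efficient IoU) loss and generates segmentation masks in the head. In post-processing it applies soft non-maximum suppression (Soft-NMS) to densely overlapping boxes, which improves detection accuracy and segmentation performance. Together these changes let ASF-YOLO perform well on medical image analysis and cell instance segmentation tasks.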