本篇文章将介绍一个新的改进机制——SEAM,并阐述如何将其应用于YOLOv11中,显著提升模型性能。首先,我们将解析SEAM他做了什么,SEAM(Self-Ensembling Attention Mechanism)是一种自集成注意力机制,通过多视角特征融合和一致性正则化来增强模型的鲁棒性和泛化能力,特别适用于处理遮挡问题和多尺度特征融合。随后,我们会详细说明如何将该模块与YOLOv11相结合,展示代码实现细节及其使用方法,最终展现这一改进对目标检测效果的积极影响。
1. SEAM结构介绍
左边是SEAM的架构,右边是部分为通道和空间混合模块CSMM的结构。CSMM利用深度可分卷积来学习不同尺度的特征空间尺度与通道的相关性。
SEAM是一种自集成注意力机制,旨在通过多视角特征融合和一致性正则化来增强模型的鲁棒性和泛化能力。
-
首先对输入特征Patch Embedding:
- 输入图像被分割成不同大小的patch(6, 7, 8),这些patch通过Patch Embedding层进行初步处理,生成特征表示。
-
CSMM模块:
-
深度可分离卷积:使用深度可分离卷积来学习空间维度和通道之间的相关性。这个操作分为两个步骤:深度卷积:对每个通道独立进行卷积操作。逐点卷积:对所有通道进行1x1卷积,整合信息。
-
-
激活和归一化:
- GELU激活函数:在卷积操作后,使用GELU激活函数引入非线性。
- 批归一化:对特征图进行批归一化,稳定训练过程。
-
输出特征表示:
最终,将CSMM模块输出特征进行平均池化,融合多尺度特征的特征表示,增强了模型对不同尺度信息的捕捉能力。
通过这些步骤,SEAM和CSMM能够有效地整合多尺度特征,提高模型在图像识别等任务中的性能。
2. YOLOv11与SEAM的结合
本文将YOLOv11模型的C2PSA模块中的注意力层替换LSKA,组合成C2PSA_LSKA模块,利用LSKA的分离卷积核特性,增强C2PSA模块的特征提取能力,同时保持计算复杂度较低。
3. SEAM代码部分
import torch
import torch.nn as nn
from .block import PSABlock,C2PSA
class Residual(nn.Module):
def __init__(self, fn):
super(Residual, self).__init__()
self.fn = fn
def forward(self, x):
return self.fn(x) + x
class SEAM(nn.Module):
def __init__(self, c1, c2, n=1, reduction=16):
super(SEAM, self).__init__()
if c1 != c2:
c2 = c1
self.DCovN = nn.Sequential(
# nn.Conv2d(c1, c2, kernel_size=3, stride=1, padding=1, groups=c1),
# nn.GELU(),
# nn.BatchNorm2d(c2),
*[nn.Sequential(
Residual(nn.Sequential(
nn.Conv2d(in_channels=c2, out_channels=c2, kernel_size=3, stride=1, padding=1, groups=c2),
nn.GELU(),
nn.BatchNorm2d(c2)
)),
nn.Conv2d(in_channels=c2, out_channels=c2, kernel_size=1, stride=1, padding=0, groups=1),
nn.GELU(),
nn.BatchNorm2d(c2)
) for i in range(n)]
)
self.avg_pool = torch.nn.AdaptiveAvgPool2d(1)
self.fc = nn.Sequential(
nn.Linear(c2, c2 // reduction, bias=False),
nn.ReLU(inplace=True),
nn.Linear(c2 // reduction, c2, bias=False),
nn.Sigmoid()
)
self._initialize_weights()
# self.initialize_layer(self.avg_pool)
self.initialize_layer(self.fc)
def forward(self, x):
b, c, _, _ = x.size()
y = self.DCovN(x)
y = self.avg_pool(y).view(b, c)
y = self.fc(y).view(b, c, 1, 1)
y = torch.exp(y)
return x * y.expand_as(x)
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.xavier_uniform_(m.weight, gain=1)
elif isinstance(m, nn.BatchNorm2d):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
def initialize_layer(self, layer):
if isinstance(layer, (nn.Conv2d, nn.Linear)):
torch.nn.init.normal_(layer.weight, mean=0., std=0.001)
if layer.bias is not None:
torch.nn.init.constant_(layer.bias, 0)
class PSABlock_SEAM(PSABlock):
def __init__(self, c, qk_dim =16 , pdim=32, shortcut=True) -> None:
"""Initializes the PSABlock with attention and feed-forward layers for enhanced feature extraction."""
super().__init__(c)
self.ffn = SEAM(c,c)
class C2PSA_SEAM(C2PSA):
def __init__(self, c1, c2, n=1, e=0.5):
"""Initializes the C2PSA module with specified input/output channels, number of layers, and expansion ratio."""
super().__init__(c1, c2)
assert c1 == c2
self.c = int(c1 * e)
self.m = nn.Sequential(*(PSABlock_SEAM(self.c, qk_dim =16 , pdim=32) for _ in range(n)))
if __name__ =='__main__':
ASSA_Attention = SEAM(256,256)
#创建一个输入张量
batch_size = 1
input_tensor=torch.randn(batch_size, 256, 64, 64 )
#运行模型并打印输入和输出的形状
output_tensor =ASSA_Attention(input_tensor)
print("Input shape:",input_tensor.shape)
print("0utput shape:",output_tensor.shape)
4. 将SEAM引入到YOLOv11中
第一: 将下面的核心代码复制到D:\bilibili\model\YOLO11\ultralytics-main\ultralytics\nn路径下,如下图所示。
第二:在task.py中导入C2PSA_SEAM包
第三:在task.py中的模型配置部分下面代码
第四:将模型配置文件复制到YOLOV11.YAMY文件中
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
# [depth, width, max_channels]
n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs
# YOLO11n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 2, C3k2, [256, False, 0.25]]
- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- [-1, 2, C3k2, [512, False, 0.25]]
- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- [-1, 2, C3k2, [512, True]]
- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- [-1, 2, C3k2, [1024, True]]
- [-1, 1, SPPF, [1024, 5]] # 9
- [-1, 2, C2PSA_SEAM, [1024]] # 10
# YOLO11n head
head:
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 2, C3k2, [512, False]] # 13
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 13], 1, Concat, [1]] # cat head P4
- [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 10], 1, Concat, [1]] # cat head P5
- [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)
- [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
第五:运行成功
from ultralytics.models import NAS, RTDETR, SAM, YOLO, FastSAM, YOLOWorld
if __name__=="__main__":
# 使用自己的YOLOv11.yamy文件搭建模型并加载预训练权重训练模型
model = YOLO(r"D:\bilibili\model\YOLO11\ultralytics-main\ultralytics\cfg\models\11\yolo11_SEAM.yaml")\
.load(r'D:\bilibili\model\YOLO11\ultralytics-main\yolo11n.pt') # build from YAML and transfer weights
results = model.train(data=r'D:\bilibili\model\ultralytics-main\ultralytics\cfg\datasets\VOC_my.yaml',
epochs=100, imgsz=640, batch=8)