【C#深度学习之路】如何使用C#实现Yolo8/11 Segment 全尺寸模型的训练和推理

项目背景
项目实现
- 推理过程
- 训练过程
项目展望
写在最后
项目下载链接

本文为原创文章，若需要转载，请注明出处。
原文地址：https://blog.csdn.net/qq_30270773/article/details/145169580
项目对应的Github地址：https://github.com/IntptrMax/YoloSharp
项目打包的Nuget地址：https://www.nuget.org/packages/IntptrMax.YoloSharp
C#深度学习之路专栏地址：https://blog.csdn.net/qq_30270773/category_12829217.html
关注我的Github，可以获取更多资料，请为你感兴趣的项目送上一颗小星星：https://github.com/IntptrMax

另外本人已经在多平台上发现了不做任何修改直接照抄就发布我的文章的盗版行为，还将我的开源免费资源当成付费资源发布的行为，对此表示强烈的不满。这种“盗窃知识”的行为严重损害了开源项目作者的个人利益以及开源共享精神。

项目背景

本人已经在Github及CSDN上连续发布了Yolov5，Yolov8，Yolov11模型的Predict方法的训练及推理的源码及实现方法介绍，这些项目成功实现了Yolo模型在C#平台上的训练，并且已经在Nuget中进行了打包。项目发布后一些小伙伴在问是否可以实现Segment方法，故进行了进一步的开发，本文就是对Segment的实现。
如果该资料对你有帮助，请在我的Github上送我一颗小星星。该项目的Github链接为https://github.com/IntptrMax/YoloSharp

项目实现

对于单独的Yolov5、Yolov8、Yolov11 Predict方法的实现原理及代码，请参考C#深度学习之路专栏内的相关文章。

Segment方法是在Predict方法上进行了进一步的开发。在Yolo模型上，仅为Segment层代替Predict层，而且Segment是继承自Predict的，加入了cv4这个模块，对Mask的计算。
详细代码如下：

推理过程

public class Segment : YolovDetect
{
	private readonly int nm;
	private readonly int npr;
	private readonly Proto proto;
	private readonly int c4;
	private readonly ModuleList<Sequential> cv4 = new ModuleList<Sequential>();

	public Segment(int[] ch, int nc = 80, int nm = 32, int npr = 256, bool legacy = false) : base(nc, ch, legacy)
	{
		this.nm = nm; // number of masks
		this.npr = npr;  // number of protos
		this.proto = new Proto(ch[0], this.npr, this.nm);  // protos
		c4 = Math.Max(ch[0] / 4, this.nm);

		foreach (int x in ch)
		{
			cv4.append(Sequential(new Conv(x, c4, 3), new Conv(c4, c4, 3), nn.Conv2d(c4, this.nm, 1)));
		}
		//RegisterComponents();

	}

	public override Tensor[] forward(Tensor[] x)
	{
		Tensor p = this.proto.forward(x[0]); // mask protos
		long bs = p.shape[0]; //batch size

		var mc = torch.cat(this.cv4.Select((module, i) => module.forward(x[i]).view(bs, this.nm, -1)).ToArray(), dim: 2); // mask coefficients				x = base.forward(x);
		x = base.forward(x);
		if (this.training)
		{
			x = (x.Append(mc).Append(p)).ToArray();
			return x;
		}
		else
		{
			return [torch.cat([x[0], mc], dim: 1), x[1], x[2], x[3], p];
		}
	}
}

经过Segment层的计算，得到了与Predict后相同的输出，并且额外得到了一组与Predict对应，且形状为[160,160]的Mask张量。这组张量里保存Mask的信息，按尺寸和比例进行缩放，即为原始图像中对应的分割区域。

训练过程

Segment下的训练与Predict方法下的训练相似，不过需要对Mask进行额外处理。
这里有一个精巧的构思，为了在一个Mask（形状为[160,160]）里容纳所有Label对应的掩码，在有Mask的区域使用label的index+1的值表示。其余0的区域表示没有Mask。处理图像本体、label、mask的具体代码如下：

public (Tensor, Tensor, Tensor) GetLetterBoxSegmentData(long index)
{
	using var _ = NewDisposeScope();
	int maskSize = 160;
	Tensor orgImageTensor = torchvision.io.read_image(imageFiles[(int)index], torchvision.io.ImageReadMode.RGB);

	int originalWidth = (int)orgImageTensor.shape[2];
	int originalHeight = (int)orgImageTensor.shape[1];

	float scale = Math.Min((float)imageSize / originalWidth, (float)imageSize / originalHeight);
	int padWidth = imageSize - (int)(scale * originalWidth);
	int padHeight = imageSize - (int)(scale * originalHeight);

	float maskWidthScale = scale * originalWidth / imageSize;
	float maskHeightScale = scale * originalHeight / imageSize;

	Tensor imgTensor = torchvision.transforms.functional.resize(orgImageTensor, (int)(originalHeight * scale), (int)(originalWidth * scale));
	imgTensor = torch.nn.functional.pad(imgTensor, [0, padWidth, 0, padHeight], PaddingModes.Zeros);

	Tensor outputImg = torch.zeros([3, imageSize, imageSize]);
	outputImg[TensorIndex.Colon, ..(int)imgTensor.shape[1], ..(int)imgTensor.shape[2]] = imgTensor;

	string labelName = GetLabelFileNameFromImageName(imageFiles[(int)index]);
	string[] lines = File.ReadAllLines(labelName);
	float[,] labelArray = new float[lines.Length, 5];

	Tensor mask = torch.zeros([maskSize, maskSize]);
	for (int i = 0; i < lines.Length; i++)
	{
		string[] datas = lines[i].Split(' ');
		labelArray[i, 0] = float.Parse(datas[0]);

		List<PointF> points = new List<PointF>();
		for (int j = 1; j < datas.Length; j = j + 2)
		{
			points.Add(new PointF(float.Parse(datas[j]) * scale * originalWidth * maskSize / imageSize, float.Parse(datas[j + 1]) * scale * originalHeight * maskSize / imageSize));
		}

		float maxX = points.Max(p => p.X) / maskSize;
		float maxY = points.Max(p => p.Y) / maskSize;
		float minX = points.Min(p => p.X) / maskSize;
		float minY = points.Min(p => p.Y) / maskSize;

		float width = maxX - minX;
		float height = maxY - minY;
		labelArray[i, 1] = minX + width / 2;
		labelArray[i, 2] = minY + height / 2;
		labelArray[i, 3] = width;
		labelArray[i, 4] = height;

		Bitmap bitmap = new Bitmap(maskSize, maskSize);
		Brush brush = new SolidBrush(Color.White);

		Graphics g = Graphics.FromImage(bitmap);
		g.FillClosedCurve(brush, points.ToArray());
		g.Save();
		Tensor msk = Lib.GetTensorFromBitmap(bitmap);
		msk = msk[0] > 0;
		mask[msk] = i + 1;
	}
	Tensor labelTensor = tensor(labelArray);
	long p = imgTensor.shape[0];
	return (imgTensor.MoveToOuterDisposeScope(), labelTensor.MoveToOuterDisposeScope(), mask.MoveToOuterDisposeScope());
}

另外Loss的计算也加入了Mask部分，此处不再仔细讲解。

项目效果如下
请添加图片描述

项目展望

目前已经实现了Yolov8、Yolov11 Segment方法的训练和推理，并且已经可以成功加载官方的预训练模型进行，或作为训练的基础权重。
接下来还有Pose和Obb方法，过段时间估计会有新的进展。

写在最后

使用C#深度学习项目是很多人所希望的。不过在该方向上资料很少，开发难度大。常规使用C#进行深度学习项目的方法为使用Python训练，转为Onnx模型再用C#调用。
目前我希望能够改变这一现象，希望能用纯C#平台进行训练和推理。这条路还很长，也很困难，希望有兴趣的读者能跟我一起让让C#的深度学习开发环境更为完善，以此能帮助到更多的人。

另外随着项目的关注度增多，已经开始有人盗版我的项目并将免费开源的项目当成付费项目在卖了。这种行为极其恶劣，请各位小伙伴积极抵制这种行为，还开源项目一片干净的环境，也让开源项目开发者有动力继续贡献更多的项目。

我在Github上已经将完整的代码发布了，项目地址为：https://github.com/IntptrMax/YoloSharp，期待你能在Github上送我一颗小星星。在我的Github里还GGMLSharp这个项目，这个项目也是C#平台下深度学习的开发包，希望能得到你的支持。

项目下载链接

https://download.csdn.net/download/qq_30270773/89969923