grounding_dino_mmcv: object detection model

grounding_dino

Paper

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Paper link

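Model structure

The paper proposes Grounding DINO, an open-set object detector. It combines the Transformer-based DINO detector with grounded pre-training, so it can detect arbitrary objects from human inputs such as category names or referring expressions.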

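Feature enhancer: text features use self-attention, and image features use deformable self-attention to reduce computation.
Query initialization: compute the similarity between the text features and image features output by the feature enhancer, take the maximum over the text tokens for each image token, then sort to pick the top-ranked image tokens as queries.
Cross-modality decoder: the queries from the query initialization step are taken as input and attend in turn to the image features and the text features through cross-modal attention, yielding the updated decoder output.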
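The query initialization step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the repository's implementation: the function name, the toy 2-dimensional features, and the use of a raw dot product as the similarity are all chosen here for clarity.

```python
# Minimal sketch of language-guided query selection (all names are illustrative).
def select_query_indices(image_feats, text_feats, num_queries):
    """Return indices of the image tokens that best match any text token."""
    scores = []
    for img in image_feats:
        # Dot-product similarity between this image token and every text token.
        sims = [sum(a * b for a, b in zip(img, txt)) for txt in text_feats]
        # Keep only the best-matching text token for this image token.
        scores.append(max(sims))
    # Rank image tokens by their best score, highest first, and keep the top ones.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:num_queries]


# Toy features: 4 image tokens and 2 text tokens, each 2-dimensional.
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.2], [-1.0, 0.0]]
text_feats = [[1.0, 0.0], [0.0, 2.0]]
top = select_query_indices(image_feats, text_feats, num_queries=2)
print(top)
```

The selected indices would then be used to gather the corresponding image features as the initial decoder queries.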

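Algorithm principle

A closed-set detector typically has three important modules: a backbone for feature extraction, a neck for feature enhancement, and a head for region refinement (or box prediction).
A closed-set detector can be generalized to detect novel objects by learning language-aware region embeddings, so that each region can be classified into novel categories in a language-aware semantic space. The key to achieving this is a contrastive loss between region outputs and language features at the neck and/or head outputs.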
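To make the contrastive loss between region outputs and language features more concrete, here is a small sketch in plain Python. It assumes dot-product logits and a sigmoid binary cross-entropy against a region-to-token target map; the function names and toy values are hypothetical, and real implementations may use a different variant (for example a focal loss).

```python
import math


def region_text_logits(region_feats, text_feats):
    """Dot product of every region embedding with every text-token embedding."""
    return [[sum(a * b for a, b in zip(r, t)) for t in text_feats]
            for r in region_feats]


def contrastive_bce(logits, targets):
    """Mean sigmoid cross-entropy; targets[i][j] is 1 if region i matches token j."""
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits, targets):
        for z, y in zip(row_logits, row_targets):
            # Numerically stable form of -[y*log(p) + (1-y)*log(1-p)], p = sigmoid(z).
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


# Toy example: two regions and two text tokens with 2-dimensional embeddings.
region_feats = [[2.0, 0.0], [0.0, 2.0]]
text_feats = [[1.0, 0.0], [0.0, 1.0]]
logits = region_text_logits(region_feats, text_feats)

matched = contrastive_bce(logits, [[1, 0], [0, 1]])     # correct pairing
mismatched = contrastive_bce(logits, [[0, 1], [1, 0]])  # swapped pairing
print(matched < mismatched)
```

Minimizing such a loss pulls each region embedding toward the text tokens that describe it and pushes it away from the others, which is what lets the classifier operate in a language-aware space.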

Environment setup

Docker (method 1)

docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

cd /your_code_path/grounding_dino_mmcv
pip install mmdet -i https://mirrors.aliyun.com/pypi/simple/
pip install -r requirements/multimodal.txt -i https://mirrors.aliyun.com/pypi/simple/
export HF_ENDPOINT=https://hf-mirror.com

Dockerfile (method 2)

cd ./docker
docker build --no-cache -t mmdet:last .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
cd /your_code_path/grounding_dino_mmcv
pip install mmdet -i https://mirrors.aliyun.com/pypi/simple/
pip install -r requirements/multimodal.txt -i https://mirrors.aliyun.com/pypi/simple/
export HF_ENDPOINT=https://hf-mirror.com

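Anaconda (method 3)

1. The special deep learning libraries required for DCU cards in this project can be downloaded and installed from the 光合 (Guanghe) developer community: https://developer.hpccube.com/tool/

DTK software stack: dtk24.04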
python: python3.10
torch: 2.1
mmcv: 2.0.1

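Tips: the versions of the DCU-related tools above (DTK software stack, python, torch, mmcv) must match one another exactly.

2. Install the other, non-special libraries directly from the requirements files: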

cd /your_code_path/grounding_dino_mmcv
pip install mmdet -i https://mirrors.aliyun.com/pypi/simple/
pip install -r requirements/multimodal.txt -i https://mirrors.aliyun.com/pypi/simple/
export HF_ENDPOINT=https://hf-mirror.com
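Because the pinned versions listed for the Anaconda setup have to match, a quick check after installation can save debugging time. The sketch below uses only the standard library; the distribution names "torch" and "mmcv" are assumptions and may differ for DCU-specific builds, and the DTK version itself is not checked.

```python
import sys
from importlib import metadata

# Version pins copied from the list above. The distribution names "torch"
# and "mmcv" are assumptions and may differ for DCU-specific builds.
EXPECTED_PYTHON = (3, 10)
EXPECTED_PACKAGES = {"torch": "2.1", "mmcv": "2.0.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_matches(installed, expected):
    """True if `installed` equals `expected` or extends it (e.g. 2.1 -> 2.1.0+local)."""
    if installed is None:
        return False
    return installed == expected or installed.startswith((expected + ".", expected + "+"))


def check_environment():
    """Collect mismatch messages; an empty list means every pin matches."""
    problems = []
    if sys.version_info[:2] != EXPECTED_PYTHON:
        found = f"{sys.version_info[0]}.{sys.version_info[1]}"
        problems.append(f"python: found {found}, expected 3.10")
    for package, expected in EXPECTED_PACKAGES.items():
        found = installed_version(package)
        if not version_matches(found, expected):
            problems.append(f"{package}: found {found}, expected {expected}")
    return problems


for line in check_environment():
    print(line)
```

No output means the interpreter and both packages match the pins; any printed line names the component to reinstall.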

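Dataset

COCO2017 (with a good network connection, the program downloads the dataset online by default if it has not been downloaded yet)

Quick download center for training data: SCNet AIDatasets. Download address for the training data used in this project: COCO2017

The dataset directory structure is as follows: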

├── images 
│   ├── train2017
│   ├── val2017
│   ├── test2017
├── labels
│   ├── train2017
│   ├── val2017
├── annotations
│   ├── instances_val2017.json
├── LICENSE
├── README.txt 
├── test-dev2017.txt
├── train2017.txt
├── val2017.txt

We provide a mini dataset for verifying training. For production use, download the full COCO dataset or use a custom dataset.

cd /your_code_path/grounding_dino_mmcv
cd datasets/
unzip  coco_mini.zip
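After unpacking, the directory layout listed above can be verified with a short script before starting a training run. This is a sketch: the entries mirror the tree shown earlier (LICENSE and README.txt are left out), and the path "datasets/coco_mini" is an assumed unpack location rather than something the archive is known to produce.

```python
from pathlib import Path

# Entries taken from the directory tree above, relative to the dataset root.
EXPECTED_ENTRIES = [
    "images/train2017",
    "images/val2017",
    "images/test2017",
    "labels/train2017",
    "labels/val2017",
    "annotations/instances_val2017.json",
    "test-dev2017.txt",
    "train2017.txt",
    "val2017.txt",
]


def missing_entries(dataset_root, expected=EXPECTED_ENTRIES):
    """Return the expected files or directories that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in expected if not (root / rel).exists()]


# "datasets/coco_mini" is an assumed location; adjust it to the actual root.
for rel in missing_entries("datasets/coco_mini"):
    print("missing:", rel)
```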

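Training

By default, the dataset is placed under datasets/ in the current directory.
To change the dataset directory, modify data_root in coco_detection.py.

Single machine, four cards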
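As a rough illustration of that change, the fragment below shows the kind of edit involved. The directory value is a hypothetical placeholder, and the location of coco_detection.py is not stated here (upstream MMDetection keeps it under configs/_base_/datasets/).

```python
# coco_detection.py (dataset config; only the relevant variable is shown)
# Hypothetical example path: point this at the directory that holds
# images/ and annotations/. The default described above is datasets/.
data_root = '/path/your_code_data/coco/'
```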

cd /your_code_path/grounding_dino_mmcv
bash ./train_multi.sh

Single machine, single card

cd /your_code_path/grounding_dino_mmcv
bash ./train.sh

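Inference

Inference can use either the official model weights or weights you trained yourself.
The official model is used as the example here [download: groundingdino_swint_ogc_mmdet-822d7e9d.pth].
Official inference requires tokenizers/punkt and taggers/averaged_perceptron_tagger from NLTK's nltk_data.
They can be downloaded from http://www.nltk.org/nltk_data/ and placed under /root/nltk_data.
The nltk data is laid out as follows: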

├── nltk_data
│   ├── taggers
│   │   ├── averaged_perceptron_tagger
│   │   │   ├── averaged_perceptron_tagger.pickle
│   ├── tokenizers
│   │   ├── punkt
│   │   │   ├── czech.pickle
│   │   │   ├── danish.pickle
│   │   │   ├── dutch.pickle
│   │   │   ├── ......

# Official inference code
python demo/image_demo.py \
	demo/demo.jpg \
	configs/grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py \
	--weights groundingdino_swint_ogc_mmdet-822d7e9d.pth \
	--texts 'bench . car .'
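The `--texts` argument above separates category names with " . " and ends with a trailing " .". When the categories come from a list, a small helper can assemble the same prompt and command line. This is a sketch; `build_text_prompt` and `demo_command` are hypothetical helper names and not part of the repository.

```python
import shlex


def build_text_prompt(class_names):
    """Join category names in the 'bench . car .' style used by --texts."""
    cleaned = [name.strip() for name in class_names if name.strip()]
    return " . ".join(cleaned) + " ." if cleaned else ""


def demo_command(image, config, weights, class_names):
    """Build the argument list for demo/image_demo.py shown above."""
    return [
        "python", "demo/image_demo.py",
        image,
        config,
        "--weights", weights,
        "--texts", build_text_prompt(class_names),
    ]


cmd = demo_command(
    "demo/demo.jpg",
    "configs/grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py",
    "groundingdino_swint_ogc_mmdet-822d7e9d.pth",
    ["bench", "car"],
)
# shlex.join quotes the prompt so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The list form can also be passed directly to `subprocess.run(cmd)`, which avoids shell quoting issues with the spaces in the prompt.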

result

Accuracy

Model	Backbone	Style	AMP mixed precision	Box [email protected]
Mask R-CNN	R50	Scratch	on	48.3

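Application scenarios

Algorithm category

Object detection

Key application industries

Finance, transportation, education

Source repository and issue feedback

ModelZoo / grounding_dino_mmcv · GitLab

References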

https://github.com/open-mmlab/mmdetection/tree/main