目标检测进阶：1.COCO数据集与VOC数据集

Person: person
Animal：bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe
Vehicle：bicycle, car, motorbike, aeroplane, bus, train, truck, boat
Facilities：traffic light, fire hydrant, stop sign, parking meter, bench
Necessities：backpack, umbrella, handbag, tie, suitcase
Sports：frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket
Kitchen：bottle, wine glass, cup, fork, knife,spoon, bowl
Food：banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake,
Furniture：chair, sofa, pottedplant, bed, diningtable, toilet, tvmonitor
Electronic：laptop, mouse, remote, keyboard, cell phone
Appliance：microwave, oven, toaster, sink, refrigerator
Indoor：book, clock, vase, scissors, teddy bear, hair drier, toothbrush

COCO数据集下载并解压后主要有2个文件夹。

images	包含了数据集的图片（训练集、验证集和测试集的图片）
annotations	包含了与图片对应的标注文件，主要是JSON格式的标注文件

以COCO2014为例，images中包含三个文件夹，train2014、val2014和test2014，这三个文件夹里分别代表训练集、验证集和测试集的图片，所有的图片基本上都是JPG格式。

annotation文件夹中有captions_train2014.json、captions_val2014.json、captions_test2014.json、instances_train2014.json、instances_val2014.json、instances_test2014.json、person_keypoints_train2014.json、person_keypoints_val2014.json、person_keypoints_test2014.json。这些都是标注文件，文件名后的train、val、test代表是训练集、验证集和测试集的标注文件。

Captions.json文件

Captions.json文件里的内容是对数据集的相关介绍信息，如数据集的版本号、年份、作者、数据集的许可证信息、图片的相关信息（每张图片的名称、尺寸、URL）等。

instances.json文件

instances.json文件是一个关键的标注文件，它包含了图像中实例（如物体、人等）的详细信息。

主要内容是annotations和categories。annotations包含标注信息，每个标注项代表一个目标实例的详细信息，bbox（边界框），segmentation（分割掩码）、area（实例所占据面积）、iscrowd（标注位）、category_id（类别ID)和id（标注ID）等。

bbox：一个包含四个值的数组[x,y,w,h]，表示标注对象的边界框，x和y是边界框的左上角坐标，w和h是边界框的宽度和高度。
segmentation：实例的分割掩码（可选）。这可以是一个多边形列表，表示实例的像素级轮廓；也可以是一个二进制掩码（在某些情况下），其中每个像素都标识为前景（实例）或背景。
iscrowd：指示实例是否是“杂乱”的，（例如，一群紧密排列的物体被视为单个实例）。在COCO数据集中，这通常用于人群等密集场景。

Person_keypoints.json文件

Person_keypoints.json文件是人体关键点检测任务的标注信息，其中主要内容是关键点标注信息，主要是annotations中的kerpoints（关键点）和bbox（边界框）。

keypoints：一个长度为3*k的数组，集中k是关键点总数（在COCO数据集中，对于人体关键点检测任务，k通常是17）。每个关键点由三个值组成：[x，y，v]，x和y是关键点的坐标值，v是可见性标志（v=0表示关键点未标注，v=1表示关键点标注了但不可见，v=2表示关键点标注了且可见）。
bbox：一个包含四个值的数组[x,y,w,h]，表示标注对象的边界框，x和y是边界框的左上角坐标，w和h是边界框的宽度和高度。

3.提取instances中的边界框信息

提取的数据量较大，需要在pycharm中设置txt文本的最大限制。

加入以下代码

idea.max.intellisense.filesize=20000
idea.max.content.load.filesize=20000

然后重启pytcharm

边界框信息提取代码如下

import json


def extract_annotation(coco_data, list_file):  # 提取xml中的信息
    d = {}
    for annotation in coco_data['annotations']:
        # 检查annotation中是否包含bbox字段（理论上应该总是包含）
        if 'bbox' in annotation:
            # bbox格式为[x, y, width, height]，其中(x, y)是左上角坐标
            bbox = annotation['bbox']
            # 将提取出的bbox转成[x1,y1,x2,y2],x1和y1是左上角坐标，x2和y2是右下角坐标
            bbox[2] = bbox[0] + bbox[2]
            bbox[3] = bbox[1] + bbox[3]
            # 将图像ID和类别ID与边界框一起存储，以便于后续处理
            category_id = annotation['category_id']
            bbox.append(category_id)
            image_id = annotation['image_id']
            if image_id not in d.keys():
                d[image_id] = []
            d[image_id].append(bbox)  # 汇聚属于同一图片的标注信息
    lines = list(d.keys())
    for line in lines:
        bboxes = d[line]
        filename = '{:012}.jpg'.format(line)
        for b in bboxes:
            filename = filename + " " + ",".join([str(int(a)) for a in b])
        list_file.write(filename + '\n')  # 写入信息


if __name__ == "__main__":
    # coco数据集版本，以COCO2014为例
    coco_year = 2014

    # 加载JSON文件
    with open('annotations/instances_train{}.json'.format(coco_year), 'r') as f:
        coco_data = json.load(f)
    with open('train.txt', 'w') as g:
        extract_annotation(coco_data, g)

    with open('annotations/instances_val{}.json'.format(coco_year), 'r') as f:
        coco_data = json.load(f)
    with open('val.txt', 'w') as g:
        extract_annotation(coco_data, g)

    # 加载JSON文件
    with open('annotations/instances_test{}.json'.format(coco_year), 'r') as f:
        coco_data = json.load(f)
    with open('test.txt', 'w') as g:
        extract_annotation(coco_data, g)

以上代码以COCO2014数据集为例进行提取，提取的信息整理后的格式如下所示。

二、VOC数据集

VOC数据集全称Visual Object Classes(视觉对象类别)数据集，是一个广泛应用于计算机视觉领域的数据集，特别是在目标检测、图像分割和图像分类等任务中。VOC数据集最初由英国牛津大学的计算机视觉小组创建，并在PASCAL VOC挑战赛中使用，该数据包含大量的带有标注信息的图像，用于训练和评估图像识别算法。VOC数据集涵盖了多个年度的发布，每个年度都包含了训练集、验证集和测试集。

1.VOC数据集下载

VOC2007训练集验证集http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar VOC2007测试集http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar VOC2012训练集验证集http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar

VOC2007训练集验证集百度云资源（提取码6zg6）https://pan.baidu.com/s/1xKJwIcfAQ1nsi7CZUqqCiw VOC2007测试集百度云资源（提取码zpms）https://pan.baidu.com/s/1zDQqKRgaXjqfvGPMkSkjQg

2.VOC数据集相关介绍

VOC数据集总共有20个类,在VOC2007中训练集、验证集和测试集总共有9963张图像，包含24640个目标。

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

VOC2007下载并解压后有五个文件夹。

Annotations	进行目标检测任务是的标签文件，xml格式
ImageSets	存放数据集分割文件，文本格式
JPEGImages	存放所有图片，jpg格式
SegmentationClass	存放语义分割图片，png格式
SementationObject	存放实例分割图片，png格式

Annotation文件夹

Annotation中每一个xml文件对应一张图片的标签信息，如下所示

<annotation>
	<folder>VOC2007</folder>
	<filename>000001.jpg</filename>                 #对应图片名称
	<source>
		<database>The VOC2007 Database</database>   #数据集
		<annotation>PASCAL VOC2007</annotation> 
		<image>flickr</image>
		<flickrid>341012865</flickrid>
	</source>
	<owner>
		<flickrid>Fried Camels</flickrid>
		<name>Jinky the Fruit Bat</name>
	</owner>
	<size>                                          #图片尺寸
		<width>353</width>
		<height>500</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>dog</name>                            #第一个目标信息，类别，坐标位置等
		<pose>Left</pose>
		<truncated>1</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>48</xmin>
			<ymin>240</ymin>
			<xmax>195</xmax>
			<ymax>371</ymax>
		</bndbox>
	</object>
	<object>
		<name>person</name>                         #第二个目标信息，类别，坐标位置等
		<pose>Left</pose>
		<truncated>1</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>8</xmin>
			<ymin>12</ymin>
			<xmax>352</xmax>
			<ymax>498</ymax>
		</bndbox>
	</object>
</annotation>

ImageSets文件夹

ImageSets文件夹中存放三种数据集分割文件夹（Layout，Main，Segmentation），其中Main文件夹存放的是用于分类和检测的数据集分割文件，Layout文件夹用于 person layout任务，Segmentation用于分割任务。如下图所示里面的各个文本文件中是各样本名，代表某张图片和对应的标签文件作为train或val或test。

除了数据集分割文件，在Main文件夹中还有各个类别在train或val或test中的ground truth，这个ground truth是为了方便分类任务而提供的。如下图所示，第一列代表那一个样本，即样本名，第二列只有1和-1，代表该图片是否属于该类，如areoplane_train.txt文本中的000012样本就不属于areoplane类,所以为-1，即0000012.jpg中并没有飞机目标。而000033样本就属于areoplane类，代表0000033.jpg中有飞机目标。

JPEGImages文件夹

JPEGImages存放所有图片（训练集、验证集、测试集），所有图片均为JPG格式。如下图所示

SegmentationClass文件夹

SegmentationClass文件夹是存放语义分割任务中用到的标签图片。在语义分割任务中，不同类别像素点会被标注为不同的颜色，从而实现对每个像素点的分类。

SegmentationObject文件夹

SegmentationObject文件夹是存放实例分割任务中用到的标签图片。在实例分割任务中，不仅要区分不同类别的对象，还需要区分同一类别的不同对象。

3.提取xml文件中的边界框信息

XML中信息读取代码如下所示

import xml.etree.ElementTree as ET
import numpy as np

# ----------------------------------------------VOC数据集中所有类-----------------------------------------------------------#
classes = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable',
           'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']

nums = np.zeros(len(classes))

def extract_annotation(year, image_id, list_file):  # 提取xml中的信息
    in_file = open('VOC%s/Annotations/%s.xml' % (year, image_id), encoding='utf-8')
    tree = ET.parse(in_file)
    root = tree.getroot()
    filename = root.find('filename').text  # 图片名
    for obj in root.iter('object'):
        cls = obj.find('name').text  # 类别
        difficult = 0
        if obj.find('difficult') != None:
            difficult = obj.find('difficult').text
        if cls not in classes or int(difficult) == 1:  # 不属于classes中的类或者检测难度较大的则舍弃
            continue
        cls_id = classes.index(cls)  # 类别转成数值id
        xmlbox = obj.find('bndbox')  # 坐标信息
        b = (int(float(xmlbox.find('xmin').text)), int(float(xmlbox.find('ymin').text)),
             int(float(xmlbox.find('xmax').text)), int(float(xmlbox.find('ymax').text)))
        filename = filename + " " + ",".join([str(a) for a in b]) + ',' + str(cls_id)
    list_file.write(filename + '\n')  # 写入信息


if __name__ == "__main__":
    voc_year = 2007  # 下载的VOC数据集版本 以VOC2007为例
    with open('VOC%s/ImageSets/Main/train.txt' % (voc_year)) as f:  # 获得训练集的所有样本名
        train_names = f.readlines()
        train_names = [line.rstrip('\n') for line in train_names]

    with open('VOC%s/train.txt' % (voc_year), 'w') as g:  # 提取信息
        for name in train_names:
            extract_annotation(voc_year, name, g)

    with open('VOC%s/ImageSets/Main/val.txt' % (voc_year)) as f:  # 获得验证集的所有样本名
        val_names = f.readlines()
        val_names = [line.rstrip('\n') for line in val_names]

    with open('VOC%s/val.txt' % (voc_year), 'w') as g:
        for name in val_names:
            extract_annotation(voc_year, name, g)

    with open('VOC%s/ImageSets/Main/test.txt' % (voc_year)) as f:  # 获得测试集的所有样本名
        test_names = f.readlines()
        test_names = [line.rstrip('\n') for line in test_names]

    with open('VOC%s/test.txt' % (voc_year), 'w') as g:
        for name in test_names:
            extract_annotation(voc_year, name, g)

以上代码以VOC2007的数据集为例进行提取，提取的信息整理后的格式如下所示。