yolo_correct_boxes()解析

本文将对keras版本yolo中的yolo_correct_boxes()函数做简单说明

该函数源码、参数说明如下

def yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape):
    """对模型输出的box信息(x, y, w, h)进行校正,输出基于原图坐标系的box信息(x_min, y_min, x_max, y_max)

    Args:
        box_xy: 模型输出的box中心坐标信息,范围0~1
        box_wh: 模型输出的box长宽信息,范围0~1
        input_shape: 模型的图像尺寸, 长宽均是32倍数
        image_shape: 原图尺寸

    Returns:
        boxes: 基于原图坐标系的box信息(实际坐标值,非比值)
    """
    # print("box_xy=", box_xy)
    box_yx = box_xy[..., ::-1]
    # print("box_yx=", box_yx)
    box_hw = box_wh[..., ::-1]
    input_shape = K.cast(input_shape, K.dtype(box_yx))
    image_shape = K.cast(image_shape, K.dtype(box_yx))
    new_shape = K.round(image_shape * K.min(input_shape/image_shape))
    offset = (input_shape-new_shape)/2./input_shape
    scale = input_shape/new_shape
    box_yx = (box_yx - offset) * scale
    box_hw *= scale

    box_mins = box_yx - (box_hw / 2.)
    box_maxes = box_yx + (box_hw / 2.)
    boxes =  K.concatenate([
        box_mins[..., 0:1],  # y_min
        box_mins[..., 1:2],  # x_min
        box_maxes[..., 0:1],  # y_max
        box_maxes[..., 1:2]  # x_max
    ])

    # Scale boxes back to original image shape.
    boxes *= K.concatenate([image_shape, image_shape])
    return boxes

我们先来看看推理过程input_shape的生成过程,首先是推理入口函数

def detect_image(self, image):
        ''' Detect the objections of the input image.
        
        Args:
            image: A opencv-python mat objection.
            
        Returns:
            A PIL Image objection with the drawed rectangles.
        '''
        start = timer()

        if self.model_image_size != (None, None):
            assert self.model_image_size[0]%32 == 0, 'Multiples of 32 required'
            assert self.model_image_size[1]%32 == 0, 'Multiples of 32 required'
            boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size)))
        else:
            new_image_size = (image.width - (image.width % 32),
                              image.height - (image.height % 32))
            boxed_image = letterbox_image(image, new_image_size)
        image_data = np.array(boxed_image, dtype='float32')

        # print(image_data.shape)
        image_data /= 255.
        image_data = np.expand_dims(image_data, 0)  # Add batch dimension.

                        
        out_boxes, out_scores, out_classes = self.sess.run(
            [self.boxes, self.scores, self.classes],
            feed_dict={
                self.yolo_model.input: image_data,
                self.input_image_shape: [image.size[1], image.size[0]],
                K.learning_phase(): 0
            })

最终输入模型的图像数据是image_data, input_shape也就是它的形状,这里有两种选择,一种是指定尺寸,如果不指定就就近选择和长宽值最接近的32的倍数值作为输入尺寸, 确定了输入尺寸之后,如何将原图缩放到这个尺寸呢?这就要看letterbox_image()函数:

def letterbox_image(image, size):
    '''resize image with unchanged aspect ratio using padding'''
    iw, ih = image.size
    w, h = size
    scale = min(w/iw, h/ih)
    nw = int(iw*scale)
    nh = int(ih*scale)

    image = image.resize((nw,nh), Image.BICUBIC)
    new_image = Image.new('RGB', size, (128,128,128))
    new_image.paste(image, ((w-nw)//2, (h-nh)//2))
    return new_image

image就是原图

模型输入图像我们称为input_image, size就是它的shape, 长宽均为32的倍数

原图长宽等比缩放后取整,得到的图像我们称为resize_image

input_image初始化为全图像素值为(128,128,128)的纯色图,然后将resize_image嵌在正中,这样就得到了输入图像

我画了个示意图方便理解

理解了这个过程之后我们再回头看yolo_correct_boxes()的逻辑

模型输出的box_xy和box_wh是相对于input_image的比值,映射回image需要经过如下步骤

step1. 将box_xy相对于input_image左上角的距离修改为相对于resize_image左上角的位置

step2. 将box_xy相对于input_image的比例修改为相对于resize_image的比例

offset = (input_shape-new_shape)/2./input_shape
scale = input_shape/new_shape
box_yx = (box_yx - offset) * scale

step3. 将box_wh相对于input_image的比例修改为相对于resize_image的比例

box_hw *= scale

step4. 将比例值转化为坐标值

boxes *= K.concatenate([image_shape, image_shape])

tips:

没有深入理解之前会以为将原图粗暴的缩放成input_image,这样得到的box_xy直接乘image不就行了,为什么还要做yolo_correct_boxes(),这样做在流程上确实没什么问题,但是图像会形变,相当于特征发生了变化,模型看见的就不再是物体本来的样子了

yolo_correct_boxes()解析

悦读