
One network for correcting both underexposure and overexposure: Learning Multi-Scale Photo Exposure Correction

Learning Multi-Scale Photo Exposure Correction

 

[pdf] [Github]

Contents

Abstract

1. Introduction

2. Related Work

3. Our Dataset

4. Our Method

4.1. Coarse-to-Fine Exposure Correction

4.2. Coarse-to-Fine Network


Abstract

Capturing photographs with wrong exposures remains a major source of errors in camera-based imaging. Exposure problems are categorized as either: (i) overexposed, where the camera exposure was too long, resulting in bright and washed-out image regions, or (ii) underexposed, where the exposure was too short, resulting in dark regions. Both under- and overexposure greatly reduce the contrast and visual appeal of an image. Prior work mainly focuses on underexposed images or general image enhancement.

In contrast, our proposed method targets both over- and underexposure errors in photographs. We formulate the exposure correction problem as two main sub-problems: (i) color enhancement and (ii) detail enhancement. Accordingly, we propose a coarse-to-fine deep neural network (DNN) model, trainable in an end-to-end manner, that addresses each subproblem separately. A key aspect of our solution is a new dataset of over 24,000 images exhibiting the broadest range of exposure values to date with a corresponding properly exposed image.

Our method achieves results on par with existing state-of-the-art methods on underexposed images and yields significant improvements for images suffering from overexposure errors.

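Background:

In camera-based imaging, capturing photographs with the wrong exposure remains a major source of errors. Exposure problems fall into two categories: (i) overexposure, where the exposure time was too long and image regions come out bright and washed out, and (ii) underexposure, where the exposure time was too short and image regions come out dark. Both greatly reduce the contrast and visual appeal of an image. Prior work focuses mainly on underexposed images or on general image enhancement.

Method:

In contrast (the phrase "In contrast" is the transition that highlights what is distinctive and novel about this work), the proposed method targets both over- and underexposure errors in photographs. The paper formulates exposure correction as two main sub-problems: (i) color enhancement and (ii) detail enhancement. Accordingly, it proposes a coarse-to-fine deep neural network (DNN) model, trainable end to end, that addresses each sub-problem separately. A key part of the solution is a new dataset of more than 24,000 images that covers the broadest range of exposure values to date, each paired with a properly exposed image.

Results:

The method achieves results on par with existing state-of-the-art methods on underexposed images and yields significant improvements on images with overexposure errors.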

1. Introduction

The exposure used at capture time directly affects the overall brightness of the final rendered photograph. Digital cameras control exposure using three main factors: (i) capture shutter speed, (ii) f-number, which is the ratio of the focal length to the camera aperture diameter, and (iii) the ISO value to control the amplification factor of the received pixel signals. In photography, exposure settings are represented by exposure values (EVs), where each EV refers to different combinations of camera shutter speeds and f-numbers that result in the same exposure effect—also referred to as ‘equivalent exposures’ in photography.

Digital cameras can adjust the exposure value of captured images for the purpose of varying the brightness levels. This adjustment can be controlled manually by users or performed automatically in an auto-exposure (AE) mode. When AE is used, cameras adjust the EV to compensate for low/high levels of brightness in the captured scene using through-the-lens (TTL) metering that measures the amount of light received from the scene [49].

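Exposure basics:

The exposure used at capture time directly affects the overall brightness of the final rendered photograph. Digital cameras control exposure through three main factors: (i) the shutter speed, (ii) the f-number, which is the ratio of the focal length to the aperture diameter, and (iii) the ISO value, which controls the amplification of the received pixel signals. In photography, exposure settings are expressed as exposure values (EVs), where each EV refers to the different combinations of shutter speed and f-number that produce the same exposure effect, also called "equivalent exposures".

Digital cameras can adjust the exposure value of captured images to vary the brightness level. The adjustment can be set manually by the user or performed automatically in auto-exposure (AE) mode. With AE, the camera adjusts the EV to compensate for low or high scene brightness using through-the-lens (TTL) metering, which measures the amount of light received from the scene. This kind of domain background is essential to the paper: an argument that stayed purely at the image level would be much less convincing.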

[49] Bryan Peterson. Understanding exposure: How to shoot great photographs with any camera. AmPhoto Books, 2016.
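To make "equivalent exposures" concrete, the sketch below uses the standard photographic definition EV = log2(N² / t), with f-number N and shutter time t in seconds at a fixed ISO. This formula is general photography background rather than something stated in the paper, and the function name `exposure_value` is made up for illustration.

```python
import math

def exposure_value(f_number: float, shutter_s: float) -> float:
    """EV = log2(N^2 / t) for f-number N and shutter time t (seconds)."""
    return math.log2(f_number ** 2 / shutter_s)

# Opening the aperture by about one stop (f/8 -> f/5.6) while halving the
# shutter time (1/125 s -> 1/250 s) lets in roughly the same amount of light.
ev_a = exposure_value(8.0, 1 / 125)
ev_b = exposure_value(5.6, 1 / 250)
print(f"EV at f/8, 1/125 s:   {ev_a:.2f}")
print(f"EV at f/5.6, 1/250 s: {ev_b:.2f}")
```

The two settings differ by only a few hundredths of an EV, because marked f-numbers such as 5.6 are rounded values of powers of √2.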

Exposure errors can occur due to several factors, such as errors in measurements of TTL metering, hard lighting conditions (e.g., very low lighting and backlighting), dramatic changes in the brightness level of the scene, and errors made by users in the manual mode. Such exposure errors are introduced early in the capture process and are thus hard to correct after rendering the final 8-bit image. This is due to the highly nonlinear operations applied by the camera image signal processor (ISP) afterwards to render the final 8-bit standard RGB (sRGB) image [31].

Fig. 1 shows typical examples of images with exposure errors. In Fig. 1, exposure errors result in either very bright image regions, due to overexposure, or very dark regions, caused by underexposure errors, in the final rendered images. Correcting images with such errors is a challenging task even for well-established image enhancement software packages, see Fig. 9. Although both over- and underexposure errors are common in photography, most prior work is mainly focused on correcting underexposure errors [23, 56, 58, 65, 66] or generic image quality enhancement [11, 18].

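Causes of exposure errors and their effect on imaging:

Exposure errors can arise from several factors, such as measurement errors in TTL metering, hard lighting conditions (for example, very low light and backlighting), dramatic changes in scene brightness, and mistakes made by users in manual mode. These errors are introduced early in the capture process and are therefore hard to correct once the final 8-bit image has been rendered. The reason is the highly nonlinear processing that the camera image signal processor (ISP) applies afterwards to render the final 8-bit standard RGB (sRGB) image.

Fig. 1 shows typical examples of images with exposure errors. In the final rendered images, exposure errors produce either very bright regions caused by overexposure or very dark regions caused by underexposure. Correcting such images is challenging even for well-established image enhancement software packages, as Fig. 9 shows. Although both over- and underexposure errors are common in photography, most prior work concentrates on correcting underexposure errors or on generic image quality enhancement.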

Figure 1: Photographs with over- and underexposure errors and the results of our method using a single model for exposure correction. These sample input images are taken from outside our dataset to demonstrate the generalization of our trained model. 

Figure 9: Comparisons with commercial software packages. The input images are taken from Flickr.

Contributions

We propose a coarse-to-fine deep learning method for exposure error correction of both over- and underexposed sRGB images. Our approach formulates the exposure correction problem as two main sub-problems: (i) color and (ii) detail enhancement. We propose a coarse-to-fine deep neural network (DNN) model, trainable in an end-to-end manner, that begins by correcting the global color information and subsequently refines the image details.

In addition to our DNN model, a key contribution to the exposure correction problem is a new dataset containing over 24,000 images rendered from raw-RGB to sRGB with different exposure settings with broader exposure ranges than previous datasets. Each image in our dataset is provided with a corresponding properly exposed reference image.

Lastly, we present an extensive set of evaluations and ablations of our proposed method with comparisons to the state of the art. We demonstrate that our method achieves results on par with previous methods dedicated to underexposed images and yields significant improvements on overexposed images. Furthermore, our model generalizes well to images outside our dataset.

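Contributions:

(Method) The paper proposes a coarse-to-fine deep learning method for exposure error correction of both over- and underexposed sRGB images. It formulates exposure correction as two main sub-problems: (i) color enhancement and (ii) detail enhancement. The proposed coarse-to-fine deep neural network (DNN) model is trained end to end; it first corrects the global color information and then refines the image details.

(Data) A key contribution to the exposure correction problem is a new dataset of more than 24,000 images rendered from raw-RGB to sRGB with different exposure settings, covering a broader exposure range than previous datasets. Each image in the dataset comes with a corresponding properly exposed reference image.

(Results) Finally, the paper presents an extensive set of evaluations and ablations and compares against the state of the art. The experiments show that the method is on par with previous methods dedicated to underexposed images and yields significant improvements on overexposed images. In addition, the model generalizes well to images outside the dataset.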

2. Related Work

Interested readers can refer to the blog post [ Related Work of Exposure Correction ].

3. Our Dataset

To train our model, we need a large number of training images rendered with realistic over- and underexposure errors and corresponding properly exposed ground truth images. As discussed in Sec. 2, such datasets are currently not publicly available to support exposure correction research. For this reason, our first task is to create a new dataset. Our dataset is rendered from the MIT-Adobe FiveK dataset [6], which has 5,000 raw-RGB images and corresponding sRGB images rendered manually by five expert photographers [6].

For each raw-RGB image, we use the Adobe Camera Raw SDK [1] to emulate different EVs as would be applied by a camera [53]. Adobe Camera Raw accurately emulates the nonlinear camera rendering procedures using metadata embedded in each DNG raw file [2, 53]. We render each raw-RGB image with different digital EVs to mimic real exposure errors. Specifically, we use the relative EVs −1.5, −1, +0, +1, and +1.5 to render images with underexposure errors, a zero gain of the original EV, and overexposure errors, respectively. The zero-gain relative EV is equivalent to the original exposure settings applied onboard the camera during capture time.

As the ground truth images, we use images that were manually retouched by an expert photographer (referred to as Expert C in [6]) as our target correctly exposed images, rather than using our rendered images with +0 relative EV. The reason behind this choice is that a significant number of images contain backlighting or partial exposure errors in the original exposure capture settings. The expert adjusted images were performed in ProPhoto RGB color space [6] (rather than raw-RGB), which we converted to a standard 8-bit sRGB color space encoding.

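Training the model requires a large number of images rendered with realistic over- and underexposure errors, together with corresponding properly exposed ground-truth (GT) images. No such dataset is currently publicly available to support exposure correction research, so the paper's first task is to create one. The dataset is rendered from the MIT-Adobe FiveK dataset, which contains 5,000 raw-RGB images and corresponding sRGB images rendered manually by five expert photographers.

For each raw-RGB image, the Adobe Camera Raw SDK [1] is used to emulate different EVs as a camera would apply them. Adobe Camera Raw accurately emulates the nonlinear camera rendering procedures using the metadata embedded in each DNG raw file. Each raw-RGB image is rendered with different digital EVs to mimic real exposure errors. Specifically, the relative EVs −1.5, −1, +0, +1, and +1.5 are used to render images with underexposure errors, zero gain of the original EV, and overexposure errors, respectively. The zero-gain relative EV is equivalent to the original exposure settings applied on the camera at capture time.

For the GT images, the paper uses images manually retouched by an expert photographer (referred to as Expert C in [6]) as the correctly exposed targets, rather than the images rendered with +0 relative EV. The reason is that a significant number of images contain backlighting or partial exposure errors under the original capture settings. The expert adjustments were made in the ProPhoto RGB color space (rather than raw-RGB), and the results were converted to a standard 8-bit sRGB encoding.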
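As a rough illustration of what a relative EV does, a change of +1 EV doubles the light, so in linear intensity space it can be approximated by a gain of 2^EV. The sketch below applies that gain, clips, and encodes with the standard sRGB transfer function. It is a deliberately simplified stand-in: the dataset was rendered with the Adobe Camera Raw SDK, which emulates the full nonlinear camera pipeline, and the function names here are hypothetical.

```python
import numpy as np

RELATIVE_EVS = (-1.5, -1.0, 0.0, 1.0, 1.5)  # the relative EVs listed in the text

def apply_relative_ev(linear_img, ev):
    """Scale linear intensities by 2**ev and clip to the valid range [0, 1]."""
    return np.clip(linear_img * (2.0 ** ev), 0.0, 1.0)

def linear_to_srgb(linear_img):
    """Standard sRGB transfer function (gamma encoding only, no tone mapping)."""
    low = 12.92 * linear_img
    high = 1.055 * np.power(linear_img, 1.0 / 2.4) - 0.055
    return np.where(linear_img <= 0.0031308, low, high)

def render_8bit(linear_img, ev):
    """Simplified render: digital gain, sRGB encoding, 8-bit quantization."""
    srgb = linear_to_srgb(apply_relative_ev(linear_img, ev))
    return np.round(srgb * 255.0).astype(np.uint8)

# A tiny linear-intensity ramp standing in for a raw-RGB image (1 x 5 x 3).
ramp = np.linspace(0.0, 1.0, 5).reshape(1, 5, 1).repeat(3, axis=2)
renders = {ev: render_8bit(ramp, ev) for ev in RELATIVE_EVS}
for ev, img in renders.items():
    print(f"EV {ev:+.1f}: {img[0, :, 0].tolist()}")
```

At positive EVs several ramp values saturate at 255 after clipping, which is the kind of information loss that makes overexposure hard to undo on the rendered 8-bit image.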

In total, our dataset contains 24,330 8-bit sRGB images with different digital exposure settings. We discarded a small number of images that had misalignment with their corresponding ground truth image. These misalignments are due to different usage of the DNG crop area metadata by Adobe Camera Raw SDK and the expert. Our dataset is divided into three sets: (i) training set of 17,675 images, (ii) validation set of 750 images, and (iii) testing set of 5,905 images. The training, validation, and testing sets do not share any scenes in common. Fig. 2 shows examples of our generated 8-bit sRGB images and the corresponding properly exposed 8-bit sRGB reference images.

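In total, the dataset contains 24,330 8-bit sRGB images with different digital exposure settings. A small number of images that were misaligned with their GT image were discarded. These misalignments arise because the Adobe Camera Raw SDK and the expert used the DNG crop area metadata differently.

The dataset is divided into three sets: (i) a training set of 17,675 images, (ii) a validation set of 750 images, and (iii) a testing set of 5,905 images. The training, validation, and testing sets do not share any scenes. Fig. 2 shows examples of the generated 8-bit sRGB images and the corresponding properly exposed 8-bit sRGB reference images.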

Figure 2: Dataset overview. Our dataset contains images with different exposure error types and their corresponding properly exposed reference images. Shown is a t-SNE visualization [42] of all images in our dataset and the low-light (LOL) paired dataset (outlined in red) [58]. Notice that LOL covers a relatively small fraction of the possible exposure levels, as compared to our introduced dataset. Our dataset was rendered from linear raw-RGB images taken from the MIT-Adobe FiveK dataset [6]. Each image was rendered with different relative exposure values (EVs) by an accurate emulation of the camera ISP processes.

4. Our Method

4.1. Coarse-to-Fine Exposure Correction

Let X represent the Laplacian pyramid of I with n levels, such that X(l) is the l-th level of X. The last level of this pyramid (i.e., X(n)) captures low-frequency information of I, while the first level (i.e., X(1)) captures the high-frequency information. Such frequency levels can be categorized into: (i) global color information of I stored in the low-frequency level and (ii) image coarse-to-fine details stored in the mid- and high-frequency levels. These levels can be later used to reconstruct the full-color image I.

Fig. 3 motivates our coarse-to-fine approach to exposure correction. Figs. 3-(A) and (B) show an example overexposed image and its corresponding well-exposed target, respectively. As observed, a significant exposure correction can be obtained by using only the low-frequency layer (i.e., the global color information) of the target image in the Laplacian pyramid reconstruction process, as shown in Fig. 3-(C). We can then improve the final image by enhancing the details in a sequential way by correcting each level of the Laplacian pyramid, as shown in Fig. 3-(D). Practically, we do not have access to the properly exposed image in Fig. 3-(B) at the inference stage, and thus our goal is to predict the missing color/detail information of each level in the Laplacian pyramid.

Given an 8-bit sRGB input image, I, rendered with the incorrect exposure setting, our method aims to produce an output image, Y, with fewer exposure errors than those in I. As we simultaneously target both over- and underexposed errors, our input image, I, is expected to contain regions of nearly over- or under-saturated values with corrupted color and detail information. We propose to correct color and detail errors of I in a sequential manner. Specifically, we process a multi-resolution representation of I, rather than directly dealing with the original form of I. We use the Laplacian pyramid [4] as our multi-resolution decomposition, which is derived from the Gaussian pyramid [5] of I.

Inspired by this observation and the success of coarse-to-fine architectures for various other computer vision tasks (e.g., [14, 33, 41, 54]), we design a DNN that corrects the global color and detail information of I in a sequential manner using the Laplacian pyramid decomposition. The remaining parts of this section explain the technical details of our model (Sec. 4.2), including details of the losses (Sec. 4.3), inference phase (Sec. 4.4), and training (Sec. 4.5).

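Problem statement and motivation: the "color enhancement and detail enhancement" and the "coarse-to-fine" model mentioned in the abstract and introduction are both realized through the Laplacian pyramid.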

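Let X denote the Laplacian pyramid of image I with n levels, where X(l) is the l-th level of X. The last level of the pyramid, X(n), captures the low-frequency information of I, while the first level, X(1), captures the high-frequency information. These frequency levels fall into two groups: (i) the global color information stored in the low-frequency level and (ii) the coarse-to-fine image details stored in the mid- and high-frequency levels. The levels can later be used to reconstruct the full-color image I.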
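The decomposition and reconstruction described above can be sketched in a few lines of NumPy. This is a minimal illustration under simplifying assumptions: 2x2 average pooling and nearest-neighbour repetition stand in for the Gaussian filtering normally used to build the pyramid, the image sides must be divisible by 2^(n−1), and the function names are hypothetical rather than taken from the paper's code.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by 2x2 average pooling (stand-in for Gaussian blur + decimation)."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(img):
    """Double the resolution by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, n):
    """Return [X(1), ..., X(n)]: X(1) holds the finest details, X(n) the low-frequency base."""
    levels, current = [], img
    for _ in range(n - 1):
        smaller = downsample(current)
        levels.append(current - upsample(smaller))  # detail lost by downsampling
        current = smaller
    levels.append(current)  # X(n): coarse image carrying the global color
    return levels

def reconstruct(levels):
    """Invert the pyramid: start from X(n) and add the details back, coarse to fine."""
    current = levels[-1]
    for detail in reversed(levels[:-1]):
        current = upsample(current) + detail
    return current

rng = np.random.default_rng(0)
image = rng.random((16, 16, 3))
levels = laplacian_pyramid(image, n=4)
print([lv.shape for lv in levels])
print(np.allclose(reconstruct(levels), image))
```

Because each detail level stores exactly what the downsample-upsample round trip loses, reconstruction is exact whichever pair of resampling operators is used.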

Fig. 3 illustrates the motivation behind the coarse-to-fine exposure correction approach.

Figure 3: Motivation behind our coarse-to-fine exposure correction approach. Example of an overexposed image and its corresponding properly exposed image shown in (A) and (B), respectively. The Laplacian pyramid decomposition allows us to enhance the color and detail information sequentially, as shown in (C) and (D), respectively.

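Figs. 3-(A) and (B) show an overexposed image and its corresponding well-exposed target, respectively. As can be seen, a significant exposure correction is obtained by using only the low-frequency layer of the target image (i.e., the global color information) in the Laplacian pyramid reconstruction, as shown in Fig. 3-(C).

The final image can then be improved by enhancing the details sequentially, correcting each level of the Laplacian pyramid in turn, as shown in Fig. 3-(D).

In practice, the properly exposed image in Fig. 3-(B) is not available at inference time, so the goal is to predict the missing color and detail information of each level of the Laplacian pyramid.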
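The low-frequency swap behind Fig. 3-(C) can be imitated on synthetic data. Because pyramid reconstruction is linear in its levels, replacing the base level X(n) of the input with that of the target is the same as adding the upsampled difference of the two bases to the input. The sketch below relies on that identity and assumes block averaging and nearest-neighbour repetition in place of Gaussian filtering; `block_mean` and `swap_low_frequency` are illustrative names, and the "overexposed" image is simulated with a plain gain and clip rather than taken from the dataset.

```python
import numpy as np

def block_mean(img, f):
    """Average over non-overlapping f x f blocks (the coarsest level of an averaging pyramid)."""
    h, w, c = img.shape
    return img.reshape(h // f, f, w // f, f, c).mean(axis=(1, 3))

def swap_low_frequency(inp, target, n):
    """Reconstruct `inp` with its n-th pyramid level replaced by that of `target`."""
    f = 2 ** (n - 1)  # total downsampling factor down to level n
    base_diff = block_mean(target, f) - block_mean(inp, f)
    return inp + base_diff.repeat(f, axis=0).repeat(f, axis=1)

rng = np.random.default_rng(0)
target = rng.uniform(0.1, 0.4, size=(16, 16, 3))        # stand-in for a well-exposed image
overexposed = np.clip(target * 2.0 ** 1.5, 0.0, 1.0)    # crude +1.5 EV simulation
corrected = swap_low_frequency(overexposed, target, n=4)

err_before = np.abs(overexposed - target).mean()
err_after = np.abs(corrected - target).mean()
print(f"mean abs error before: {err_before:.3f}, after base swap: {err_after:.3f}")
```

The swap removes the global brightness offset but leaves the detail levels untouched, which is why a second, detail-refining stage is still needed.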

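Given an 8-bit sRGB input image I rendered with an incorrect exposure setting, the method aims to produce an output image Y with fewer exposure errors than I. Because both over- and underexposure errors are targeted, the input image I is expected to contain regions with nearly over- or under-saturated values whose color and detail information is corrupted.

The paper proposes to correct the color and detail errors of I sequentially. Specifically, it processes a multi-resolution representation of I rather than working directly on I in its original form. The Laplacian pyramid is used as the multi-resolution decomposition (it is derived from the Gaussian pyramid of I).

Inspired by this observation and by the success of coarse-to-fine architectures in various other computer vision tasks (e.g., [14, 33, 41, 54]), the paper designs a DNN that uses the Laplacian pyramid decomposition to correct the global color and detail information of I sequentially.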

[14] Deep generative image models using a Laplacian pyramid of adversarial networks. In NeurIPS, 2015. 

[33] Deep Laplacian pyramid networks for fast and accurate super-resolution. In CVPR, 2017.

[41] Efficient and fast real-world noisy image denoising by combining pyramid neural network and two-pathway unscented Kalman filter. IEEE Transactions on Image Processing, 29(1):3927–3940, 2020.

[54] SinGAN: Learning a generative model from a single natural image. In ICCV, 2019.

4.2. Coarse-to-Fine Network

Our image exposure correction architecture sequentially processes the n-level Laplacian pyramid, X, of the input image, I, to produce the final corrected image, Y. The proposed model consists of n sub-networks. Each of these sub-networks is a U-Net-like architecture [52] with untied weights. We allocate the network capacity in the form of weights based on how significantly each sub-problem (i.e., global color correction and detail enhancement) contributes to our final result.

Fig. 4 provides an overview of our network. As shown, the largest (in terms of weights) subnetwork in our architecture is dedicated to processing the global color information in I (i.e., X(n) ). This sub-network (shown in yellow in Fig. 4) processes the low-frequency level X(n) and produces an upscaled image Y(n) . The upscaling process scales up the output of our sub-network by a factor of two using strided transposed convolution with trainable weights.

Next, we add the first mid-frequency level X(n−1) to Y(n) to be processed by the second subnetwork in our model. This sub-network enhances the corresponding details of the current level and produces a residual layer that is then added to Y(n) +X(n−1) to reconstruct image Y(n−1), which is equivalent to the corresponding Gaussian pyramid level n − 1. This refinement-upsampling process proceeds until the final output image, Y, is produced. Our network is fully differentiable and thus can be trained in an end-to-end manner. Additional details of our network are provided in the supplementary materials.

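The exposure correction architecture sequentially processes the n-level Laplacian pyramid X of the input image I to produce the final corrected image Y. The model consists of n sub-networks, each a U-Net-like architecture with untied weights (that is, the sub-networks do not share weights). Network capacity, in the form of weights, is allocated according to how much each sub-problem (global color correction and detail enhancement) contributes to the final result.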

Figure 4: Overview of our image exposure correction architecture. We propose a coarse-to-fine deep network to progressively correct exposure errors in 8-bit sRGB images. Our network first corrects the global color captured at the final level of the Laplacian pyramid and then the subsequent frequency layers. 

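Fig. 4 gives an overview of the network. As shown, the largest sub-network (in terms of weights) is dedicated to processing the global color information of I, i.e., X(n). This sub-network (shown in yellow in Fig. 4) processes the low-frequency level X(n) and produces an upscaled image Y(n). The upscaling step enlarges the sub-network output by a factor of two using a strided transposed convolution with trainable weights.

Next, the first mid-frequency level X(n−1) is added to Y(n), and the sum is processed by the second sub-network. This sub-network enhances the details of the current level and produces a residual layer, which is added to Y(n) + X(n−1) to reconstruct the image Y(n−1), equivalent to level n−1 of the corresponding Gaussian pyramid. This refinement-upsampling process continues until the final output image Y is produced. The network is fully differentiable and can therefore be trained end to end.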
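The data flow described above (correct the base, upscale, add the next pyramid level, refine with a residual, repeat) can be sketched as a plain loop. This is only a structural sketch under stated assumptions: the sub-networks are arbitrary callables rather than U-Net-like models, nearest-neighbour repetition replaces the learned strided transposed convolution, and the name `coarse_to_fine` is hypothetical.

```python
import numpy as np

def upsample(img):
    """Nearest-neighbour x2 upscaling (stand-in for a learned transposed convolution)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def coarse_to_fine(levels, subnets):
    """levels = [X(1), ..., X(n)]; subnets are ordered coarse to fine (the first one handles X(n))."""
    # The first sub-network corrects the global color in X(n); its output is upscaled to give Y(n).
    y = upsample(subnets[0](levels[-1]))
    finer_levels = list(reversed(levels[:-1]))  # X(n-1), ..., X(1)
    for i, (detail, net) in enumerate(zip(finer_levels, subnets[1:])):
        z = y + detail      # add the next frequency level
        y = z + net(z)      # the sub-network predicts a residual refinement
        if i < len(finer_levels) - 1:
            y = upsample(y)  # move on to the next, finer level
    return y

# Toy pyramid with n = 3 levels and placeholder "networks".
levels = [np.full((8, 8, 1), 0.5), np.zeros((4, 4, 1)), np.ones((2, 2, 1))]
identity = lambda x: x
zero_residual = lambda x: np.zeros_like(x)

passthrough = coarse_to_fine(levels, [identity, zero_residual, zero_residual])
darker_base = coarse_to_fine(levels, [lambda x: 0.5 * x, zero_residual, zero_residual])
print(passthrough.shape, float(passthrough.mean()), float(darker_base.mean()))
```

With an identity base network and zero residuals the loop reduces to ordinary pyramid reconstruction, and changing only the base network shifts the overall brightness of the output, which matches the role the text assigns to the first sub-network.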

------------------------------------------------------------

The remaining three sections are relatively straightforward; please refer to the original paper.

4.3. Losses

4.4. Inference Stage

4.5. Training Details

Finally, here are a few experimental results. It has to be said that the experiments in this paper are quite thorough!

Figure 7: Qualitative results of correcting images with exposure errors. Shown are the input images from our test set, results from the DPED [26], results from the Deep UPE [11], our results, and the corresponding ground truth images. 

Table 1: Quantitative evaluation on our introduced test set. The best results are highlighted with green and bold. The second- and third-best results are highlighted in yellow and red, respectively. We compare each method with properly exposed reference image sets rendered by five expert photographers [6]. For each method, we present peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM) [67], and perceptual index (PI) [3]. We denote methods designed for underexposure correction in gray. Non-deep learning methods are marked by ∗. The terms U and S stand for unsupervised and supervised, respectively. Notice that higher PSNR and SSIM values are better, while lower PI values indicate better perceptual quality.
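For reference, PSNR on 8-bit images is usually computed as 10·log10(255² / MSE). The sketch below implements this standard definition; it is not taken from the paper's evaluation code, and the function name `psnr` is chosen for illustration.

```python
import numpy as np

def psnr(pred, ref, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    diff = pred.astype(np.float64) - ref.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

reference = np.full((4, 4, 3), 128, dtype=np.uint8)
off_by_one = reference + 1  # every pixel differs by exactly one gray level
print(f"PSNR with an error of 1 everywhere: {psnr(off_by_one, reference):.2f} dB")
```

Casting to float before subtracting matters here, since subtracting `uint8` arrays directly would wrap around for negative differences.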
