In this paper we present a simple and affordable method to generate high-quality video from a high dynamic range scene. It requires no extra lighting, no alternating exposures, and no dual-camera set-up. Our input is an interlaced video that alternates row pairs with different ISO values, as some DSLR camera models can provide. The proposed algorithm involves two main steps: first, the computation of two single-ISO full-frame images (one for each ISO value) using an inpainting-based deinterlacing method, followed by their combination into a single frame by a weighted average. The result is a high dynamic range frame, containing all the details in bright and dark areas at the same time, which is finally tone-mapped into a low dynamic range frame for display purposes. Current results show this is a practical and cost-effective method that produces outputs free of ghosting artifacts and with very little noise.

Introduction

The human visual system is able to adjust to world scenes where the light intensity values cover a very wide range, capturing details in dark and bright areas simultaneously. Dynamic range is defined as the ratio between the brightest and the darkest intensity levels; while in common situations the light coming from a scene is of high dynamic range (HDR), the vast majority of camera sensors (and displays) are of low dynamic range (LDR). There is a vast literature on methods for creating HDR images using regular LDR sensors, starting with the seminal approaches of [17, 7]: in those works, several LDR pictures of the same HDR scene are taken with varying exposure times, so that the short exposures capture details in the bright regions and the long exposures capture details in the dark regions; all these images are then combined into a single HDR image with overall detail visibility.
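As background, the multi-exposure fusion idea of [17, 7] can be sketched as follows. This is a minimal illustration only: it assumes a linear sensor response (the response-curve recovery of those works is omitted), and the "hat" weighting function is a common simplification, not the exact weights used there.

```python
import numpy as np

def fuse_exposures(images, times):
    """Merge differently exposed LDR images (values in [0, 1]) into an
    HDR radiance estimate.  A 'hat' weight discounts under- and
    over-exposed pixels; a linear sensor response is assumed."""
    num = np.zeros_like(images[0], dtype=float)
    den = np.zeros_like(images[0], dtype=float)
    for Z, t in zip(images, times):
        w = 1.0 - np.abs(2.0 * Z - 1.0)   # hat weighting, peak at mid-gray
        num += w * (Z / t)                # radiance estimate from this exposure
        den += w
    return num / np.maximum(den, 1e-8)
```

The division by exposure time brings each picture to a common radiance scale, so well-exposed pixels from short and long exposures can be averaged directly.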
In order to show this image on a regular LDR display, it has to be transformed through a process called tone mapping, which compresses the dynamic range of the image while trying to maintain its details and natural appearance.

In the movie industry there is a growing interest in HDR imaging, but the challenge of shooting HDR scenes with LDR equipment has existed since the inception of cinema. The usual way to address it is to add artificial lights, which raise the intensity levels of the darkest parts of the image and hence reduce the dynamic range of the scene, fitting it into the reduced range of the capture medium (film or digital). This is a cumbersome, expensive procedure requiring very significant human and material resources that greatly affect the cost of the production. Some alternatives exist, but they are not fully practical. Some digital cinema camera models are able to alternate exposure times on consecutive frames, creating pairs of different-exposure images that are then fused following the approach of [7]; however, camera and/or object motion produces ghosting artifacts in the fusion results. A recent possibility is to use a dual-camera set-up [9], with two synchronized, perfectly registered cameras on an orthogonal rig, where a semi-transparent mirror sends most of the light intensity to one camera and the rest to the other. These images can be fused without problem because they are fully aligned, so there is no risk of ghosting; but the dual-camera process has its own limitations: cost and practicality issues stemming from the use of two cameras, image problems caused by imperfections in the mirror, and the need to tone-map the output. For more details we refer the reader to [3].

More recently, we find a very small number of works that perform HDR reconstruction from a single interlaced image. Gu et al.
[11] combine rows taken with different exposure times; since the rows are not captured simultaneously, this method also produces ghosting artifacts, which need to be reduced by estimating and compensating for the motion blur. The camera software Magic Lantern (ML) [16] allows some camera models to capture images/video with dual ISO values that alternate between consecutive image line pairs. It provides an implementation that interpolates a full-frame low-ISO image, containing less noise in shadow areas. It does not claim to compute an HDR image, although the final picture is the result of combining the information from both the low-ISO and the high-ISO full-frame images. The method follows a chain of steps: separate the two ISO frames, interpolate the missing lines to get the full images, and combine information from both interpolated frames to greatly reduce the noise in dark regions. Hajisharif et al. [12] perform demosaicing, denoising, re-sampling and HDR reconstruction at the same time, starting from the interlaced input provided by the ML software [16]. This method requires a prior radiometric calibration process, so it cannot be used when the camera is not available. A similar idea was developed by Heide et al. [13], who propose a single optimization step using image priors and regularizers for the different stages of the camera color pipeline (denoising, demosaicing, etc.). The selection of image priors is crucial for the optimization process, and the values are highly dependent on the set of images selected for learning the best weights. These latter two methods have the advantage of working directly on the RAW data without following a staged pipeline, so no cumulative errors are carried over from one process to the next; nevertheless, this integration makes it difficult to further extend the processes involved, since they are not independent, and it is also challenging to locate and rectify errors in the pipeline.
Our main contribution in this work is to propose a simple and effective method to shoot high-quality video in HDR scenarios using a single camera capable of recording interlaced dual-ISO footage. The interlaced input guarantees that the result will be free of ghosting artifacts, because the low-ISO and high-ISO lines are recorded simultaneously. No access to the camera used to record the input is required. Our implementation pipeline consists of a set of stages: computing the full-frame single-ISO images with a deinterlacing algorithm, combining these full-frame images into a single HDR picture, and finally applying a tone-mapping operator to produce an LDR output. For these stages we adapt state-of-the-art algorithms to our setting, producing high-quality results. Tests and comparisons show that our method outperforms other approaches both quantitatively (in terms of PSNR) and qualitatively (no spurious colors, better edge preservation, less noise).

Figure 1. Schematic of the proposed method. First the dual-ISO input is split into two half-size images Il/2 and Ih/2, each one with the rows corresponding to a single ISO value. Next a deinterlacing method is used to generate full-frame images Il and Ih from Il/2 and Ih/2, respectively. These full-frame images are linearly combined and tone-mapped to produce the final output.

Methodology

The input to our algorithm is a video sequence in RAW format, where each frame alternates row pairs with different ISO values, and the output is an LDR video sequence with simultaneous detail visibility in the dark and bright zones of the picture; see the scheme in Figure 1. All the stages of our method are applied on a frame-by-frame basis except for the final tone mapping, which imposes temporal consistency on the output by considering several input frames at the same time. Let us describe the proposed method in detail.
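As a rough illustration, the overall pipeline can be sketched in Python/NumPy as below. This is a simplified placeholder, not the paper's actual operators: the row-pair splitting follows the alternation described in the text, while the brightness-driven weighting function and the `gain` normalization are hypothetical stand-ins for the combination stage.

```python
import numpy as np

def split_dual_iso(frame):
    """Split a dual-ISO frame that alternates the ISO value on
    consecutive row pairs (rows 0-1: low ISO, rows 2-3: high ISO, ...)."""
    low_mask = (np.arange(frame.shape[0]) % 4) < 2
    return frame[low_mask], frame[~low_mask]   # Il/2, Ih/2 (half height)

def combine(I_l, I_h, gain):
    """Pixel-wise weighted average of the two full-frame single-ISO
    images: favor the high-ISO frame in dark areas (less noise after
    amplification) and the low-ISO frame in bright areas (no saturation).
    The smooth ramp weight is a hypothetical choice for illustration."""
    w = np.clip(I_l, 0.0, 1.0)          # brightness-driven weight in [0, 1]
    return w * I_l + (1.0 - w) * (I_h / gain)
```

Dividing the high-ISO frame by the ISO gain brings both images to a common radiometric scale before averaging.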
Generation of single-ISO full-frame images

The first stage consists of the computation of two single-ISO full-frame images from an input dual-ISO frame, using an inpainting-based deinterlacing method. In order to obtain the two full-frame pictures Il (for the low ISO value) and Ih (for the high ISO value) we proceed as follows:

• Split the dual-ISO input into two half-size images Il/2 and Ih/2, each one with the rows corresponding to a single ISO value, see Figure 2.

Figure 2. The dual-ISO input frame (left), which is split into two half-size images Il/2 and Ih/2 (right), each one with the rows corresponding to a single ISO value.

• Using an adapted version of the inpainting-based deinterlacing method [2], generate full-frame images Il and Ih from Il/2 and Ih/2, respectively.

• Perform demosaicing using [20], and apply a refinement step to improve the interpolated results, see Figure 3.

Figure 3. Generated full-frame images Il (left) and Ih (right).

Row interpolation by deinterlacing

In order to interpolate the missing rows and generate Il from Il/2 and Ih from Ih/2, we adapt the deinterlacing method of Ballester et al. [2]. This state-of-the-art technique follows the dense stereo matching approach of Cox et al. [5] and fills in a missing line L0 between two given lines L− and L+ as follows (see Fig. 4):

• First, perform a global matching between lines L− and L+ by computing the correlation matrix between their image values and finding the matches (optimal path) through dynamic programming. In practice, only a band around the diagonal is considered in the search for the optimal path, and the matches are estimated assuming a certain noise variance for the image values.

• Second, each matching pair of pixels determines a segment that crosses the missing line L0 at a pixel location, which is filled in with the average of the matching pair.
For those points that were not matched, their values are computed by bilinear interpolation from the neighboring correspondences.

The images Il/2 and Ih/2 are color filter array (CFA) RAW pictures with the 2×2 Bayer pattern 'RGGB'; with the R and B values we create half-size channels that can be deinterlaced directly with [2], while for the G channel we first demosaic the values using [20] and then apply the deinterlacing method [2] with the modification of filling in two consecutive
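The dynamic-programming line matching described above can be sketched as follows. This is a simplified illustration, not the implementation of [2]: the band restriction around the diagonal and the noise-variance model are omitted, `occlusion_cost` is a hypothetical penalty for unmatched pixels, and unmatched positions are filled by one-dimensional linear interpolation.

```python
import numpy as np

def interpolate_line(Lm, Lp, occlusion_cost=50.0):
    """Fill the missing line L0 between Lm (line above) and Lp (line
    below) by matching their pixels with a dynamic-programming
    alignment, in the spirit of Cox-style stereo matching [5]."""
    n = len(Lm)
    D = np.full((n + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    # Forward pass: match two pixels (diagonal move) or leave a pixel
    # unmatched in either line (horizontal/vertical move).
    for i in range(n + 1):
        for j in range(n + 1):
            if i > 0 and j > 0:
                c = (Lm[i - 1] - Lp[j - 1]) ** 2
                D[i, j] = min(D[i, j], D[i - 1, j - 1] + c)
            if i > 0:
                D[i, j] = min(D[i, j], D[i - 1, j] + occlusion_cost)
            if j > 0:
                D[i, j] = min(D[i, j], D[i, j - 1] + occlusion_cost)
    # Backtrack the optimal path to recover the matched pixel pairs.
    matches, i, j = [], n, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                np.isclose(D[i, j], D[i - 1, j - 1] + (Lm[i - 1] - Lp[j - 1]) ** 2)):
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and np.isclose(D[i, j], D[i - 1, j] + occlusion_cost):
            i -= 1
        else:
            j -= 1
    # Each match (i, j) crosses L0 at pixel (i + j) / 2; fill it with
    # the average of the matched pair.
    L0 = np.full(n, np.nan)
    for i, j in matches:
        L0[(i + j) // 2] = 0.5 * (Lm[i] + Lp[j])
    # Unmatched positions: interpolate from the filled neighbours.
    idx = np.flatnonzero(~np.isnan(L0))
    return np.interp(np.arange(n), idx, L0[idx])
```

When the two lines are identical the diagonal path has zero cost, so the missing line reproduces them exactly; when a structure shifts between the lines, the matches follow it, interpolating along the local orientation rather than straight down.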
References

[1] Wolfgang Heidrich et al. "HDR-VDP-2: a calibrated visual metric for visibility and quality predictions in all luminance conditions," SIGGRAPH, 2011.
[2] Tomoo Mitsunaga et al. "Coded rolling shutter photography: Flexible space-time sampling," IEEE International Conference on Computational Photography (ICCP), 2010.
[3] Lei Zhang et al. "Color demosaicking via directional linear minimum mean square-error estimation," IEEE Transactions on Image Processing, 2005.
[4] Yun-Ta Tsai et al. "Supplemental Material for FlexISP: A Flexible Camera Image Processing Framework," 2014.
[5] Masatoshi Okutomi et al. "Pseudo four-channel image denoising for noisy CFA raw data," IEEE International Conference on Image Processing (ICIP), 2015.
[6] Ingemar J. Cox et al. "A Maximum Likelihood Stereo Algorithm," Computer Vision and Image Understanding, 1996.
[7] Jonas Unger et al. "HDR Reconstruction for Alternating Gain (ISO) Sensor Readout," Eurographics, 2014.
[8] Steve Mann et al. "On Being 'Undigital' with Digital Cameras: Extending Dynamic Range by Combining Differently Exposed Pictures," 1995.
[9] Jitendra Malik et al. "Recovering high dynamic range radiance maps from photographs," SIGGRAPH, 1997.
[10] Arcangelo Bruna et al. "Color space transformations for digital photography exploiting information about the illuminant estimation process," Journal of the Optical Society of America A, 2012.
[11] Javier Vazquez-Corral et al. "The intrinsic error of exposure fusion for HDR imaging, and a way to reduce it," BMVC, 2015.
[12] Richard Szeliski et al. "High dynamic range video," ACM Transactions on Graphics, 2003.
[13] Mark D. Fairchild et al. "The HDR Photographic Survey," CIC, 2007.