Local Linear Approximation for Camera Image Processing Pipelines

Modern digital cameras include an image processing pipeline that converts raw sensor data to a rendered RGB image. Several key steps in the pipeline operate on spatially localized data (demosaicing, noise reduction, color conversion). We show how to derive a collection of local, adaptive linear filters (kernels) that can be applied to each pixel and its neighborhood; the adaptive linear calculation approximates the performance of the modules in the conventional image processing pipeline. We also derive a set of kernels from images rendered by expert photographers. In both cases, we evaluate the accuracy of the approximation by calculating the difference between the images rendered by the camera pipeline and the images rendered by the local, linear approximation. The local, linear and learned (L3) kernels approximate the camera and expert processing pipelines with a mean S-CIELAB error of ΔE < 2. A value of the local, linear architecture is that the parallel application of a large number of linear kernels maps well onto modern hardware configurations and can be implemented efficiently with respect to power.

Introduction

The image processing pipeline in a modern camera is composed of serially aligned modules, including dead pixel removal, demosaicing, sensor color conversion, denoising, illuminant correction and other components (e.g., sharpening or hue enhancement). To optimize the rendered image, researchers have designed and optimized the algorithms for each module and added new modules to handle different corner cases. The majority of commercial camera image processing pipelines consist of a collection of these specialized modules, optimized for a single color filter array design, the Bayer pattern (one red, one blue and two green pixels in one repeating block).

New capabilities in optics and CMOS sensors have made it possible to design novel sensor architectures that promise features beyond the original Bayer RGB sensor design. For example, recent years have produced a new generation of architectures to increase spatial resolution [1], control depth of field through light field camera designs (Lytro, Pelican Imaging, Light.co), and extend dynamic range and sensitivity through novel arrangements of color filters [2-5] and mixed pixel architectures [6]. There is a need for an efficient process for building image rendering pipelines that can be applied to each of these new designs.

In 2011, Lansel et al. [7] proposed an image processing pipeline that efficiently combines several key modules into one computational step and whose parameters can be optimized using automated learning methods [8-10]. This pipeline maps raw sensor values into display values using a set of local, linear and learned filters, and we therefore refer to it as the L3 method. The kernels for the L3 pipeline can be optimized using simple statistical methods. The L3 algorithm automates the design of key modules in the imaging pipeline for a given sensor and optics. The learning method can be applied to both Bayer and non-Bayer color filter arrays and to systems that use a variety of optics. We have illustrated the method using both simulations [10] and real experimental data from a five-band camera prototype [9]. Computationally, the L3 algorithm relies mainly on a large set of inner products, which can be computed efficiently and at low power [11].
The L3 algorithm is part of a broader literature that explores how to incorporate new optimization methods into the image processing pipeline. For example, Stork and Robinson [12] developed a method for jointly designing the optics, sensor and image processing pipeline of an imaging system. Their optimization focused on the design parameters of the lens and sensor. Khabashi et al. [13] propose using simulation methods and Regression Tree Fields to design critical portions of the image processing pipeline. Heide et al. [14] have proposed that the image processing pipeline should be conceived of as a single, integrated computation that can be solved as an inverse problem using modern optimization methods. Instead of applying different heuristics at the separate stages of the traditional pipeline (demosaicing, denoising, color conversion), they rely on image priors and regularizers. Heide and colleagues [14, 15] use modern optimization methods and convolutional sparse coding to develop image pipelines as well as to address more general image processing tasks, such as inpainting.

The distinctive emphasis of the L3 method is that it couples statistical learning methods with a simple computational architecture to create new pipelines that are efficient on modern mobile devices. Here we identify two new applications of the L3 pipeline. First, we show that the L3 pipeline can learn to approximate other highly optimized image processing pipelines. We demonstrate this by comparing the L3 output with the rendering from a very high quality digital camera. Second, we show that the method can learn a pipeline that captures the personal preferences of individual users. We demonstrate this by arranging for the L3 pipeline to learn the transformations applied by a highly skilled photographer.

Proposed Method: Local, Linear and Learned

In our previous work, we used image systems simulation to design a pipeline for novel camera architectures [9, 10]. We created synthetic scenes and camera simulations to produce sensor responses and the corresponding ideal rendered images. We used these matched pairs to define sensor response classes for which the transformation from the sensor response to the desired rendered image could be well approximated by an affine transformation. The L3 parameters define the classes, C_i, and the transformations from the sensor data to the rendered output for each class, T_i.

We use the same L3 principles to design an algorithm that learns the linear filters for each class from an existing pipeline. This application does not require camera simulations; instead, we can learn the L3 parameters directly from the sensor output and the corresponding rendered images. The rendered images can be those produced by the camera vendor, or they can be images generated by the user.

The proposed method consists of two independent modules: 1) learning local linear kernels from a raw image and the corresponding rendered RGB image, and 2) rendering new raw images into the desired RGB output. The learning phase is performed once per camera model, and the kernels are stored for future rendering. The rendering phase is efficient: it loads the class definitions and kernels and applies them to generate the output images, as sketched below.
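As an illustration of the rendering phase, the sketch below (Python/NumPy) classifies the neighborhood around each pixel and applies the stored linear kernel for that class as a set of inner products. This is not the authors' implementation; the function names, the patch size, and the reflect-padding at the image border are assumptions made for the example.

```python
import numpy as np

def render_l3(raw, kernels, classify, patch_size=5):
    """Render a raw mosaic with precomputed L3 kernels (illustrative sketch).

    raw      : 2-D array of raw (mosaicked) sensor values.
    kernels  : dict mapping class index -> (P x 3) linear transform,
               where P = patch_size**2.
    classify : function mapping a flattened patch to a class index.
    """
    half = patch_size // 2
    h, w = raw.shape
    out = np.zeros((h, w, 3))
    # Pad so every pixel has a full neighborhood (border handling is an assumption).
    padded = np.pad(raw, half, mode="reflect")
    for r in range(h):
        for c in range(w):
            patch = padded[r:r + patch_size, c:c + patch_size].ravel()
            T = kernels[classify(patch)]   # P x 3 kernel for this patch's class
            out[r, c, :] = patch @ T       # one inner product per output channel
    return out
```

Because each output pixel depends only on its own local patch, the per-pixel kernel applications are independent and parallelize trivially, which is the property that makes the computation efficient on modern hardware.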
Kernel Learning

In general, our task is to find, for each class, a P × 3 linear transformation (kernel) T_i such that

$$ T_i = \arg\min_{T_i} \sum_{j \in C_i} L(y_j, X_j T_i) $$

Here, X_j and y_j are the jth example pair for class i, drawn from the RAW sensor data and the rendered RGB image values, respectively. The function L specifies the loss function (visual error). In commercial imaging applications, the CIE ΔE_ab visual difference measure can be a good choice for the loss function.

In image processing applications, the transformation from sensor to rendered data is globally nonlinear. But, as we show here, the global transformation can be well approximated as an affine transform within appropriately defined classes C_i. Once the classes C_i are determined, the transforms can be solved for each class independently, and the problem can be expressed as ordinary least-squares. To avoid noise magnification in low-light situations, we use ridge regression and regularize the kernel coefficients. That is,

$$ T_i = \arg\min_{T_i} \| \tilde{y} - X T_i \|^2 + \lambda \| T_i \|^2 $$

Here, λ is the regularization parameter and ỹ is the output in the target color space, an N × 3 matrix. The sensor data in each local patch are re-organized as rows of X, which has P columns, corresponding to the number of pixels in the sensor patch. The closed-form solution for this problem is

$$ T_i = (X^T X + \lambda I)^{-1} X^T \tilde{y} $$

The computation of T_i can be further optimized by using the singular value decomposition (SVD) of X. That is, if we decompose X = U D V^T, we have

$$ T_i = V \, \mathrm{diag}\!\left( \frac{D_j}{D_j^2 + \lambda} \right) U^T \tilde{y} $$

The regularization parameter λ is chosen to minimize the generalized cross-validation (GCV) error [16]. We performed these calculations using several different target color spaces, including both the CIELAB and sRGB representations.

Patch Classification

To solve for the transforms T_i, the classes C_i must be defined. The essential requirement for choosing classes is that the sensor data in each class can be accurately transformed to the target response space. This can always be achieved by increasing the number of classes (i.e., shrinking the size of each class). In our experience, it is possible to achieve good local linearity by defining classes according to their mean response level, contrast, and saturation. The mean channel response estimates the illuminance at the sensor and indicates the noise level. Contrast measures the local spatial variation, reflecting whether the local region of the scene is flat or textured. Finally, the saturation type checks for the case in which some of the channels no longer provide useful information. It is particularly important to separate classes with channel saturation. A minimal sketch of the per-class kernel estimation and patch classification is given below.
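To make the estimation step concrete, the sketch below (Python/NumPy, not the authors' code) computes the ridge-regression kernel for one class using the SVD form given above, and shows one plausible way to bin the mean-level, contrast, and saturation features described under Patch Classification into class indices. The bin edges, contrast threshold, saturation level, and the way the three features are combined into a single index are all illustrative assumptions.

```python
import numpy as np

def solve_class_kernel(X, Y, lam):
    """Ridge-regression estimate of the P x 3 kernel T_i for one class.

    X   : N x P matrix, each row a flattened sensor patch in class C_i.
    Y   : N x 3 matrix of target values (e.g., CIELAB or linear sRGB).
    lam : regularization parameter (e.g., chosen to minimize the GCV error).
    """
    # Economy-size SVD: X = U diag(D) V^T.
    U, D, Vt = np.linalg.svd(X, full_matrices=False)
    # T_i = V diag(D_j / (D_j^2 + lam)) U^T Y, the SVD form of
    # (X^T X + lam I)^{-1} X^T Y.
    shrink = D / (D ** 2 + lam)
    return Vt.T @ (shrink[:, None] * (U.T @ Y))


def classify_patch(patch, level_edges, contrast_thresh, sat_level):
    """Map one flattened patch to a class index from three features:
    mean response level, local contrast, and channel saturation.
    All thresholds are hypothetical values chosen for illustration."""
    level = int(np.digitize(patch.mean(), level_edges))   # response-level bin
    textured = int(patch.std() > contrast_thresh)         # flat vs. textured
    saturated = int((patch >= sat_level).any())           # any saturated pixels?
    n_levels = len(level_edges) + 1
    return level + n_levels * (textured + 2 * saturated)  # combined class index
```

In practice the regularization parameter would be selected per class, for example by minimizing the generalized cross-validation error as described above.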

References

[1] Shree K. Nayar, et al. Generalized Assorted Pixel Camera: Postcapture Control of Resolution, Dynamic Range, and Spectrum, 2010, IEEE Transactions on Image Processing.

[2] G. Golub, et al. Good Ridge Parameter, 1979.

[3] Brian A. Wandell, et al. Automating the design of image processing pipelines for novel color filter arrays: local, linear, learned (L3) method, 2014, Electronic Imaging.

[4] Brian A. Wandell, et al. Interleaved imaging: an imaging system design inspired by rod-cone vision, 2009, Electronic Imaging.

[5] Kari Pulli, et al. FlexISP, 2014, ACM Trans. Graph.

[6] Steven P. Lansel, et al. Learning image processing pipelines for digital imaging devices, 2012.

[7] Brian A. Wandell, et al. Local Linear Learned Image Processing Pipeline, 2011.

[8] Gang Luo. A novel color filter array with 75% transparent elements, 2007, Electronic Imaging.

[9] David G. Stork, et al. Theoretical foundations for joint digital-optical analysis of electro-optical imaging systems, 2008, Applied Optics.

[10] Brian A. Wandell, et al. A spatial extension of CIELAB for digital color-image reproduction, 1997.

[11] Brian A. Wandell, et al. Automatically designing an image processing pipeline for a five-band camera prototype using the local, linear, learned (L3) method, 2015, Electronic Imaging.

[12] Andrew W. Fitzgibbon, et al. Joint Demosaicing and Denoising via Learned Nonparametric Random Fields, 2014, IEEE Transactions on Image Processing.

[13] Qiyuan Tian, et al. Accelerating a learning-based image processing pipeline for digital cameras, 2015.

[14] Javier Hernández-Andrés, et al. Combining transverse field detectors and color filter arrays to improve multispectral imaging systems, 2014, Applied Optics.

[15] Gordon Wetzstein, et al. Fast and flexible convolutional sparse coding, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).