Automatic Compensation for Camera Settings for Images Taken under Different Illuminants

Abstract

Combining two images of the same scene shot under different illumination has found wide application, from estimating scene illumination, to enhancing photographs shot in dark environments, to shadow removal. A typical example is a pair of images shot with and without flash. For consumer-grade digital cameras, however, the different lighting conditions mean that the two images are usually captured under different camera settings, such as exposure time and white balance. Adjusting (registering) the two images is therefore a necessary step before they can be combined. Unfortunately, how to register the two images has not been fully investigated. In this paper, we propose a method that parametrically adjusts the two images so as to compensate for differences in shutter speed, ISO, aperture size, and white balance. This is accomplished by fitting the parameters of a 2nd-order masking model on a set of image pairs; the trained model can then be used to register any two new images. In the training phase, for each image pair we develop a scheme that adjusts the magnitude of each color channel of one image so as to register it with the other. The problem is difficult because the difference between the two images is a composite of both camera settings and illumination. Here we exploit the simple fact that a shadow is caused purely by a change of illumination. Suppose we have two images, one taken under illuminant 1 and the other under illuminant 1 plus illuminant 2. If we subtract the first image from the second, a shadow cast by illuminant 1 should disappear in the resulting difference. By adjusting the RGB pixel values of one image so as to completely remove the shadow from the difference image, compensating magnitudes for each color channel can be computed and used to train the masking model. The trained masking model can then accurately compensate for camera settings for any two new images, such that the difference between the compensated images reflects only the difference in illumination.

Introduction

Work on image pairs taken of the same scene under different illumination was initiated in [1]. That work combined flash and no-flash images to estimate surface reflectance and illumination in the no-flash image: the no-flash image was subtracted from the with-flash image to create a pure flash image, which appears as if it were taken under light from the flash alone. Petschnigg et al. [2] also used flash and no-flash image pairs, to enhance photographs shot in dark environments and to remove red-eye effects. Drew et al. [4] suggested a method to remove the ambient shadow from the no-flash image with the aid of the flash image. These works all analyze the difference image (with-flash image − no-flash image) to infer the contribution of the flash light to the scene. For this computation to be meaningful, however, the two images must share the same camera settings: exposure time, ISO, aperture, and white balance. Since different lighting conditions usually cause the camera to change these settings, an effective registration method that compensates for the difference across the image pair is necessary. Unfortunately, this problem has never been investigated carefully, partly because the difference between the two images is a composite of the difference in camera settings and the difference in light arriving at the camera, and the two are difficult to separate.
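To make the role of the difference image concrete, here is a minimal Python sketch of the computation these methods share. The file names are hypothetical placeholders, and the per-channel compensation vector m anticipates the model developed in this paper:

```python
import numpy as np
import tifffile  # pip install tifffile; reads linear TIFF images

# Hypothetical file names; both shots are assumed linear (no gamma),
# spatially registered (tripod), and stored as H x W x 3 arrays.
A = tifffile.imread("ambient.tif").astype(np.float64)  # no-flash image
B = tifffile.imread("flash.tif").astype(np.float64)    # with-flash image

# If both shots shared identical camera settings, the pure-flash image
# would simply be B - A.  In practice the camera changes its settings
# between shots, so A must first be scaled per channel by a
# compensation 3-vector m = (m_R, m_G, m_B) -- the quantity the
# masking model predicts.  Identity is only a placeholder here.
m = np.array([1.0, 1.0, 1.0])

pure_flash = np.clip(B - m * A, 0.0, None)  # difference image
```

With the wrong m, shadows cast by the ambient light survive in the difference image; choosing m correctly makes them vanish, which is exactly the training signal used in this paper.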
In this paper, we present a method that uses a masking model to compute the compensation between the two images, given the camera settings under which each was taken. The model assumes additivity and proportionality of the different factors involved; here we use a 2nd-order masking model with 9 parameters. To train the model for a digital camera, we collect a set of image pairs, each consisting of two images of the same scene taken under different lighting conditions. In the training phase, we restrict our attention to Lambertian surfaces. For such a reflectance map, at the surface point corresponding to each pixel, all lighting adds up into a single effective light [5]. Suppose we have a pair of images A and B, both of the same scene but under different lighting: a single light source, illuminant 1, illuminates the scene in A, while two light sources, illuminant 1 and illuminant 2, illuminate the scene in B. An example of this situation is taking the first image under sunlight and the second with flash added to the sunlight; such pairs form our training set. Fig. 1 shows such a no-flash (ambient, or A) image and with-flash (both, or B) image: Figs. 1(a,b) are respectively the ambient-light image, unscaled, and the same image scaled for display; Figs. 1(c,d) are the "both" (with-flash) image, unscaled and scaled. This additivity property implies that subtracting A from B makes the shadow cast by illuminant 1 disappear in the difference image, i.e. the difference appears as if it were taken under illuminant 2 only. In the situation above, the difference image is a pure flash image, and the shadow cast by sunlight disappears. This holds for any image pair consisting of an ambient-light image and an image that adds flash light to the same ambient lighting. The use of a shadow lets us separate the contribution of the camera settings from that of the illumination in the difference of the two images: the shadow is caused purely by the change in illumination. We can therefore use this fact to compute by what magnitude we must adjust A so that illuminant 1 casts no shadow in B − A. This magnitude is then used to train the parameters of the masking model. In the training phase, we first collect image pairs in an imaging environment with two light sources; each pair contains shadow regions cast by one of the illuminants. For each image pair, we first adjust the magnitude of each color channel of image A so that the difference image, obtained by subtracting the adjusted image A from B, has the shadow removed. Then, using the adjusted images, the parameters of the masking model can be computed from the camera settings of the two images. Once the parameters are obtained, the masking model can adjust new image pairs (A, B) as if they were taken under the same camera settings, so that the difference between the two images is controlled only by the light arriving at the camera. The masking model delivers a 3-vector adjustment that generates the adjusted image A′ from A, as sketched below.
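The shadow-vanishing condition can be turned into a closed-form estimate. For a pixel p inside the illuminant-1 shadow and a pixel q outside it, on the same surface and receiving the same illuminant-2 contribution, requiring the shadow to disappear in the per-channel difference B − m·A gives B_c(p) − m_c A_c(p) = B_c(q) − m_c A_c(q), i.e. m_c = (B_c(p) − B_c(q)) / (A_c(p) − A_c(q)). The sketch below implements this, together with one plausible 9-parameter instantiation of the 2nd-order model. The paper does not spell out the model's exact form at this point, so the quadratic-per-channel dependence on a scalar settings difference x (e.g. the exposure-value difference defined in the next section) is an assumption, and all function names are hypothetical:

```python
import numpy as np

def channel_scales(A, B, shadow_mask, lit_mask):
    """Per-channel compensation m for one training pair (hypothetical helper).

    shadow_mask / lit_mask are boolean H x W masks marking pixels of the
    same material inside and outside the illuminant-1 shadow; both regions
    are assumed to receive the same illuminant-2 contribution.  Averaging
    each region before taking the ratio gives a robust estimate of
        m_c = (B_c(p) - B_c(q)) / (A_c(p) - A_c(q)).
    """
    dA = A[shadow_mask].mean(axis=0) - A[lit_mask].mean(axis=0)
    dB = B[shadow_mask].mean(axis=0) - B[lit_mask].mean(axis=0)
    return dB / dA  # compensation 3-vector

def fit_masking_model(x, m):
    """Fit an assumed 9-parameter 2nd-order model by least squares.

    x: (n_pairs,) scalar settings difference per training pair.
    m: (n_pairs, 3) compensation vectors from channel_scales().
    Per channel c we assume m_c = a_c + b_c*x + c_c*x**2, i.e. three
    coefficients per channel, nine parameters in total.
    """
    x = np.asarray(x, dtype=float)
    X = np.stack([np.ones_like(x), x, x**2], axis=1)   # (n_pairs, 3)
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(m), rcond=None)
    return coeffs                                      # (3, 3)

def predict_scales(coeffs, x):
    """Compensation 3-vector for a new image pair with settings delta x."""
    return np.array([1.0, x, x**2]) @ coeffs
```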
Camera Settings and Image Acquisition

We have designed an algorithm that works with images acquired using ordinary consumer-grade digital cameras; no specialized equipment is required. All images were acquired in a RAW format and then converted, using conversion software, into 12-bit linear TIFF images.

Consumer-grade cameras typically have the following settings, which can be adjusted automatically or under user control:

1. Focal length
2. Exposure time (shutter speed)
3. Aperture (f-number)
4. ISO (film speed)
5. White balance

We fix the focal length so that the camera's focus remains constant. We also use a tripod when taking images, to ensure that the two images capture exactly the same points in the scene (otherwise spatial registration would be required; mutual information is the standard approach to that problem). For the other settings, we turn on the 'auto' function and let the camera decide how to set the exposure time, aperture, ISO, and white balance for each lighting situation. The size of the aperture and the brightness of the scene control the amount of light that enters the camera in a given period of time, and the shutter controls the length of time that the light hits the recording surface. In photography, the exposure value (EV) is a single number assigned to all combinations of camera shutter speed and aperture that give the same exposure. In the Additive Photographic Exposure System (APEX) [3], the exposure value is the sum of the Aperture Value (AV) and the Time Value (TV):

EV = AV + TV.
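As a concrete illustration (these are the standard APEX definitions, not something specific to this paper), the APEX quantities can be computed directly from each shot's metadata; the full APEX system also assigns a Speed Value (SV) to the ISO setting:

```python
import math

def apex_values(f_number, exposure_time, iso=None):
    """Standard APEX quantities for one shot.

    AV = log2(N^2)        aperture value, N = f-number
    TV = log2(1 / t)      time value, t = exposure time in seconds
    EV = AV + TV          exposure value
    SV = log2(ISO/3.125)  speed value (SV = 5 at ISO 100)
    """
    av = math.log2(f_number ** 2)
    tv = math.log2(1.0 / exposure_time)
    sv = math.log2(iso / 3.125) if iso is not None else None
    return av, tv, av + tv, sv

# Example: f/4 at 1/125 s, ISO 100 -> AV = 4.0, TV ~ 6.97, EV ~ 10.97, SV = 5.0
print(apex_values(4.0, 1 / 125, iso=100))
```

The difference in these values between the two shots of a pair is the kind of scalar settings summary that the masking model sketched above can consume.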

References

[1] F. Xiao et al., "Illuminating Illumination," in Proc. IS&T/SID Color Imaging Conference, 2001.

[2] G. Petschnigg et al., "Digital photography with flash and no-flash image pairs," ACM Trans. Graph., 2004.

[4] C. Lu and M. S. Drew, "Removing Shadows using Flash/Noflash Image Edges," in Proc. IEEE International Conference on Multimedia and Expo, 2006.

[5] B. K. P. Horn, "Determining Shape and Reflectance Using Multiple Images," 1978.