Three Techniques for Rendering Generalized Depth of Field Effects

Depth of field refers to the swath of a scene that is imaged in sufficient focus through an optics system, such as a camera lens. Control over depth of field is an important artistic tool that can be used to emphasize the subject of a photograph. In a real camera, control over depth of field is limited by the laws of physics and by physical constraints. Depth of field has been rendered in computer graphics, but usually with the same limited control as found in real camera lenses. In this paper, we generalize depth of field in computer graphics by allowing the user to specify the distribution of blur throughout a scene in a more flexible manner. Generalized depth of field provides a novel tool to emphasize an area of interest within a 3D scene, to select objects from a crowd, and to make a busy, complex picture more understandable by focusing only on relevant details that may be scattered throughout the scene. We present three approaches for rendering generalized depth of field, based on nonlinear distributed ray tracing, compositing, and simulated heat diffusion. Each of these methods has a different set of strengths and weaknesses, so it is useful to have all three available. The ray tracing approach allows the amount of blur to vary with depth in an arbitrary way. The compositing method creates a synthetic image with focus and aperture settings that vary per pixel. The diffusion approach provides full generality by allowing each point in 3D space to have an arbitrary amount of blur.

1 Background and Previous Work

1.1 Simulated Depth of Field

A great deal of work has been done in rendering realistic (non-generalized) depth of field effects, e.g. [4, 6, 12, 15, 13, 17]. Distributed ray tracing [4] can be considered a gold standard; at great computational cost, highly accurate simulations of geometric optics can be obtained. For each pixel, a number of rays are chosen to sample the aperture. Accumulation buffer methods [6] provide essentially the same results as distributed ray tracing, but render entire images per aperture sample in order to utilize graphics hardware. Both distributed ray tracing and accumulation buffer methods are quite expensive, so a variety of faster post-process methods have been created [12, 15, 2]. Post-process methods use image filters to blur images originally rendered with everything in perfect focus. Post-process methods are fast, sometimes to the point of running in real time [13, 17, 9], but generally do not achieve the same image quality as distributed ray tracing. A full literature review of depth of field methods is beyond the scope of this paper, but the interested reader should consult the following surveys: [1, 2, 5].

Kosara [8] introduced semantic depth of field, a notion somewhat similar to generalized depth of field. Semantic depth of field is non-photorealistic depth of field used for visualization purposes. It operates at per-object granularity, allowing each object to have a different amount of blur. Generalized depth of field goes further, allowing each point in space, and hence each pixel, to have a different blur value, rather than only each object.

Heat diffusion has previously been shown to be useful in simulating depth of field by Bertalmio [3] and Kass [7]. However, they simulated traditional depth of field, not generalized depth of field. We introduced generalized depth of field via simulated heat diffusion in [10]. For completeness, the present work contains one section dedicated to the heat diffusion method; please see [10] for complete details.

1.2 Terminology

This section explains certain terms that are important in discussions of simulated depth of field. Perhaps the most fundamental concept is the point spread function, or PSF. The PSF is the blurred image of a single point of light, and it completely characterizes the appearance of blur. In the terminology of linear systems, the PSF is the impulse response of the lens. Photographers use the Japanese word bokeh to describe the appearance of the out-of-focus parts of a photograph. Different PSFs lead to different bokeh. Typical high-quality lenses have a PSF shaped like their diaphragm, i.e., a circle or polygon. Computer generated images, on the other hand, often use Gaussian PSFs, due to mathematical convenience coupled with acceptable image quality.

Partial occlusion is the appearance of transparent borders on blurred foreground objects. During the image formation process, some rays are occluded and others are not, yielding semi-transparency. At the edges of objects, we encounter depth discontinuities, and partial occlusion is observed at these discontinuities. Post-processing approaches to depth of field must take special care at depth discontinuities, because failure to properly simulate partial occlusion leads to in-focus silhouettes on blurred objects. We refer to these incorrect silhouettes as depth discontinuity artifacts.

In signal processing terms, an image filter is the convolution of the image with the PSF. Convolution can be equivalently understood in the spatial domain either as gathering or as spreading. Gathering computes each output pixel as a linear combination of input pixels, weighted by the PSF. Spreading, on the other hand, expands each input pixel into a PSF, which is then accumulated in the output image. It is important to observe that convolution by definition uses the same PSF at all pixels. Depth of field, however, requires a spatially varying, depth-dependent PSF. Either gathering or spreading can be used to implement spatially varying filters, but they generate different results.
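The distinction can be made concrete with a small sketch. The following code is ours, for illustration only, and is not taken from any particular system: it implements both interpretations of a spatially varying blur using a normalized box PSF whose per-pixel radius is supplied in a radius map. An actual depth of field filter would derive the radius from depth and would typically use a circular or polygonal PSF; the function names are our own.

    import numpy as np

    def gather_blur(image, radius):
        # Each output pixel is an average of input pixels under the PSF
        # of the *output* location.
        h, w = image.shape
        out = np.zeros_like(image)
        for y in range(h):
            for x in range(w):
                r = radius[y, x]
                y0, y1 = max(0, y - r), min(h, y + r + 1)
                x0, x1 = max(0, x - r), min(w, x + r + 1)
                out[y, x] = image[y0:y1, x0:x1].mean()
        return out

    def spread_blur(image, radius):
        # Each input pixel is expanded into its own PSF and accumulated
        # into the output; accumulated weights are normalized at the end.
        h, w = image.shape
        acc = np.zeros_like(image)
        wgt = np.zeros_like(image)
        for y in range(h):
            for x in range(w):
                r = radius[y, x]
                y0, y1 = max(0, y - r), min(h, y + r + 1)
                x0, x1 = max(0, x - r), min(w, x + r + 1)
                n = (y1 - y0) * (x1 - x0)
                acc[y0:y1, x0:x1] += image[y, x] / n
                wgt[y0:y1, x0:x1] += 1.0 / n
        return acc / np.maximum(wgt, 1e-8)

    # A point of light next to a discontinuity in the blur radius:
    img = np.zeros((64, 64)); img[32, 32] = 1.0
    rad = np.full((64, 64), 2, dtype=int); rad[:, 32:] = 6
    print(np.abs(gather_blur(img, rad) - spread_blur(img, rad)).max())

With a constant radius, the two functions reduce to ordinary convolution and agree exactly; with a spatially varying radius, they disagree near blur discontinuities, which is why the choice between gathering and spreading matters for depth of field.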
1.3 Application to Industry

Generalized depth of field has application to the photography and film industries. Photographers and cinematographers often employ depth of field to selectively focus on the subject of a scene while intentionally blurring distracting details. Generalized depth of field increases this flexibility, enabling the photographer or cinematographer to focus on multiple subjects, or on oddly shaped subjects. Such focus effects are not possible with traditional techniques. Our compositing method applies both to live-action films and to the increasingly popular computer generated film industry. Our nonlinear distributed ray tracing approach is intended for computer generated images only. The simulated heat diffusion method is intended primarily for computer generated images, but is also applicable to real photography when depth values can be recovered. We suspect that the flexibility of generalized depth of field will enhance the already substantial creative freedom available to creators of computer generated films. Generalized depth of field will enable novel and surreal focal effects that are not available in existing rendering pipelines.

2 Method 1: Nonlinear Distributed Ray Tracing

2.1 Overview

Distributed ray tracing is a well-known method for accurately rendering depth of field [4]. Each pixel Pi in Figure 1 traces a number of rays to sample the aperture. Usually a realistic lens model is used, so the set of rays for each pixel forms a full cone, with the singularity at the plane of sharp focus. We generalize distributed ray tracing by allowing the rays to bend in controlled ways while traveling through free space. This bending causes the cone to expand wherever we desire the scene to be blurred, and to contract to a singularity wherever we want it to be in focus. A natural use for this method is to allow for several focus planes, at various depths, with regions of blur in between (Figure 2). The cone can furthermore vary from pixel to pixel, allowing the amount of blur to vary laterally as well as with depth.

Figure 1 illustrates the difference between ordinary distributed ray tracing and nonlinear distributed ray tracing. Distributed ray tracing sends rays from a point on the image plane, through a lens, and into the scene. Top: the rays converge to a single focus point F, simulating conventional depth of field. Bottom: the rays change direction to converge to a second focus point, rendering generalized depth of field. The user controls the locations of the focus points (F1 and F2), as well as the amount of blur between focus points (B1 and B2).

Figure 1: Top: Distributed ray tracing simulating conventional depth of field. Bottom: Nonlinear distributed ray tracing simulating generalized depth of field.

2.2 Implementation

The key design issue in nonlinear distributed ray tracing lies in the choice of ray representation. To achieve the desired quality, we need an interpolating spline of some type. This ensures that the in-focus points are completely in focus, while the blurred points are blurred by precisely the right amount. Additionally, we need a curve representation that can be efficiently intersected with scene geometry. For quality reasons, it would seem that a smooth curve such as a piecewise cubic spline is appropriate. However, we found that a simple piecewise linear polyline is satisfactory. In our experience, we simply need one line segment between each pair of focal points. This is fortunate, as intersecting line segments with scene geometry is the standard intersection test in the ray tracing literature. Our method requires several line-scene intersections per ray, to account for the multiple segments. This increases the cost, but by a reasonable amount, especially compared to the cost of intersecting with higher-order curves.
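To illustrate one plausible realization of this representation, the following sketch (our construction under stated assumptions, not the paper's exact implementation) builds the polyline for a single ray: vertices pinned to the user-specified focus depths force all of a pixel's rays to reconverge exactly there, while a mid-interval vertex offset by the aperture sample, scaled by the desired blur, makes the ray bundle expand in between. The helper intersect_segment is hypothetical and stands in for the renderer's standard segment-scene intersection test.

    import numpy as np

    def bent_ray(origin, axis, sample, aperture, focus_depths, blurs):
        # Polyline vertices for one ray of one pixel.
        # origin, axis   -- the pixel's central (pinhole) ray
        # sample         -- 2D aperture sample on the unit disk
        # aperture       -- lens radius
        # focus_depths   -- depths d1 < d2 < ... at which all rays converge
        # blurs          -- blur radius between consecutive focus depths
        #                   (len(blurs) == len(focus_depths) - 1)
        origin = np.asarray(origin, dtype=float)
        d = np.asarray(axis, dtype=float)
        d /= np.linalg.norm(d)
        # Orthonormal frame (u, v) perpendicular to the viewing axis.
        u = np.cross(d, [0.0, 1.0, 0.0])
        if np.linalg.norm(u) < 1e-6:          # axis nearly vertical
            u = np.cross(d, [1.0, 0.0, 0.0])
        u /= np.linalg.norm(u)
        v = np.cross(d, u)
        lateral = sample[0] * u + sample[1] * v

        verts = [origin + aperture * lateral]  # start at the lens sample
        for k, fd in enumerate(focus_depths):
            verts.append(origin + fd * d)      # pinned: exactly in focus here
            if k < len(blurs):                 # mid-interval blur control point
                mid = 0.5 * (fd + focus_depths[k + 1])
                verts.append(origin + mid * d + blurs[k] * lateral)
        return verts

    def trace(verts, intersect_segment):
        # Intersect the polyline segment by segment, nearest first.
        # intersect_segment(p, q) is assumed to wrap the renderer's usual
        # segment-scene intersection test, returning a hit record or None.
        for p, q in zip(verts[:-1], verts[1:]):
            hit = intersect_segment(p, q)
            if hit is not None:
                return hit
        return None

    # Example: two focus planes at depths 2 and 6 with blur 0.5 between them.
    ray = bent_ray([0, 0, 0], [0, 0, 1], sample=[1.0, 0.0],
                   aperture=0.1, focus_depths=[2.0, 6.0], blurs=[0.5])
    print(np.round(ray, 3))

Because the pinned vertices are shared by every ray of a pixel, the focus depths are rendered exactly in focus no matter how the aperture is sampled, while the blur values directly control the cone width between them; a production tracer would also extend a final segment beyond the last focus point.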
2.3 Advantages

Nonlinear distributed ray tracing