Focal-Plane Scale Space Generation with a 6T Pixel Architecture

Aiming at designing a CMOS image sensor that combines a high fill factor with focal-plane implementation of instrumental image processing steps, we propose a simple modification of a standard pixel architecture that allows charge redistribution among neighboring pixels. As a result, averaging operations may be performed at the focal plane, and image smoothing based on Gaussian filtering can be implemented. By averaging neighboring pixel values, it is also possible to generate the intermediate data structures required for the computation of Haar-like features. To show that the proposed hardware is suitable for computer vision applications, we present a system-level comparison in which the scale-invariant feature transform (SIFT) algorithm is executed twice: first on data obtained with a classical Gaussian filtering approach, and then on data generated with the proposed approach. Preliminary schematic and extracted-layout pixel simulations are also presented.

Introduction

Per-pixel pre-processing of spatial image sensor samples discards redundant data, thus decreasing the demands placed on the readout circuitry and improving the SWaP (size, weight, and power) factors of camera systems based on these smart sensors. Image sensor architectures with embedded per-pixel processing have been proposed for a large variety of tasks [1], and their industrial exploitation is ramping up [2]. However, all these architectures share common drawbacks, namely increased pixel pitch, reduced fill factor, and lack of compatibility with advanced CIS technologies. All in all, these drawbacks result in image quality degradation.

Based on our previous results on image sensors with per-pixel calculation of Gaussian filters, this paper presents a six-transistor image pixel aimed at overcoming these drawbacks while performing Gaussian filtering and allowing the subsequent calculation of image key-points from the filter outputs [3]. Compared to previous sensors with embedded per-pixel Gaussian filtering [4][5], this 6T pixel, which occupies 6.28 × 6.28 μm² in a 110 nm CIS technology, yields a 78% pitch reduction and a 253% fill factor enlargement. With a two-transistor-per-pixel overhead with respect to conventional 4T pixels, the proposed image sensor architecture implements the following functions:

• image capture, performed in the same way as in the conventional 4T architecture;
• neighboring pixel value averaging, realized by charge redistribution;
• image smoothing, by combining pixel values inside 2 × 2 pixel blocks and reconfiguring the array so that a Gaussian filter approximation is performed on an image having one quarter of the resolution of the original image [6];
• scale-space [7] generation, by repeatedly applying the Gaussian filter.

The proposed architecture performs simple parallel operations over the entire pixel matrix while images are captured. The processing capability of the proposed hardware can be used to optimize, in terms of processing time and power consumption, early vision tasks of image processing algorithms. The scale-invariant feature transform (SIFT) [7], used for object recognition, and the Viola-Jones object detector [8] are examples of two algorithms that benefit from the presented pixel: the first through scale-space generation, and the second through accelerated Haar-like feature computation. This paper addresses the SIFT algorithm and contextualizes the hardware-based solution within the algorithm's processing flow.
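As an illustration of the array-level behavior described above, the following NumPy sketch models the 2 × 2 charge-redistribution averaging and its repeated application to build a scale-space pyramid at successively lower resolutions. The function names, the test image size, and the number of levels are illustrative assumptions, not part of the proposed design; the sketch only mirrors the system-level behavior, not the circuit operation.

```python
import numpy as np

def block_average_2x2(img):
    """Model charge redistribution inside 2x2 pixel blocks: each block of
    four floating diffusions shares its charge, producing one averaged
    value per block (quarter-resolution output)."""
    h, w = img.shape
    assert h % 2 == 0 and w % 2 == 0
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def focal_plane_scale_space(img, n_levels=4):
    """Approximate a Gaussian scale space by repeatedly applying the
    focal-plane 2x2 averaging step, one level per array reconfiguration."""
    levels = [img.astype(float)]
    for _ in range(n_levels - 1):
        levels.append(block_average_2x2(levels[-1]))
    return levels

# Illustrative usage on a synthetic 64x64 frame.
frame = np.random.rand(64, 64)
pyramid = focal_plane_scale_space(frame, n_levels=4)
print([level.shape for level in pyramid])  # [(64, 64), (32, 32), (16, 16), (8, 8)]
```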
The pixel architecture is explained in the next section. Then we show how the scale-space for SIFT can be generated with the proposed hardware. At the end of the paper, system-level simulations are shown, as well as schematic and layout Spectre simulations.

Proposed Pixel Architecture

In the classical 4T architecture, a pinned photodiode is connected to a floating diffusion through a transfer gate transistor. When the transfer gate is activated, all the charge stored by the photodiode, which is proportional to the incident light, is transferred to the floating diffusion node, where it is read and sent to the output by a source follower. We propose a minor change to this architecture, as shown in Figure 1 (top). Two transistors, acting as switches, are added to the 4T architecture in order to connect the floating diffusion nodes of neighboring pixels and to create a reconfigurable array in which every pixel is connected to four of its neighbors. The sensor matrix interconnection pattern can be seen in Figure 1 (bottom). When the switches are closed, pixel averages are computed within every neighborhood in the array.

As a first application of the proposed hardware, when all switches are closed, an average is computed over the entire image, providing an instantaneous measurement of the global luminance of the matrix, which can be used to adjust the dynamic range of the image with algorithms such as tone mapping. The proposed scheme also makes it possible to compute the average of pixels grouped in squares or rectangles of user-defined size. This simple operation accelerates the computation of Haar-like features, in which the mean values of neighboring rectangles within the image are compared with the goal of detecting a region where there is a high probability of finding a desired object.
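To make the Haar-like feature acceleration concrete, the sketch below computes a two-rectangle feature from region means. On the proposed sensor, each region mean would be obtained directly by closing the switches inside the corresponding rectangle; here the means are computed in software, and the window coordinates and feature type are hypothetical examples.

```python
import numpy as np

def region_mean(img, top, left, height, width):
    """Mean of a rectangular pixel region; on the proposed sensor this value
    would be read out directly after charge redistribution in the region."""
    return img[top:top + height, left:left + width].mean()

def haar_two_rect_feature(img, top, left, height, width):
    """Two-rectangle Haar-like feature: difference between the means of the
    left and right halves of a window (a horizontal edge response)."""
    half = width // 2
    left_mean = region_mean(img, top, left, height, half)
    right_mean = region_mean(img, top, left + half, height, half)
    return left_mean - right_mean

# Illustrative usage on a synthetic frame.
frame = np.random.rand(64, 64)
print(haar_two_rect_feature(frame, top=8, left=8, height=16, width=16))
```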

References

[1] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints, 2004, International Journal of Computer Vision.

[2] Ákos Zarándy, et al. Focal-Plane Sensor-Processor Chips, 2014.

[3] Paul A. Viola, et al. Robust Real-Time Face Detection, 2001, International Journal of Computer Vision.

[4] Wilhelm Burger, et al. Digital Image Processing - An Algorithmic Introduction using Java, 2008, Texts in Computer Science.

[5] Ricardo Carmona-Galán, et al. Automatic DR and Spatial Sampling Rate Adaptation for Secure and Privacy-Aware ROI Tracking Based on Focal-Plane Image Processing, 2015.

[6] Ángel Rodríguez-Vázquez, et al. A CMOS Vision System On-Chip with Multi-Core, Cellular Sensory-Processing Front-End, 2010.

[7] Manuel Suárez Cambre. Low Power CMOS Vision Sensors for Scale and Rotation Invariant Feature Detectors Using CMOS Heterogeneous Smart Pixel Architectures, 2015.

[8] Ángel Rodríguez-Vázquez, et al. High-level Performance Evaluation of Object Detection based on Massively Parallel Focal-plane Acceleration Requiring Minimum Pixel Area Overhead, 2016, VISIGRAPP.

[10] Ricardo Carmona-Galán, et al. FLIP-Q: A QCIF Resolution Focal-Plane Array for Low-Power Image Processing, 2011, IEEE Journal of Solid-State Circuits.

[11] Christopher Hunt, et al. Notes on the OpenSURF Library, 2009.

[12] Ángel Rodríguez-Vázquez, et al. Image filtering by reduced kernels exploiting kernel structure and focal-plane averaging, 2011, 20th European Conference on Circuit Theory and Design (ECCTD).

[13] Cordelia Schmid, et al. Evaluation of Interest Point Detectors, 2000, International Journal of Computer Vision.

[14] Richard Szeliski, et al. Computer Vision - Algorithms and Applications, 2011, Texts in Computer Science.

[15] Donald B. Hondongwa, et al. A Review of the Pinned Photodiode for CCD and CMOS Image Sensors, 2014, IEEE Journal of the Electron Devices Society.

[16] Cordelia Schmid, et al. A Comparison of Affine Region Detectors, 2005, International Journal of Computer Vision.

[17] Olivier Marcelot, et al. Pixel Level Characterization of Pinned Photodiode and Transfer Gate Physical Parameters in CMOS Image Sensors, 2014, IEEE Journal of the Electron Devices Society.