3-D Stacked Image Sensor With Deep Neural Network Computation

This paper investigates the power and performance trade-offs associated with integrating deep neural network (DNN) computation in an image sensor. The paper presents the design of Neurosensor, a CMOS image sensor with 3-D stacking of the pixel array, read-out circuits, memory, and computing logic for DNNs. The analysis shows that integrating DNN computation reduces transmit latency (and energy), but at the expense of processing and memory access latency (and energy). Hence, given a specific DNN and transmission bandwidth, there exists an optimal number of layers that should be computed in the sensor to maximize energy efficiency. In general, it is often more efficient to integrate memory within the sensor stack and/or implement only the feature extraction layers on the sensor, and optimized configurations can achieve up to $90\times$ improvement in energy efficiency compared to the baseline. Further, coupled power, thermal, and noise simulation demonstrates that integrating DNN computation can increase the pixel-array temperature, resulting in higher noise and hence lower classification accuracy.
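The trade-off described above (less data to transmit versus more in-sensor processing and memory access) can be illustrated with a minimal sketch. This is not the paper's model: the function names, the additive energy model, and all numbers below are hypothetical, chosen only to show why an intermediate split point can minimize total energy.

```python
from typing import List, Tuple


def total_energy(k: int,
                 input_kbits: int,
                 layer_out_kbits: List[int],
                 layer_energy: List[int],
                 tx_energy_per_kbit: int) -> int:
    """Energy (arbitrary units) when the first k layers run in the sensor.

    Assumed model: in-sensor cost is the sum of per-layer compute and
    memory-access energy; transmit cost is proportional to the size of
    whatever leaves the sensor (raw image if k == 0, else layer k's output).
    """
    in_sensor = sum(layer_energy[:k])
    kbits_sent = input_kbits if k == 0 else layer_out_kbits[k - 1]
    return in_sensor + kbits_sent * tx_energy_per_kbit


def best_split(input_kbits: int,
               layer_out_kbits: List[int],
               layer_energy: List[int],
               tx_energy_per_kbit: int) -> Tuple[int, List[int]]:
    """Return the layer count k with minimum total energy, plus all totals."""
    energies = [
        total_energy(k, input_kbits, layer_out_kbits, layer_energy,
                     tx_energy_per_kbit)
        for k in range(len(layer_energy) + 1)
    ]
    k_best = min(range(len(energies)), key=energies.__getitem__)
    return k_best, energies


# Hypothetical 4-layer network: early (feature extraction) layers shrink the
# data a lot and are cheap; later layers are expensive and shrink it little.
INPUT_KBITS = 1000
LAYER_OUT_KBITS = [400, 100, 20, 1]
LAYER_ENERGY = [50, 80, 300, 600]   # compute + memory access, per layer

k_best, energies = best_split(INPUT_KBITS, LAYER_OUT_KBITS, LAYER_ENERGY,
                              tx_energy_per_kbit=1)
print(k_best, energies)
```

With these made-up numbers the minimum falls at two in-sensor layers: computing nothing in the sensor pays the full transmit cost, while computing every layer pays for the expensive late layers without much further reduction in transmitted data. Lowering `tx_energy_per_kbit` (a cheaper link) shifts the optimum toward fewer in-sensor layers, which matches the abstract's point that the best split depends on both the DNN and the transmission bandwidth.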
