HARP: Hierarchical Attention Oriented Region-Based Processing for High-Performance Computation in Vision Sensor

With the rapid advancement of complementary metal-oxide-semiconductor (CMOS) image sensors, cameras are widely adopted for their high image quality, while the computation of vision applications is offloaded to the cloud. This raises concerns for time-critical applications such as autonomous driving, surveillance, and defense systems, since moving pixels from the sensor’s focal plane is expensive. This paper presents a hardware architecture for smart cameras that identifies the salient regions of an image frame and then performs high-level inference on them, so that information is created at the sensor level instead of transporting raw pixels. A visual attention-oriented computational strategy filters out a significant amount of the redundant spatiotemporal data collected at the focal plane. A computationally expensive learning model is then applied only to the interesting regions of the image. The hierarchical processing in the pixels’ data path follows a bottom-up architecture with massive parallelism and achieves high throughput by exploiting the large bandwidth available at the image source. We prototype the model on a field-programmable gate array (FPGA) and as an application-specific integrated circuit (ASIC) for integration with a pixel-parallel image sensor. The experimental results show that our approach achieves significant speedup and, under certain conditions, up to 45% higher energy efficiency with attention-oriented processing. Although attention-oriented processing incurs an area overhead, the gains in energy consumption, latency, and memory utilization outweigh that limitation.
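The two-stage idea in the abstract (a cheap attention step that discards redundant regions, followed by an expensive model applied only to what remains) can be illustrated with a minimal software sketch. This is not the paper's hardware design or its attention model: the per-tile temporal difference used as a saliency measure, the names `tile_activity`, `salient_tiles`, and `process_frame`, and the `tile` and `threshold` parameters are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Frame = List[List[int]]   # grayscale frame as rows of pixel intensities
Tile = Tuple[int, int]    # (tile_row, tile_col) index of a region


def tile_activity(prev: Frame, curr: Frame, tile: int) -> Dict[Tile, float]:
    """Mean absolute temporal difference of each tile-sized region.

    Assumption: temporal change stands in for visual saliency here.
    """
    rows, cols = len(curr), len(curr[0])
    activity: Dict[Tile, float] = {}
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r1, c1 = min(r0 + tile, rows), min(c0 + tile, cols)
            total = sum(
                abs(curr[r][c] - prev[r][c])
                for r in range(r0, r1)
                for c in range(c0, c1)
            )
            activity[(r0 // tile, c0 // tile)] = total / ((r1 - r0) * (c1 - c0))
    return activity


def salient_tiles(activity: Dict[Tile, float], threshold: float) -> List[Tile]:
    """Keep only the tiles whose activity exceeds the threshold."""
    return sorted(t for t, a in activity.items() if a > threshold)


def process_frame(
    prev: Frame,
    curr: Frame,
    tile: int,
    threshold: float,
    infer: Callable[[Frame], object],
) -> Dict[Tile, object]:
    """Run the expensive `infer` callback only on salient regions."""
    results: Dict[Tile, object] = {}
    for tr, tc in salient_tiles(tile_activity(prev, curr, tile), threshold):
        region = [
            row[tc * tile:(tc + 1) * tile]
            for row in curr[tr * tile:(tr + 1) * tile]
        ]
        results[(tr, tc)] = infer(region)
    return results


if __name__ == "__main__":
    prev = [[0] * 4 for _ in range(4)]
    curr = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 10, 10], [0, 0, 10, 10]]
    # The lambda is a placeholder for a learning model; it just sums pixels.
    print(process_frame(prev, curr, 2, 5.0, lambda reg: sum(map(sum, reg))))
```

In this toy run only the bottom-right tile changes between frames, so the placeholder model is invoked once instead of four times. The sketch is sequential; the architecture described above evaluates regions in parallel at the focal plane.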
