Paper Information - Intelligent Vision Systems: Exploring the State-of-the-Art and Opportunities for the Future

Vision and video applications are becoming pervasive in mobile and embedded systems. Consumer wearable devices require real-time video analytics and long battery life, which drives the need for innovative system designs that combine low power, reliability, and high performance. In addition, the increasing resolution of image sensors in these mobile systems places growing demands on both memory storage and computational power. Such stringent requirements have given rise to accelerator-rich architectures in systems-on-chip, where dedicated hardware accelerators handle the primary computational burden. In this paper, we provide an overview of the current state of the art in vision accelerators. We further discuss opportunities to improve energy efficiency by minimizing Dynamic Random Access Memory (DRAM) refreshes, and we explore techniques that exploit algorithmic resilience to reduce compute units while maintaining reliable system accuracy and performance.

Narayanan Vijaykrishnan | Siddharth Advani | Srinidhi Kestur

[1] Lizy Kurian John, et al. Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory, 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[2] Bruce Jacob, et al. Flexible auto-refresh: Enabling scalable and energy-efficient DRAM refresh reductions, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[3] Narayanan Vijaykrishnan, et al. Accelerating neuromorphic vision algorithms for recognition, 2012, DAC Design Automation Conference 2012.

[4] Mikko H. Lipasti, et al. Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[5] Narayanan Vijaykrishnan, et al. An FPGA Implementation of Information Theoretic Visual-Saliency System and Its Optimization, 2011, 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines.

[6] Chunhong Chen, et al. A fast model for analysis and improvement of gate-level circuit reliability, 2015, Integr.

[7] John C. Russ, et al. The Image Processing Handbook, 2016, Microscopy and Microanalysis.

[8] Brad Wyble, et al. The benefit of attention is not diminished when distributed over two simultaneous cues, 2014, Attention, Perception, & Psychophysics.

[9] C. Koch, et al. Computational modelling of visual attention, 2001, Nature Reviews Neuroscience.

[10] Kaushik Roy, et al. Scalable effort hardware design: Exploiting algorithmic resilience for energy efficiency, 2010, Design Automation Conference.

[11] Ninghui Sun, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning, 2014, ASPLOS.

[12] Song Liu, et al. Flikker: saving DRAM refresh-power through critical data partitioning, 2011, ASPLOS XVI.

[13] Scott B. Baden, et al. Accelerating Viola-Jones Face Detection to FPGA-Level Using GPUs, 2010, 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines.

[14] Srihari Cadambi, et al. A dynamically configurable coprocessor for convolutional neural networks, 2010, ISCA.

[15] Luca Benini, et al. Exploring architectural heterogeneity in intelligent vision systems, 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[16] Ah Chung Tsoi, et al. Convolutional neural networks for face recognition, 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17] Jia Wang, et al. DaDianNao: A Machine-Learning Supercomputer, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[18] D. Marr, et al. Early processing of visual information, 1976, Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences.

[19] Yoshua Bengio, et al. Learning Deep Architectures for AI, 2007, Found. Trends Mach. Learn.

[20] Richard Veras, et al. RAIDR: Retention-aware intelligent DRAM refresh, 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[21] Narayanan Vijaykrishnan, et al. An algorithm-architecture co-design framework for gridding reconstruction using FPGAs, 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).

[22] Steve B. Furber, et al. The SpiNNaker Project, 2014, Proceedings of the IEEE.

[23] James Campbell Cae, et al. Using an Embedded Vision Processor to Build an Efficient Object Recognition System, 2015.

[24] Olivier Temam, et al. Leveraging the Error Resilience of Neural Networks for Designing Highly Energy Efficient Accelerators, 2015, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[25] Thomas Serre, et al. Object recognition with features inspired by visual cortex, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[26] R. Keys. Cubic convolution interpolation for digital image processing, 1981.

[27] Gernot Heiser, et al. An Analysis of Power Consumption in a Smartphone, 2010, USENIX Annual Technical Conference.

[28] Pietro Perona, et al. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories, 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[29] Narayanan Vijaykrishnan, et al. Emulating Mammalian Vision on Reconfigurable Hardware, 2012, 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines.

[30] Sanjay J. Patel, et al. Tradeoffs in designing accelerator architectures for visual computing, 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[31] Vivienne Sze, et al. Energy-efficient HOG-based object detection at 1080HD 60 fps with multi-scale support, 2014, 2014 IEEE Workshop on Signal Processing Systems (SiPS).

[32] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[33] Kazuaki Murakami, et al. Optimizing the DRAM refresh count for merged DRAM/logic LSIs, 1998, Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379).

[34] Moinuddin K. Qureshi, et al. A case for Refresh Pausing in DRAM memory systems, 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[35] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.

[36] T. Poggio, et al. Hierarchical models of object recognition in cortex, 1999, Nature Neuroscience.

[37] Luc Van Gool, et al. The Pascal Visual Object Classes (VOC) Challenge, 2010, International Journal of Computer Vision.

[38] Narayanan Vijaykrishnan, et al. Refresh Enabled Video Analytics (REVA): Implications on power and performance of DRAM supported embedded visual systems, 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).

[39] Narayanan Vijaykrishnan, et al. SHARC: A streaming model for FPGA accelerators and its application to Saliency, 2011, 2011 Design, Automation & Test in Europe.

[40] Andrew S. Cassidy, et al. Real-Time Scalable Cortical Computing at 46 Giga-Synaptic OPS/Watt with ~100× Speedup in Time-to-Solution and ~100,000× Reduction in Energy-to-Solution, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.

[41] James C. Hoe, et al. Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?, 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[42] Yann LeCun, et al. CNP: An FPGA-based processor for Convolutional Networks, 2009, 2009 International Conference on Field Programmable Logic and Applications.

[43] John K. Tsotsos, et al. Saliency Based on Information Maximization, 2005, NIPS.

[44] Willie Anderson, et al. Hexagon DSP: An Architecture Optimized for Mobile Multimedia and Communications, 2014, IEEE Micro.

[45] Zhen Fang, et al. CogniServe: Heterogeneous Server Architecture for Large-Scale Recognition, 2011, IEEE Micro.

[46] David G. Lowe, et al. University of British Columbia, 1945, Canadian Medical Association journal.