Exploring architectural heterogeneity in intelligent vision systems

Limited power budgets and the need for high-performance computing have led to platform customization, with a number of accelerators integrated alongside chip multiprocessors (CMPs). To study customized architectures, we model four customization design points and compare their performance and energy across a number of computer vision workloads. We analyze the limitations of generic architectures and quantify the costs of increasing customization using these micro-architectural design points. This analysis leads us to develop a framework consisting of low-power multi-cores and an array of configurable micro-accelerator functional units. Using this platform, we illustrate dataflow and control processing optimizations that provide performance gains similar to those of custom ASICs for a wide range of vision benchmarks.
