A Reconfigurable Low Power High Throughput Streaming Architecture for Big Data Processing

General purpose computing systems are used for a large variety of applications. Extensive supports for flexibility in these systems limit their energy efficiencies. Given that big data applications are among the main emerging workloads for computing systems, specialized architectures for big data processing are needed to enable low power and high throughput execution. Several big data applications are particularly focused on classification and clustering tasks. In this paper we propose a multicore heterogeneous architecture for big data processing. This system has the capability to process key machine learning algorithms such as deep neural network, autoencoder, and k-means clustering. Memristor crossbars are utilized to provide low power high throughput execution of neural networks. The system has both training and recognition (evaluation of new input) capabilities. The proposed system could be used for classification, unsupervised clustering, dimensionality reduction, feature extraction, and anomaly detection applications. The system level area and power benefits of the specialized architecture is compared with the NVIDIA Telsa K20 GPGPU. Our experimental evaluations show that the proposed architecture can provide four to six orders of magnitude more energy efficiency over GPGPUs for big data processing.

[1]  Johannes Gorner,et al.  An energy efficient multi-bit TSV transmitter using capacitive coupling , 2014, 2014 21st IEEE International Conference on Electronics, Circuits and Systems (ICECS).

[2]  Bernabé Linares-Barranco,et al.  On Spike-Timing-Dependent-Plasticity, Memristive Devices, and Building a Self-Learning Visual Cortex , 2011, Front. Neurosci..

[3]  Cong Xu,et al.  NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[4]  Andrew B. Kahng,et al.  ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[5]  Hao Jiang,et al.  A heterogeneous computing system with memristor-based neuromorphic accelerators , 2014, 2014 IEEE High Performance Extreme Computing Conference (HPEC).

[6]  Dharmendra S. Modha,et al.  A digital neurosynaptic core using embedded crossbar memory with 45pJ per spike in 45nm , 2011, 2011 IEEE Custom Integrated Circuits Conference (CICC).

[7]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[8]  Marc'Aurelio Ranzato,et al.  Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[9]  Karthikeyan Sankaralingam,et al.  Dark silicon and the end of multicore scaling , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[10]  Luis Ceze,et al.  General-purpose code acceleration with limited-precision analog computation , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[11]  Zheng Li,et al.  Continuous real-world inputs can open up alternative accelerator designs , 2013, ISCA.

[12]  Chris Yakopcic,et al.  Exploring the design space of specialized multicore neural processors , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).

[13]  Andrew S. Cassidy,et al.  A million spiking-neuron integrated circuit with a scalable communication network and interface , 2014, Science.

[14]  Kaushik Roy,et al.  Quality programmable vector processors for approximate computing , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[15]  Xuehai Zhou,et al.  PuDianNao: A Polyvalent Machine Learning Accelerator , 2015, ASPLOS.

[16]  Witold Pedrycz,et al.  Contrastive divergence for memristor-based restricted Boltzmann machine , 2015, Engineering applications of artificial intelligence.

[17]  Steve Furber,et al.  High-performance computing for systems of spiking neurons , 2006 .

[18]  Andrew S. Cassidy,et al.  Building block of a programmable neuromorphic substrate: A digital neurosynaptic core , 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).

[19]  Norman P. Jouppi,et al.  Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[20]  Jia Wang,et al.  DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[21]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[22]  Avinoam Kolodny,et al.  Memristor-Based Multilayer Neural Networks With Online Gradient Descent Training , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[23]  Jacques-Olivier Klein,et al.  Robust neural logic block (NLB) based on memristor crossbar array , 2011, 2011 IEEE/ACM International Symposium on Nanoscale Architectures.

[24]  Farnood Merrikh-Bayat,et al.  Training and operation of an integrated neuromorphic network based on metal-oxide memristors , 2014, Nature.

[25]  Kaushik Roy,et al.  Scalable effort hardware design: Exploiting algorithmic resilience for energy efficiency , 2010, Design Automation Conference.

[26]  L. Chua Memristor-The missing circuit element , 1971 .

[27]  Mahmut T. Kandemir,et al.  Evaluating STT-RAM as an energy-efficient main memory alternative , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[28]  Yiran Chen,et al.  Memristor Crossbar-Based Neuromorphic Computing System: A Case Study , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[29]  Hao Jiang,et al.  RENO: A high-efficient reconfigurable neuromorphic computing accelerator design , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[30]  Chris Yakopcic,et al.  Memristor SPICE model and crossbar simulation based on devices with nanosecond switching time , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).

[31]  Shimeng Yu,et al.  Investigating the switching dynamics and multilevel capability of bipolar metal oxide resistive switching memory , 2011 .

[32]  Fabien Alibart,et al.  Pattern classification by memristive crossbar circuits using ex situ and in situ training , 2013, Nature Communications.

[33]  Wei Lu,et al.  Two-terminal resistive switches (memristors) for memory and logic applications , 2011, 16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011).

[34]  Yu Wang,et al.  Training itself: Mixed-signal training acceleration for memristor-based neural network , 2014, 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC).

[35]  D. Stewart,et al.  The missing memristor found , 2008, Nature.

[36]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[37]  Johannes Schemmel,et al.  Wafer-scale integration of analog neural networks , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).