On parallel online learning for adaptive embedded systems

This chapter considers a parallel implementation of the online multi-label regularized least-squares machine learning algorithm for embedded hardware platforms. The authors focus on three properties required in real-time adaptive systems: learning in an online fashion, that is, the model improves with new data but does not need to store it; full use of the computational capabilities of modern embedded multi-core architectures; and efficient learning to predict several labels simultaneously. They demonstrate on a handwritten digit recognition task that the online algorithm converges to an accurate solution faster, with respect to the amount of training data processed, than a baseline based on stochastic gradient descent. Further, the authors show that their parallelization of the method scales well on a quad-core platform. Moreover, since Network-on-Chip (NoC) has been proposed as a promising candidate for future multi-core architectures, they implement a NoC system consisting of 16 cores and evaluate the proposed machine learning algorithm on it. Experimental results show that optimizing the cache behaviour of the program can significantly improve cache and memory efficiency. The results of the chapter provide a guideline for designing future embedded multi-core machine learning devices.
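As an illustration of what online multi-label regularized least-squares learning can look like, the following is a minimal sketch in Python with NumPy. It is not the chapter's implementation: it assumes the textbook recursive least-squares formulation, in which the inverse of the regularized covariance matrix is maintained with a Sherman-Morrison rank-one update. The class name `OnlineRLS` and the parameter `regparam` are hypothetical names chosen for this example.

```python
import numpy as np


class OnlineRLS:
    """Illustrative online multi-label regularized least-squares learner.

    Assumption: the standard recursive least-squares recursion is used;
    the names here are hypothetical and not taken from the chapter.
    """

    def __init__(self, n_features, n_labels, regparam=1.0):
        # P tracks the inverse of (X^T X + regparam * I); no examples seen yet.
        self.P = np.eye(n_features) / regparam
        # One weight column per label; all labels share the same P.
        self.W = np.zeros((n_features, n_labels))

    def update(self, x, y):
        """Absorb one example (x: feature vector, y: label vector), then discard it."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        Px = self.P @ x
        denom = 1.0 + x @ Px
        # Sherman-Morrison rank-one update of the inverse.
        self.P -= np.outer(Px, Px) / denom
        # Prediction error of the current model on the new example, per label.
        err = y - self.W.T @ x
        # Px / denom equals the updated P applied to x (the gain vector).
        self.W += np.outer(Px / denom, err)

    def predict(self, x):
        """Return one real-valued prediction per label."""
        return self.W.T @ np.asarray(x, dtype=float)
```

In this sketch each update costs O(d^2 + dk) operations for d features and k labels and never revisits earlier examples, which matches the "improves with new data but does not need to store it" property described above. Because every label shares the same matrix `P`, adding labels only adds columns to `W`.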
