Paper information - On parallel online learning for adaptive embedded systems

On parallel online learning for adaptive embedded systems

This chapter considers a parallel implementation of the online multi-label regularized least-squares machine learning algorithm for embedded hardware platforms. The authors focus on three properties required in real-time adaptive systems: the model learns online, that is, it improves with new data without having to store it; the method can fully utilize the computational abilities of modern embedded multi-core computer architectures; and the system efficiently learns to predict several labels simultaneously. They demonstrate on a hand-written digit recognition task that the online algorithm converges to an accurate solution faster, with respect to the amount of training data processed, than a baseline based on stochastic gradient descent. Further, the authors show that their parallelization of the method scales well on a quad-core platform. Moreover, since Network-on-Chip (NoC) has been proposed as a promising candidate for future multi-core architectures, they implement a NoC system consisting of 16 cores and evaluate the proposed machine learning algorithm on it. Experimental results show that optimizing the cache behaviour of the program can significantly improve cache/memory efficiency. Results from the chapter provide a guideline for designing future embedded multi-core machine learning devices.
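The "improves with new data but does not require storing it" property can be illustrated with the textbook recursive form of regularized least squares, in which a rank-one (Sherman-Morrison) update keeps the inverse matrix current after each example. The following is a minimal sketch under that assumption, not the chapter's implementation: the class name `OnlineRLS`, its methods, and the parameter `lam` are hypothetical names chosen for illustration.

```python
class OnlineRLS:
    """Recursive regularized least squares with several outputs (labels).

    Hypothetical illustration: names and structure are assumptions,
    not taken from the chapter.
    """

    def __init__(self, n_features, n_labels, lam=1.0):
        # P tracks the inverse of (lam * I + sum of x x^T); it starts at I / lam.
        self.P = [[(1.0 / lam if i == j else 0.0) for j in range(n_features)]
                  for i in range(n_features)]
        # One weight column per label; all labels share the same P.
        self.W = [[0.0] * n_labels for _ in range(n_features)]

    def predict(self, x):
        # Returns one real-valued score per label.
        n_labels = len(self.W[0])
        return [sum(self.W[i][k] * x[i] for i in range(len(x)))
                for k in range(n_labels)]

    def update(self, x, y):
        # Px = P x is computed once and reused for the gain and for P.
        Px = [sum(p_ij * x_j for p_ij, x_j in zip(row, x)) for row in self.P]
        denom = 1.0 + sum(x_i * px_i for x_i, px_i in zip(x, Px))
        gain = [px_i / denom for px_i in Px]
        # Per-label prediction error, using the weights before this update.
        err = [y_k - p_k for y_k, p_k in zip(y, self.predict(x))]
        for i, g in enumerate(gain):
            for k, e in enumerate(err):
                self.W[i][k] += g * e       # W <- W + gain * err^T
            for j, px_j in enumerate(Px):
                self.P[i][j] -= g * px_j    # P <- P - gain * (P x)^T


# Usage: each example is consumed once and then discarded.
model = OnlineRLS(n_features=2, n_labels=2, lam=1.0)
model.update([1.0, 0.0], [2.0, -2.0])
scores = model.predict([1.0, 0.0])
```

In this sketch the state is only the matrix `P` and the weights `W`, so memory use does not grow with the number of examples seen. Given the shared gain vector, the weight columns for different labels are updated independently of one another; whether the chapter partitions its parallel work this way is not stated in the abstract.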
