A low latency kernel recursive least squares processor using FPGA technology

The kernel recursive least squares (KRLS) algorithm performs non-linear regression in an online manner, with computational requirements similar to those of linear techniques. This paper describes an implementation of the KRLS algorithm that uses pipelining and vectorisation for performance, and microcoding for reusability. The design can be scaled to allow tradeoffs between capacity, performance and area. Compared with a central processing unit (CPU) and a digital signal processor (DSP), the processor improves on execution time, latency and energy consumption by factors of 5, 5 and 12 respectively.
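
For readers unfamiliar with KRLS, the online update can be sketched in software as follows. This is a minimal illustration of the standard dictionary-based formulation with an approximate linear dependence test, not the processor's microcode. The Gaussian kernel, the threshold `nu`, and all class and function names are assumptions made for the example.

```python
import math

def gaussian_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel; the kernel choice is an assumption of this sketch."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

class KRLS:
    """Hypothetical software model of online KRLS with a sparse dictionary."""

    def __init__(self, kernel=gaussian_kernel, nu=1e-4):
        self.kernel = kernel
        self.nu = nu            # approximate linear dependence threshold
        self.dictionary = []    # stored input vectors
        self.k_inv = []         # inverse kernel matrix of the dictionary
        self.p = []             # covariance-like matrix of the RLS recursion
        self.alpha = []         # kernel expansion weights

    def predict(self, x):
        return dot([self.kernel(d, x) for d in self.dictionary], self.alpha)

    def update(self, x, y):
        ktt = self.kernel(x, x)
        if not self.dictionary:
            # First sample initialises every state variable.
            self.dictionary = [x]
            self.k_inv = [[1.0 / ktt]]
            self.p = [[1.0]]
            self.alpha = [y / ktt]
            return
        k = [self.kernel(d, x) for d in self.dictionary]
        a = matvec(self.k_inv, k)
        delta = ktt - dot(k, a)          # novelty of x relative to the dictionary
        err = y - dot(k, self.alpha)     # prediction error before the update
        n = len(self.dictionary)
        if delta > self.nu:
            # x is not well represented: grow the dictionary by one entry.
            self.dictionary.append(x)
            grown = [[self.k_inv[i][j] + a[i] * a[j] / delta for j in range(n)]
                     + [-a[i] / delta] for i in range(n)]
            grown.append([-ai / delta for ai in a] + [1.0 / delta])
            self.k_inv = grown
            self.p = [row + [0.0] for row in self.p] + [[0.0] * n + [1.0]]
            self.alpha = [w - ai * err / delta
                          for w, ai in zip(self.alpha, a)] + [err / delta]
        else:
            # x is approximately dependent: update weights, keep the size fixed.
            pa = matvec(self.p, a)       # p is symmetric, so a^T p equals pa
            denom = 1.0 + dot(a, pa)
            q = [v / denom for v in pa]
            self.p = [[self.p[i][j] - q[i] * pa[j] for j in range(n)]
                      for i in range(n)]
            kq = matvec(self.k_inv, q)
            self.alpha = [w + v * err for w, v in zip(self.alpha, kq)]
```

Each update costs on the order of the square of the dictionary size, and the work consists mostly of dot products and matrix-vector products. Operations of this kind are the sort that pipelining and vectorisation are meant to accelerate.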
