FPGA Implementations of Kernel Normalised Least Mean Squares Processors

Kernel adaptive filters (KAFs) are online machine learning algorithms that are amenable to highly efficient streaming implementations. They require only a single pass through the data and can act as universal approximators, i.e. they can approximate any continuous function to arbitrary accuracy. KAFs belong to the family of kernel methods, which apply an implicit non-linear mapping of the input data to a high-dimensional feature space, allowing the learning algorithm to be expressed entirely in terms of inner products. This avoids explicit projection into the feature space and keeps the computation efficient. In this paper, we propose the first fully pipelined implementation of the kernel normalised least mean squares (KNLMS) algorithm for regression. Independent training tasks, which are needed in any case for hyperparameter optimisation, fill the pipeline stages, so no stall cycles are required to resolve dependencies. Together with other optimisations that reduce resource utilisation and latency, our core achieves 161 GFLOPS on a Virtex 7 XC7VX485T FPGA for a floating-point implementation and 211 GOPS for a fixed-point one. Our PCI Express based floating-point system implementation achieves 80% of the core’s speed, a speedup of 10× over an optimised implementation on a desktop processor and 2.66× over a GPU.

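For context, the sketch below shows the standard KNLMS update with a Gaussian kernel and coherence-based dictionary sparsification, as commonly described in the kernel adaptive filtering literature. It is a minimal software model for illustration only, not the paper's hardware architecture; the class name KNLMS and the hyperparameter values (gamma, eta, eps, mu0) are placeholders.

```python
import numpy as np

class KNLMS:
    """Minimal kernel normalised least mean squares (KNLMS) regressor with a
    Gaussian kernel and coherence-based dictionary sparsification.
    Illustrative sketch only; hyperparameter defaults are placeholders."""

    def __init__(self, gamma=1.0, eta=0.5, eps=1e-2, mu0=0.9):
        self.gamma = gamma        # Gaussian kernel width parameter
        self.eta = eta            # learning rate (step size)
        self.eps = eps            # regularisation term in the normalisation
        self.mu0 = mu0            # coherence threshold for dictionary growth
        self.dictionary = []      # stored input vectors (the dictionary)
        self.alpha = np.zeros(0)  # kernel expansion coefficients

    def _kernel(self, x, y):
        return np.exp(-self.gamma * np.sum((x - y) ** 2))

    def predict(self, x):
        if not self.dictionary:
            return 0.0
        k = np.array([self._kernel(x, d) for d in self.dictionary])
        return float(self.alpha @ k)

    def update(self, x, y):
        """One online step: evaluate the kernels, optionally grow the
        dictionary, then apply a normalised LMS update to the coefficients."""
        x = np.asarray(x, dtype=float)
        if not self.dictionary:
            self.dictionary.append(x)
            self.alpha = np.zeros(1)
            k = np.array([1.0])  # kappa(x, x) = 1 for the Gaussian kernel
        else:
            k = np.array([self._kernel(x, d) for d in self.dictionary])
            # Coherence criterion: add x only if it is not already well
            # represented by the existing dictionary elements.
            if np.max(np.abs(k)) <= self.mu0:
                self.dictionary.append(x)
                self.alpha = np.append(self.alpha, 0.0)
                k = np.append(k, 1.0)
        err = y - float(self.alpha @ k)
        # Normalised LMS step on the kernel expansion coefficients.
        self.alpha = self.alpha + (self.eta / (self.eps + float(k @ k))) * err * k
        return err
```

In an online setting, update(x, y) is called once per incoming sample; the normalisation term eps + kᵀk is what distinguishes KNLMS from the plain kernel LMS update, and the per-sample dependence of alpha on the previous update is the hazard that the proposed pipeline hides by interleaving independent training tasks.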