A fully pipelined kernel normalised least mean squares processor for accelerated parameter optimisation

Kernel adaptive filters (KAFs) are online machine learning algorithms that are amenable to highly efficient streaming implementations. They require only a single pass through the data during training and can act as universal approximators, i.e., they can approximate any continuous function to arbitrary accuracy. KAFs belong to the family of kernel methods, which apply an implicit nonlinear mapping of input data to a high-dimensional feature space, permitting learning algorithms to be expressed entirely in terms of inner products. This approach avoids explicit projection into the feature space, enabling computational efficiency. In this paper, we propose the first fully pipelined floating-point implementation of the kernel normalised least mean squares (KNLMS) algorithm for regression. Independent training tasks required for parameter optimisation fill the L cycles of pipeline latency, ensuring the pipeline never stalls. Together with other optimisations that reduce resource utilisation and latency, our core achieves 160 GFLOPS on a Virtex 7 XC7VX485T FPGA, and the PCI-based system implementation is 70× faster than an optimised software implementation on a desktop processor.
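To make the algorithm concrete, the following is a minimal software sketch of KNLMS with a coherence-based dictionary, in the style of Richard, Bermudez and Honeine's formulation. It is illustrative only and does not reflect the paper's pipelined hardware design; the parameter names (`eta`, `eps`, `mu0`, `sigma`) and their values are assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=0.5):
    # Gaussian (RBF) kernel; sigma is an illustrative choice
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

class KNLMS:
    """Minimal kernel normalised LMS sketch (software reference, not
    the paper's hardware architecture). eta: step size, eps:
    regulariser, mu0: coherence threshold for dictionary growth."""

    def __init__(self, eta=0.5, eps=1e-2, mu0=0.9, sigma=0.5):
        self.eta, self.eps, self.mu0, self.sigma = eta, eps, mu0, sigma
        self.dict = []      # stored input vectors (the dictionary)
        self.alpha = None   # kernel expansion coefficients

    def predict(self, x):
        if not self.dict:
            return 0.0
        k = np.array([gaussian_kernel(x, c, self.sigma) for c in self.dict])
        return float(self.alpha @ k)

    def update(self, x, d):
        """One online step: predict, then adapt coefficients toward
        the desired output d with a normalised gradient step."""
        if not self.dict:
            self.dict.append(x)
            self.alpha = np.zeros(1)
        k = np.array([gaussian_kernel(x, c, self.sigma) for c in self.dict])
        # Coherence criterion: add x to the dictionary only if it is
        # sufficiently novel relative to existing dictionary entries.
        if k.max() <= self.mu0:
            self.dict.append(x)
            self.alpha = np.append(self.alpha, 0.0)
            k = np.append(k, gaussian_kernel(x, x, self.sigma))
        err = d - self.alpha @ k
        # Normalised update: step size scaled by the kernel vector energy
        self.alpha = self.alpha + (self.eta / (self.eps + k @ k)) * err * k
        return err
```

In the hardware implementation described above, the feedback dependency in this update loop is what would stall a deep floating-point pipeline; interleaving independent training tasks (e.g. different hyperparameter candidates) across the L latency slots keeps every stage busy.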
