Training Neural Networks for classification using the Extended Kalman Filter: A comparative study

The training of feedforward neural networks for classification problems is considered. The Extended Kalman Filter (EKF), which has previously been used mostly for training recurrent neural networks for prediction and control, is proposed as the learning algorithm. An implementation of the cross-entropy error function for mini-batch training is developed. Popular benchmarks are used to compare the method with gradient descent, conjugate gradients, and the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm. The influence of mini-batch size on training time and quality is investigated. The algorithms under consideration, implemented as MATLAB scripts, are available for free download.
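
The central step named in the abstract, EKF-based weight estimation, can be made concrete with a short sketch. The MATLAB script below is a minimal illustration, not the paper's code: the layer sizes, the covariances P, R, and Q, the finite-difference Jacobian, and all function names are assumptions for illustration. It shows the standard pattern-by-pattern, squared-error-style EKF update; the paper's cross-entropy mini-batch formulation modifies the error term and processes several patterns per update.

% Minimal EKF training sketch for a one-hidden-layer softmax classifier.
% Hypothetical illustration: layer sizes, P/R/Q values and the
% finite-difference Jacobian are assumptions, not the paper's code.
function w = ekf_train_sketch(X, T, nHidden, nEpochs)
  [nIn, N] = size(X);                 % X: nIn-by-N inputs
  nOut = size(T, 1);                  % T: nOut-by-N one-hot targets
  nW = nHidden*(nIn+1) + nOut*(nHidden+1);   % total number of weights
  w = 0.1*randn(nW, 1);               % weight vector: the EKF state
  P = 100*eye(nW);                    % initial state covariance
  R = eye(nOut);                      % measurement-noise covariance
  Q = 1e-6*eye(nW);                   % process noise keeps P from collapsing
  for epoch = 1:nEpochs
    for n = 1:N                       % pattern-by-pattern EKF update
      [y, H] = net_jacobian(w, X(:,n), nIn, nHidden, nOut);
      e = T(:,n) - y;                 % innovation: target minus prediction
      S = H*P*H' + R;                 % innovation covariance
      K = (P*H')/S;                   % Kalman gain
      w = w + K*e;                    % weight (state) update
      P = P - K*H*P + Q;              % covariance update
    end
  end
end

function [y, H] = net_jacobian(w, x, nIn, nHidden, nOut)
  % Network output and finite-difference Jacobian dy/dw (slow but simple;
  % analytic backpropagation of the Jacobian is the practical choice).
  y = net_forward(w, x, nIn, nHidden, nOut);
  H = zeros(nOut, numel(w));
  d = 1e-6;
  for k = 1:numel(w)
    wp = w; wp(k) = wp(k) + d;
    H(:,k) = (net_forward(wp, x, nIn, nHidden, nOut) - y)/d;
  end
end

function y = net_forward(w, x, nIn, nHidden, nOut)
  i1 = nHidden*nIn;          W1 = reshape(w(1:i1), nHidden, nIn);
  i2 = i1 + nHidden;         b1 = w(i1+1:i2);
  i3 = i2 + nOut*nHidden;    W2 = reshape(w(i2+1:i3), nOut, nHidden);
  b2 = w(i3+1:end);
  h = tanh(W1*x + b1);       % hidden layer
  a = W2*h + b2;             % output pre-activations
  y = exp(a - max(a));       % numerically stable softmax
  y = y/sum(y);
end

A common batch formulation stacks the per-pattern Jacobians and innovations so that a single EKF update processes a whole mini-batch; larger batches then trade higher per-update cost against fewer updates, which is the time-versus-quality trade-off the abstract refers to.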
