Basis vector orthogonalization for an improved kernel gradient matching pursuit method

We previously developed the kernel gradient matching pursuit method, an approximation technique for kernel-based classification, to make the optimization of kernel-based probabilistic models computationally efficient for problems such as sequential pattern recognition. The conventional kernel gradient matching pursuit method approximates the optimal parameter vector by a linear combination of a small number of basis vectors. In this paper, we propose an improved kernel gradient matching pursuit method that imposes orthogonality constraints on the obtained basis vector set. We verified the efficiency of the proposed method in recognition experiments on handwritten image datasets and speech datasets. The result is a scalable kernel optimization that accommodates various models, handles very high-dimensional features (>100 K features), and supports large-scale datasets (>10 M samples).
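To make the idea of an orthogonality constraint on a basis vector set concrete, the following is a minimal sketch of Gram-Schmidt orthonormalization carried out in a kernel-induced feature space, using only kernel evaluations. It is a generic illustration, not necessarily the procedure used in the paper. The RBF kernel, the `gamma` value, and the function names `rbf_kernel` and `orthonormalize_basis` are all assumptions introduced here.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # Assumed kernel choice for illustration: k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def orthonormalize_basis(K_sub, tol=1e-10):
    """Modified Gram-Schmidt in feature space (hypothetical helper).

    K_sub[i, j] = k(x_i, x_j) for the m selected basis points.
    Each output column c represents the feature-space vector
    u = sum_i c[i] * phi(x_i). The returned matrix C satisfies
    C.T @ K_sub @ C = I, i.e. the represented vectors are orthonormal.
    """
    m = K_sub.shape[0]
    cols = []
    for j in range(m):
        c = np.zeros(m)
        c[j] = 1.0  # start from phi(x_j)
        for q in cols:
            # Remove the component along the already-built vector u_q;
            # the inner product <u_q, u> equals q^T K_sub c.
            c -= (q @ K_sub @ c) * q
        norm2 = c @ K_sub @ c
        if norm2 > tol:  # skip candidates that are (nearly) linearly dependent
            cols.append(c / np.sqrt(norm2))
    return np.stack(cols, axis=1)

# Usage: six random 3-dimensional points as candidate basis vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
K = rbf_kernel(X, X)
C = orthonormalize_basis(K)
gram = C.T @ K @ C  # should be close to the identity matrix
```

Working with coefficient vectors rather than explicit feature maps keeps every step expressible through the kernel matrix, which is what allows orthogonalization when the feature space is very high-dimensional or implicit.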
