Regression-based D-optimality experimental design for sparse kernel density estimation

This paper derives an efficient algorithm for constructing sparse kernel density (SKD) estimates. The algorithm first selects a very small subset of significant kernels using an orthogonal forward regression (OFR) procedure based on the D-optimality experimental design criterion. The weights of the resulting sparse kernel model are then calculated using a modified multiplicative nonnegative quadratic programming algorithm. Unlike most SKD estimators, the proposed D-optimality regression approach is an unsupervised construction algorithm and does not require an empirical desired response for the kernel selection task. The strength of the D-optimality OFR lies in the fact that it automatically selects a small subset of the most significant kernels, namely those associated with the largest eigenvalues of the kernel design matrix, which account for most of the energy of the kernel training data; this also guarantees the most accurate kernel weight estimates. The proposed method is also computationally attractive in comparison with many existing SKD construction algorithms. Extensive numerical investigation demonstrates that this regression-based approach efficiently constructs very sparse kernel density estimates with excellent test accuracy, and the results show that the proposed method compares favourably with other existing sparse methods for kernel density estimation in terms of test accuracy, model sparsity and computational complexity.

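To make the two-stage procedure concrete, the sketch below illustrates the general idea in Python, under assumptions that are not taken from the paper: Gaussian kernels centred on the training samples with a single fixed width, D-optimality selection implemented as greedy orthogonalised-energy maximisation, and a simplified multiplicative nonnegative update with a surrogate least-squares target in place of the paper's modified MNQP step. The function names (gaussian_design_matrix, d_optimality_ofr, multiplicative_nn_weights) and parameter choices are hypothetical.

```python
import numpy as np


def gaussian_design_matrix(X, centres, sigma):
    """Kernel design matrix Phi[n, m] = K_sigma(X[n] - centres[m]) for Gaussian kernels."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    norm = (2.0 * np.pi * sigma ** 2) ** (-X.shape[1] / 2.0)
    return norm * np.exp(-d2 / (2.0 * sigma ** 2))


def d_optimality_ofr(Phi, n_select):
    """Unsupervised greedy selection: at each step pick the candidate column whose
    component orthogonal to the already selected columns has the largest energy,
    i.e. the column that most increases log det(Phi_s^T Phi_s) (D-optimality)."""
    residual = Phi.astype(float).copy()      # columns deflated against the selection
    available = np.ones(Phi.shape[1], dtype=bool)
    selected = []
    for _ in range(n_select):
        energy = (residual ** 2).sum(axis=0)
        energy[~available] = -np.inf         # never reselect a chosen kernel
        k = int(np.argmax(energy))
        available[k] = False
        selected.append(k)
        q = residual[:, k] / np.linalg.norm(residual[:, k])
        residual -= np.outer(q, q @ residual)    # modified Gram-Schmidt deflation
    return selected


def multiplicative_nn_weights(Phi_s, n_iter=500, eps=1e-12):
    """Simplified multiplicative nonnegative update (not the paper's exact MNQP):
    minimise 0.5 * beta^T B beta - v^T beta over beta >= 0 with sum(beta) = 1,
    using B = Phi_s^T Phi_s / N and the surrogate target v_m = mean_n Phi_s[n, m]."""
    N, M = Phi_s.shape
    B = Phi_s.T @ Phi_s / N
    v = Phi_s.mean(axis=0)
    beta = np.full(M, 1.0 / M)
    for _ in range(n_iter):
        beta *= v / (B @ beta + eps)   # multiplicative step preserves nonnegativity
        beta /= beta.sum()             # renormalise so the weights sum to one
    return beta


# Usage: select 10 of 500 candidate kernels, then evaluate the sparse estimate.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
sigma = 0.5
Phi = gaussian_design_matrix(X, X, sigma)      # full N x N candidate design matrix
idx = d_optimality_ofr(Phi, n_select=10)
beta = multiplicative_nn_weights(Phi[:, idx])
X_test = rng.normal(size=(5, 2))
p_hat = gaussian_design_matrix(X_test, X[idx], sigma) @ beta
print(idx, p_hat)
```

The selection rule reflects the D-optimality idea described above: picking the column with the largest orthogonalised energy maximises the increase in log det(Phi_s^T Phi_s) at each step, while the multiplicative weight step keeps the mixture weights nonnegative and normalised so that the sparse estimate remains a valid density.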