Statistical Learning Theory and Kernel-Based Methods

The basics of kernel methods and their place in the generalized data-driven fault diagnostic framework are reviewed. The review begins with statistical learning theory, covering concepts such as loss functions, overfitting, and empirical and structural risk minimization. This is followed by linear margin classifiers, kernels and support vector machines. Transductive support vector machines are then discussed and illustrated with an example based on multivariate image analysis of coal particles on conveyor belts. Finally, unsupervised kernel methods such as kernel principal component analysis are considered in detail. Kernel principal component analysis is applied in a manner analogous to linear principal component analysis in multivariate statistical process control. An example of fault diagnosis in a simulated nonlinear system with kernel principal component analysis illustrates these concepts.
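Kernel methods rest on the fact that a kernel function evaluates an inner product in a feature space without constructing that space explicitly. The following minimal sketch is an illustration, not code from the chapter. It checks this numerically for the homogeneous degree-2 polynomial kernel k(x, z) = (x · z)² on two-dimensional inputs, whose explicit feature map is φ(x) = (x1², √2·x1·x2, x2²). The function names `phi` and `poly_kernel` are hypothetical.

```python
import numpy as np


def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])


def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2


x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])

explicit = float(np.dot(phi(x), phi(z)))  # inner product computed in feature space
implicit = poly_kernel(x, z)              # same quantity computed in input space
print(implicit, explicit)                 # both approximately 25.0
```

The kernel needs only the 2-D dot product, while the explicit route builds 3-D vectors. For higher polynomial degrees or the Gaussian kernel, the explicit feature space grows very large or becomes infinite-dimensional, which is why support vector machines and kernel PCA are formulated entirely in terms of kernel evaluations.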
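To show how kernel principal component analysis can carry over the monitoring logic of linear PCA, here is a minimal NumPy sketch. It is not the chapter's implementation, and it rests on several assumptions. It uses a Gaussian (RBF) kernel with width `sigma` and a user-chosen number of retained components `n_components`. It relies on the commonly used feature-space Hotelling T² and squared prediction error (SPE) statistics. Control limits are set as empirical 99th percentiles of the training statistics rather than parametric limits. A hypothetical three-variable process driven by a single latent factor stands in for the chapter's simulated nonlinear system. The names `KPCAMonitor`, `statistics`, `detect` and `simulate` are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq_dist = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


class KPCAMonitor:
    """Hypothetical kernel PCA monitor with T^2 and SPE statistics (illustrative only)."""

    def __init__(self, n_components=3, sigma=1.0, quantile=0.99):
        self.n_components = n_components
        self.sigma = sigma
        self.quantile = quantile

    def fit(self, X):
        n = X.shape[0]
        self.X = X
        self.K = rbf_kernel(X, X, self.sigma)
        # Centre the kernel matrix in feature space: Kc = H K H, H = I - (1/n) 11^T.
        H = np.eye(n) - np.full((n, n), 1.0 / n)
        Kc = H @ self.K @ H
        eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1][: self.n_components]
        mu, V = eigvals[order], eigvecs[:, order]
        # Scale coefficients so each feature-space eigenvector has unit norm.
        self.alpha = V / np.sqrt(mu)
        # Variance of the training scores along each retained component.
        self.lam = mu / n
        # Control limits from empirical quantiles of the training statistics (assumption).
        t2, spe = self.statistics(X)
        self.t2_limit = np.quantile(t2, self.quantile)
        self.spe_limit = np.quantile(spe, self.quantile)
        return self

    def statistics(self, Xnew):
        Kt = rbf_kernel(Xnew, self.X, self.sigma)
        # Centre the test kernel using the training-set kernel means.
        Kt_c = (Kt - Kt.mean(axis=1, keepdims=True)
                - self.K.mean(axis=0)[None, :] + self.K.mean())
        scores = Kt_c @ self.alpha
        t2 = np.sum(scores ** 2 / self.lam, axis=1)
        # Squared norm of the centred feature vector; k(x, x) = 1 for the RBF kernel.
        norm_sq = 1.0 - 2.0 * Kt.mean(axis=1) + self.K.mean()
        spe = np.maximum(norm_sq - np.sum(scores ** 2, axis=1), 0.0)
        return t2, spe

    def detect(self, Xnew):
        """Flag samples whose T^2 or SPE exceeds its control limit."""
        t2, spe = self.statistics(Xnew)
        return (t2 > self.t2_limit) | (spe > self.spe_limit)


rng = np.random.default_rng(0)


def simulate(m, shift=0.0):
    # Hypothetical nonlinear process: three variables driven by one latent factor.
    t = rng.uniform(-1.0, 1.0, size=m)
    X = np.column_stack([t, t ** 2, -t ** 3]) + 0.02 * rng.standard_normal((m, 3))
    X[:, 1] += shift  # optional step fault on the second variable
    return X


X_train = simulate(200)
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
monitor = KPCAMonitor(n_components=3, sigma=1.0).fit((X_train - mean) / std)

normal_rate = monitor.detect((simulate(100) - mean) / std).mean()
fault_rate = monitor.detect((simulate(100, shift=0.5) - mean) / std).mean()
print(f"alarm rate: normal {normal_rate:.2f}, faulty {fault_rate:.2f}")
```

As in linear PCA monitoring, T² measures variation within the retained principal subspace, and SPE measures the distance from that subspace, here in the kernel feature space. Both statistics, and therefore the alarm rate on the faulty batch, depend strongly on `sigma` and `n_components`. In practice these would be tuned on validation data rather than fixed as above.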
