Kernel based learning methods for pattern and feature analysis

Kernel-based learning methods (kernel methods) have significantly influenced recent machine learning research. This thesis devises and improves kernel methods and applies them to pattern and feature analysis. Part of our research focuses on improving Support Vector Machines (SVMs). In training SVMs, we find that some previously proposed cache policies for sequential minimal optimization (SMO) are inefficient. A better strategy is to cache the gradients of all frequently checked vectors. Moreover, we propose a strategy that uses the nearest neighboring vectors to speed up the convergence of SMO. We also suggest using Hadamard codes for multiclass label prediction with SVMs, and we prove that Hadamard codes are optimal in correcting wrong labels predicted by the base classifiers. Furthermore, we design a new summation of exponential (SoE) kernel for regression tasks with missing values. We show that SoE kernels satisfy the kernel admissibility conditions and are insensitive to missing values. This thesis also deals with unsupervised and semi-supervised kernel methods. Specifically, we transfer Rival Penalized Competitive Learning (RPCL) from data space to feature space for automatic clustering. In addition, we use spectral analysis of kernel matrices to address the seed initialization problem associated with RPCL. We also improve SVM-based feature selection in a semi-supervised manner by using both labeled and unlabeled data. The new feature selection method performs well on feature selection benchmark problems.
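To illustrate the idea of Hadamard output coding mentioned above, here is a minimal sketch, not the thesis's actual construction or proof. It assumes the Sylvester construction of the Hadamard matrix, assigns one row per class as a codeword (dropping the constant first column, which would define a trivial binary problem), and decodes by minimum Hamming distance. The function names `sylvester_hadamard`, `hadamard_codebook`, and `decode` are hypothetical, and the base classifiers (for example binary SVMs) are replaced here by a plain vector of +1/-1 outputs.

```python
import numpy as np


def sylvester_hadamard(n):
    """Return an n x n matrix of +1/-1 entries built by the Sylvester construction.

    Assumption: n is a power of two (the only sizes this construction yields).
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = np.array([[1]])
    while H.shape[0] < n:
        # Doubling step: [[H, H], [H, -H]] keeps all rows mutually orthogonal.
        H = np.block([[H, H], [H, -H]])
    return H


def hadamard_codebook(num_classes):
    """Assign each class one Hadamard row as its codeword.

    Each column defines one binary problem for a base classifier.
    The first column is constant (+1 for every class), so it is dropped.
    """
    n = 1
    while n < num_classes:
        n *= 2
    H = sylvester_hadamard(n)
    return H[:num_classes, 1:]


def decode(codebook, outputs):
    """Return the class whose codeword is closest in Hamming distance.

    `outputs` is the vector of +1/-1 predictions, one per base classifier.
    """
    distances = np.sum(codebook != np.asarray(outputs), axis=1)
    return int(np.argmin(distances))


# Usage: 8 classes give 7 binary problems; any two codewords differ in 4 positions,
# so a single wrong base classifier cannot change the decoded label.
codebook = hadamard_codebook(8)
noisy = codebook[3].copy()
noisy[5] *= -1  # simulate one base classifier predicting the wrong side
predicted_class = decode(codebook, noisy)
```

Because any two distinct rows of an order-n Hadamard matrix disagree in exactly n/2 positions, minimum-distance decoding tolerates a number of wrong base-classifier outputs that grows with the code length. Under the assumptions of this sketch, that is the error-correcting property the abstract refers to.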
