Two-stage multiple kernel learning for supervised dimensionality reduction

In supervised dimensionality reduction methods for pattern recognition tasks, class label information is used while reducing the input dimensionality in order to improve classification accuracy. Using nonlinear mappings for this purpose makes these models more appropriate for nonlinearly distributed data. In this paper, a new nonlinear supervised dimensionality reduction model is introduced. The dimensionality reduction process in this model is performed through a multiple kernel learning paradigm in two stages. In the first stage, three criteria suitable for supervised dimensionality reduction, namely the Fisher, homoscedasticity, and between-class distance criteria, are used to find the kernel weights. With these weights, a linear combination of several valid kernels is computed to form a new kernel function. In the second stage, the kernel discriminant analysis method is employed for nonlinear supervised dimensionality reduction using the kernel computed in the first stage. Experiments on a variety of real-world datasets, including handwritten digit images, object images, and other datasets, show that, compared with a number of well-known related techniques, the proposed approach yields accurate and fast classification.

A novel approach for nonlinear supervised dimensionality reduction is proposed. The proposed method uses the kernel discriminant analysis method with new kernels. A multiple kernel learning (MKL) paradigm produces the new kernel function. Proper criteria for supervised dimensionality reduction are optimized in the MKL part. The proposed method is successfully evaluated using many real-world datasets.
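The two-stage structure described above can be illustrated with a small NumPy sketch. It is not the paper's method: the abstract does not give the optimization problem, so stage one is replaced here by a simple stand-in that scores each base kernel with a Fisher-like ratio (between-class over within-class scatter trace in feature space) and normalizes the scores into nonnegative weights. The homoscedasticity and between-class distance criteria are not modeled. The RBF base kernels, the `gammas` values, the regularization constant, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # Gaussian (RBF) kernel matrix between the rows of X and the rows of Y.
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fisher_score(K, y):
    # Ratio of between-class to within-class scatter traces in feature space,
    # computed from the kernel matrix alone (illustrative stand-in criterion).
    n = K.shape[0]
    total = np.trace(K) - K.sum() / n
    within = 0.0
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        Kc = K[np.ix_(idx, idx)]
        within += np.trace(Kc) - Kc.sum() / len(idx)
    return (total - within) / within

def kernel_weights(kernels, y):
    # Stage 1 (assumed simplification): nonnegative weights that sum to one.
    scores = np.array([fisher_score(K, y) for K in kernels])
    return scores / scores.sum()

def kda_fit(K, y, reg=1e-3):
    # Stage 2: kernel discriminant analysis on the combined kernel.
    # Solves M a = lambda (N + reg * I) a via Cholesky whitening.
    n = K.shape[0]
    classes = np.unique(y)
    mean_all = K.mean(axis=1)
    M = np.zeros((n, n))  # between-class matrix in kernel space
    N = np.zeros((n, n))  # within-class matrix in kernel space
    for c in classes:
        Kc = K[:, y == c]
        nc = Kc.shape[1]
        diff = Kc.mean(axis=1) - mean_all
        M += nc * np.outer(diff, diff)
        centering = np.eye(nc) - np.full((nc, nc), 1.0 / nc)
        N += Kc @ centering @ Kc.T
    N += reg * np.trace(N) / n * np.eye(n)  # ridge term keeps N positive definite
    L = np.linalg.cholesky(N)
    B = np.linalg.solve(L, M)
    C = np.linalg.solve(L, B.T).T           # L^{-1} M L^{-T}
    _, V = np.linalg.eigh((C + C.T) / 2.0)  # eigenvalues in ascending order
    d = len(classes) - 1
    return np.linalg.solve(L.T, V[:, ::-1][:, :d])  # top d directions

# Toy data: two Gaussian blobs (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(5.0, 1.0, (30, 2))])
y = np.repeat([0, 1], 30)

gammas = [0.01, 0.1, 1.0]                        # assumed base-kernel widths
kernels = [rbf_kernel(X, X, g) for g in gammas]
w = kernel_weights(kernels, y)                   # stage 1
K = sum(wi * Ki for wi, Ki in zip(w, kernels))   # combined kernel
A = kda_fit(K, y)                                # stage 2
Z = K @ A                                        # reduced training representation
```

Because the weights are nonnegative, the combined matrix remains a valid (positive semidefinite) kernel whenever the base kernels are. A new sample would be projected by evaluating the same weighted kernel combination between that sample and the training points and multiplying by `A`.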
