Multilabel Feature Extraction Algorithm via Maximizing Approximated and Symmetrized Normalized Cross-Covariance Operator

Multilabel feature extraction (FE) is an effective preprocessing step for coping with irrelevant, redundant, and noisy features, reducing computational cost, and even improving classification performance. The original normalized cross-covariance operator is a kernel-based nonlinear dependence measure between features and labels, whose empirical estimator is formulated as a trace operation involving two inverse matrices of the feature and label kernels with a regularization constant. Because of this complicated expression, an eigenvalue problem for linear FE cannot be derived from it directly. In this paper, we approximate this measure using the Moore-Penrose inverse, a linear kernel for the feature space, and a delta kernel for the label space, and then symmetrize the entire matrix inside the trace operation, yielding an effective approximated and symmetrized representation. Under orthonormal projection direction constraints, maximizing this modified form induces a novel eigenvalue problem for multilabel linear FE. Experiments on 12 data sets show that the proposed method performs best among seven existing FE techniques, according to eight multilabel classification performance metrics and three statistical tests.
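
The abstract does not give the closed form of the approximated and symmetrized measure, so the following is only a minimal sketch of the kind of eigenvalue-problem-based linear FE it describes. It assumes the symmetrized dependence matrix can be written as Xc^T Y (Y^T Y)^+ Y^T Xc, i.e., a pseudoinverse-based, label-weighted scatter built from a linear feature kernel; this is an illustrative stand-in, not the paper's exact formulation, and the function name multilabel_linear_fe is hypothetical.

```python
import numpy as np

def multilabel_linear_fe(X, Y, n_components, rcond=1e-10):
    """Illustrative linear multilabel feature extractor.

    X : (n_samples, n_features) feature matrix
    Y : (n_samples, n_labels)   binary label indicator matrix
    Returns W : (n_features, n_components) orthonormal projection matrix.
    """
    # Center the features (linear kernel on centered data).
    Xc = X - X.mean(axis=0, keepdims=True)

    # Label-side weighting: the Moore-Penrose pseudoinverse stands in for
    # the regularized inverse of the original normalized cross-covariance
    # operator (assumed form for this sketch, not the paper's exact matrix).
    L = Y @ np.linalg.pinv(Y.T @ Y, rcond=rcond) @ Y.T

    # Matrix whose trace, after projection, is maximized.
    M = Xc.T @ L @ Xc
    M = 0.5 * (M + M.T)  # enforce exact symmetry numerically

    # Orthonormal projection directions = top eigenvectors of M.
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:n_components]]

# Usage: Z = X @ multilabel_linear_fe(X, Y, n_components=10)
```

Because M is symmetric, the constrained maximization reduces to an ordinary symmetric eigenvalue problem, which is the structural point the abstract emphasizes.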
