Spectral feature selection for mining ultrahigh dimensional data

The rapid advance of computer-based high-throughput technology and the ubiquitous use of the web have provided unparalleled opportunities for humans to expand their capabilities in production, services, communications, and research. In this process, immense quantities of high-dimensional data have accumulated, challenging state-of-the-art machine learning techniques to produce useful results efficiently. Feature selection can effectively reduce data dimensionality by removing irrelevant and redundant features. Its immediate effects are faster data mining algorithms and better mining performance, such as higher predictive accuracy and more comprehensible results. In the last decade, a large number of relevance criteria have been developed to evaluate the utility of features, and these criteria have largely been studied separately according to the type of learning: supervised or unsupervised.

This dissertation studies spectral feature selection, a novel, general feature selection framework based on graph spectral analysis. It unifies supervised and unsupervised feature selection and can generate families of algorithms for both learning contexts. It also includes many existing algorithms as special cases, which allows them to be studied jointly to gain insights. A common limitation of many existing algorithms is that they are univariate methods, which evaluate each feature independently and therefore cannot handle redundant features. The proposed framework can be readily extended to conduct multivariate analysis, which addresses this limitation effectively. One of the most challenging problems in feature selection research is the small sample problem, in which the lack of information further aggravates the difficulties caused by high dimensionality. Spectral feature selection also provides a natural way to incorporate domain knowledge from multiple sources, enriching the available information to address this problem.
The resulting multi-source feature selection technique represents one of the latest development trends in feature selection research. An extensive experimental study is conducted, and the results demonstrate that spectral feature selection achieves superior performance in various learning contexts.
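The abstract describes the framework only at a high level. As a minimal sketch of the kind of graph-spectral criterion such a framework generalizes, the code below implements a Laplacian-Score-style ranking (after He, Cai and Niyogi, 2005) on two assumed graphs: a fully connected RBF similarity graph for the unsupervised case (the original Laplacian Score uses a k-nearest-neighbor graph) and a label-based graph with W_ij = 1/n_l when samples i and j share class l for the supervised case. The function names, the bandwidth `sigma`, the toy data and both graph constructions are illustrative assumptions, not the dissertation's exact algorithms. The point it shows is that only the graph changes between the two learning contexts, while the feature-scoring step stays the same.

```python
import math


def rbf_similarity(X, sigma=1.0):
    """Unsupervised graph (assumed): fully connected heat-kernel similarity, no self-loops."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W


def label_similarity(y):
    """Supervised graph (assumed): W_ij = 1/n_l if samples i and j are both in class l, else 0."""
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    n = len(y)
    return [[1.0 / counts[y[i]] if y[i] == y[j] else 0.0 for j in range(n)]
            for i in range(n)]


def laplacian_score(f, W):
    """Laplacian-Score-style criterion (f~^T L f~) / (f~^T D f~) with L = D - W.

    Lower is better: the feature changes little between strongly connected samples.
    A feature with zero D-weighted variance gets math.inf (ranked last).
    """
    n = len(f)
    d = [sum(row) for row in W]                               # degrees D_ii
    mean = sum(di * fi for di, fi in zip(d, f)) / sum(d)      # D-weighted mean
    f_c = [fi - mean for fi in f]                             # remove the trivial constant component
    denom = sum(di * fi ** 2 for di, fi in zip(d, f_c))       # f~^T D f~
    if denom <= 1e-12:
        return math.inf
    # f~^T L f~ = 1/2 * sum_ij W_ij (f_i - f_j)^2, which is unaffected by centering
    num = 0.5 * sum(W[i][j] * (f[i] - f[j]) ** 2
                    for i in range(n) for j in range(n))
    return num / denom


def score_features(X, W):
    """Univariate ranking: score every column of X independently against one graph."""
    return [laplacian_score(list(col), W) for col in zip(*X)]


# Toy data (hypothetical): column 0 follows the class structure, column 1 does not,
# and column 2 is an exact copy of column 0 (a redundant feature).
X = [[0.0, 0.0, 0.0],
     [0.1, 1.0, 0.1],
     [1.0, 0.1, 1.0],
     [1.1, 1.1, 1.1]]
y = [0, 0, 1, 1]

unsup = score_features(X, rbf_similarity(X))      # no labels used
sup = score_features(X, label_similarity(y))      # labels define the graph
print([round(s, 4) for s in sup])                 # [0.0099, 0.9901, 0.0099]
print([round(s, 4) for s in unsup])
```

Because each column is scored independently, the duplicated column receives exactly the same score as column 0 under either graph, so a top-k selection would keep both copies. This is the redundancy problem of univariate methods noted above; removing it requires evaluating features jointly rather than one at a time.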
