Embedded feature selection accounting for unknown data heterogeneity

Abstract Data heterogeneity, caused by unknown or unwanted factors introduced during data collection, is a major challenge in modern data analysis. When traditional feature selection methods, which assume that samples are independently and identically distributed, are applied to such data, variable effects are estimated spuriously. Some existing statistical models evaluate the significance of each variable more accurately by estimating the unknown factors and including them as covariates, but they are filter methods and therefore suffer from variable redundancy and a lack of predictive power. We therefore propose an embedded feature selection method, formulated from a sparse learning perspective, that adjusts for unknown heterogeneity. Its performance is assessed by the classification accuracy achieved with the selected features in multi-class classification problems. Owing to the effective adjustment of unknown heterogeneity and the model selection strategy, experiments on synthetic data and three real-world benchmark data sets show that our method consistently outperforms several conventional embedded methods and existing statistical models.
