Markov blanket: Efficient strategy for feature subset selection method for high dimensional microarray cancer datasets

In this paper, we discuss the importance of feature subset selection in machine learning. Microarray expression analysis is used to check whether global biological differences underlie common pathological features across different types of cancer and to identify genes that might predict the clinical behavior of the disease. One way to select relevant genes is to use a Bayesian network and take the Markov blanket of the class variable. We present and compare the performance of several feature (gene) subset selection approaches, based on wrapper and Markov blanket models, on five microarray cancer datasets. The first approach uses memetic algorithms (MAs) for feature selection. The second uses mRMR (minimum redundancy, maximum relevance) feature subset selection hybridized with genetic search optimization. For the resulting feature sets, we compare the performance of the Markov blanket model with that of the most common classification algorithms. The results show that classifiers based on the Markov blanket model achieve better accuracy than the classical classification algorithms in most cases on the microarray cancer datasets.
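To make the Markov blanket idea concrete: in a Bayesian network, the Markov blanket of a node consists of its parents, its children, and the other parents of its children, and conditioning on that set makes the node independent of every remaining variable. The sketch below is illustrative only and is not the procedure used in the paper; it assumes the network structure is already known, and the toy graph and gene names are hypothetical.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps each node to the set of its parent nodes.
    The blanket is: parents of target, children of target,
    and the other parents (co-parents) of those children.
    """
    blanket = set(parents.get(target, set()))  # parents of target
    for node, node_parents in parents.items():
        if target in node_parents:             # node is a child of target
            blanket.add(node)
            blanket |= node_parents            # co-parents of that child
    blanket.discard(target)                    # target is not in its own blanket
    return blanket


# Hypothetical toy network (gene names are made up for illustration).
toy_dag = {
    "class": {"gene_a"},             # gene_a -> class
    "gene_b": {"class", "gene_c"},   # class -> gene_b <- gene_c
    "gene_c": set(),
    "gene_a": set(),
    "gene_d": {"gene_c"},            # gene_c -> gene_d (outside the blanket)
}

selected = markov_blanket(toy_dag, "class")
print(sorted(selected))
```

In this toy graph, `gene_d` is left out because it is connected to the class variable only through `gene_c`, which is already in the blanket. In practice the structure must first be learned from the expression data, which is the hard part for high-dimensional microarray sets.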
