Biomarker Selection and Classification of “-Omics” Data Using a Two-Step Bayes Classification Framework

Identifying suitable biomarkers for accurate prediction of phenotypic outcomes is a goal of personalized medicine. However, current machine learning approaches are either too complex or perform poorly. Here, a novel two-step machine-learning framework is presented to address this need. First, a Naïve Bayes estimator ranks the features, so that the top-ranked subset is likely to contain the features most informative for predicting the underlying biological classes. The top-ranked features are then passed to a Hidden Naïve Bayes classifier, which builds a classification model from these filtered attributes. To obtain the minimum set of the most informative biomarkers, the bottom-ranked features are removed from the Naïve Bayes-filtered feature list one at a time, and the classification accuracy of the Hidden Naïve Bayes classifier is checked for each pruned feature set. The framework was tested on several types of -omics datasets, including gene expression microarray, single nucleotide polymorphism microarray (SNP array), and surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) proteomic data. The proposed two-step Bayes classification framework matched and, in some cases, outperformed other classification methods in terms of prediction accuracy, minimum number of classification markers, and computational time.
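
The rank-then-prune procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: features are ranked by the leave-one-out accuracy of a one-feature Gaussian Naïve Bayes model, a plain Gaussian Naïve Bayes classifier stands in for the Hidden Naïve Bayes classifier in the second step, and ties in accuracy are resolved in favor of the smaller feature set. All function names (`fit_gnb`, `predict_gnb`, `loo_accuracy`, `rank_features`, `two_step_select`) and the toy data are hypothetical.

```python
import math
from statistics import mean, pvariance


def fit_gnb(X, y, cols):
    """Fit Gaussian Naive Bayes on the chosen columns: per-class log prior, means, variances."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = [(mean([r[c] for r in rows]),
                  pvariance([r[c] for r in rows]) + 1e-6)  # small floor avoids zero variance
                 for c in cols]
        model[label] = (math.log(len(rows) / len(y)), stats)
    return model


def predict_gnb(model, x, cols):
    """Return the class with the highest log posterior for one sample."""
    def log_post(label):
        prior, stats = model[label]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x[c] - mu) ** 2 / (2 * var)
            for c, (mu, var) in zip(cols, stats))
    return max(model, key=log_post)


def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of the stand-in classifier on the chosen columns."""
    hits = 0
    for i in range(len(X)):
        model = fit_gnb(X[:i] + X[i + 1:], y[:i] + y[i + 1:], cols)
        hits += predict_gnb(model, X[i], cols) == y[i]
    return hits / len(X)


def rank_features(X, y):
    """Step 1 (assumed criterion): rank features by one-feature Naive Bayes accuracy."""
    scores = [loo_accuracy(X, y, [c]) for c in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda c: -scores[c])


def two_step_select(X, y, top_k):
    """Step 2: drop the bottom-ranked feature one at a time, keep the smallest best set."""
    cols = rank_features(X, y)[:top_k]
    best_cols, best_acc = list(cols), loo_accuracy(X, y, cols)
    while len(cols) > 1:
        cols = cols[:-1]                  # remove the current bottom-ranked feature
        acc = loo_accuracy(X, y, cols)
        if acc >= best_acc:               # ties favor the smaller feature set
            best_cols, best_acc = list(cols), acc
    return best_cols, best_acc


# Toy data (hypothetical): column 0 separates the classes, columns 1 and 2 are noise.
X = [[1.0, 0.3, 2.0], [1.2, 0.9, 2.1], [0.8, 0.5, 1.9], [1.1, 0.7, 2.2],
     [5.0, 0.4, 2.0], [5.2, 0.8, 2.1], [4.8, 0.6, 1.9], [5.1, 0.6, 2.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected, accuracy = two_step_select(X, y, top_k=3)
print(selected, accuracy)
```

On this toy data the pruning loop ends with only the discriminative column. To get closer to the described framework, the stand-in classifier inside `loo_accuracy` would be replaced by a Hidden Naïve Bayes implementation, and the leave-one-out estimate by whichever validation scheme the study actually used.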
