A Multi-objective Genetic Programming Biomarker Detection Approach in Mass Spectrometry Data

Mass spectrometry is currently the most commonly used technology in biochemical research for proteomic analysis. The main goal of proteomic profiling using mass spectrometry is to classify samples from different clinical states, which requires identifying the proteins or peptides (biomarkers) that are differentially expressed between those states. However, because the data are high-dimensional and the number of samples is small, classifying mass spectrometry data is a challenging task. An effective feature manipulation algorithm, through either feature selection or feature construction, is therefore needed to improve classification performance while minimising the number of features. Most feature manipulation methods for mass spectrometry data treat this as a single-objective task that focuses on improving classification performance. This paper presents two new methods for biomarker detection, one based on multi-objective feature selection and one on multi-objective feature construction. The results show that the proposed multi-objective feature selection method obtains better feature subsets than the single-objective algorithm and two traditional multi-objective approaches for feature selection. Moreover, the multi-objective feature construction algorithm further improves performance over the multi-objective feature selection algorithm. This paper presents the first multi-objective genetic programming approach for biomarker detection in mass spectrometry data.
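
The trade-off between classification performance and the number of features can be illustrated with a minimal Pareto-dominance sketch. This is not the paper's algorithm: the two objectives (classification error and feature count, both minimised), the helper names `dominates` and `pareto_front`, and the candidate values are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (classification error, number of selected features).
# Both objectives are minimised.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up (error, feature count) pairs, for illustration only.
candidates = [(0.10, 12), (0.08, 20), (0.10, 15), (0.15, 5), (0.08, 25)]
front = pareto_front(candidates)
print(front)
```

A single-objective method would keep only the lowest-error candidate, whereas the non-dominated set retains every subset that offers a distinct compromise between error and feature count.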
