Gene interaction networks boost genetic algorithm performance in biomarker discovery

In recent years, the advent of high-throughput techniques has significantly accelerated biomarker discovery. At the same time, machine learning methods have grown in popularity in the field, mostly because of the inherent analytical problems associated with the data that these massively parallel experiments produce. However, learning algorithms are very often used in their basic form and therefore sometimes fail to consider the interactions that exist between biological entities (i.e., genes). In this context, we propose a new methodology, based on genetic algorithms, that integrates prior information through a novel genetic operator. In this particular application, we rely on the biological knowledge captured by gene interaction networks. We demonstrate that our method outperforms a simple genetic algorithm by testing it on several microarray datasets containing tissue samples from cancer patients. The results suggest that including biological knowledge in a genetic algorithm in the form of this operator can boost its effectiveness in the biomarker discovery problem.
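The abstract does not spell out how the operator works, so the following is only a minimal sketch of one way prior network knowledge could enter a genetic algorithm, not the authors' operator. It assumes an individual is a list of selected gene identifiers and that the interaction network is an adjacency dictionary; the function name `network_mutation`, the `rate` parameter, and the toy gene labels are all hypothetical. The mutation replaces a selected gene with one of its network neighbours instead of a gene drawn uniformly at random.

```python
import random


def network_mutation(individual, network, rate=0.2, rng=None):
    """Hypothetical network-guided mutation (an illustrative assumption).

    individual: list of gene identifiers currently selected as features.
    network:    dict mapping a gene to a list of interacting genes.
    rate:       per-gene probability of attempting a swap.
    """
    rng = rng or random.Random()
    selected = set(individual)  # genes currently in the evolving subset
    mutated = []
    for gene in individual:
        # Only neighbours that are not already selected are valid swaps.
        candidates = [n for n in network.get(gene, ()) if n not in selected]
        if candidates and rng.random() < rate:
            replacement = rng.choice(candidates)
            selected.discard(gene)
            selected.add(replacement)
            mutated.append(replacement)
        else:
            # No usable neighbour, or no mutation drawn: keep the gene.
            mutated.append(gene)
    return mutated


# Toy network with placeholder gene labels (not real interaction data).
toy_network = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B"],
    "E": [],
}

parent = ["A", "E"]
child = network_mutation(parent, toy_network, rate=1.0, rng=random.Random(0))
unchanged = network_mutation(parent, toy_network, rate=0.0, rng=random.Random(0))
print(child, unchanged)
```

In this sketch the subset size stays fixed and a gene with no unselected neighbours is left alone, so the search moves along network edges rather than across the whole gene space. Whether the published operator acts on mutation, crossover, or initialisation is not stated in the abstract.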
