A statistical comparison of grammatical evolution strategies in the domain of human genetics

Detecting and characterizing genetic predictors of human disease susceptibility is an important goal in human genetics. New chip-based technologies facilitate the measurement of thousands of DNA sequence variations across the human genome. Biologically inspired stochastic search algorithms are expected to play an important role in the analysis of these high-dimensional datasets. We simulated datasets with up to 6000 attributes using two different genetic models and statistically compared the performance of grammatical evolution, grammatical swarm, and random search for building symbolic discriminant functions. We found no statistically significant difference in performance among the three search algorithms within this specific domain.
