A walk into random forests: adaptation and application to Genome-Wide Association Studies

Before a jury composed of: Messrs. P. GEURTS, Lecturer at the Université de Liège, President; L. WEHENKEL, Professor at the Université de Liège, Supervisor; M. GEORGES, Professor in the Faculty of Veterinary Medicine at the Université de Liège; Ms. K. VAN STEEN, Lecturer at the Université de Liège; Messrs. T. DRUET, FNRS Research Associate, GIGA, Université de Liège; Y. SAEYS, Doctor at Ghent University; Ms. C. SINOQUET, Professor at the Université de Nantes (France)
