Haplotype-based Classifiers to Predict Individual Susceptibility to Complex Diseases - An Example for Multiple Sclerosis

The rapid growth of genome-wide association studies is producing an enormous amount of genetic data and driving a considerable effort to build genetic predictive models of individual susceptibility to complex diseases. However, most of these models show consistently low accuracy. We hypothesize that a main cause is the strong reduction of the genetic information the classifiers consider, and we propose a three-fold solution: using individual haplotype data instead of genotype data, whole-genome markers instead of a stringently selected subset, and risk variants spanning several markers instead of only one or two. We compared the performance of our approach with that of current approaches to predicting individual genetic risk of multiple sclerosis and found that our method yielded significantly more accurate classifiers.
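To illustrate why haplotype data can carry more information than genotype data, the following minimal sketch compares two hypothetical individuals at two biallelic markers. It is not the classifier described above; the names `genotype`, `ind_a`, and `ind_b` and the 0/1 allele coding are assumptions made only for this example. Both individuals are heterozygous at both markers, so their unphased genotypes are identical, yet they carry different haplotype pairs.

```python
from collections import Counter

def genotype(hap_pair):
    """Collapse a phased haplotype pair into unphased per-marker allele counts."""
    h1, h2 = hap_pair
    return tuple(int(a) + int(b) for a, b in zip(h1, h2))

# Two hypothetical individuals, two biallelic markers, alleles coded 0/1.
ind_a = ("00", "11")  # carries haplotypes 00 and 11
ind_b = ("01", "10")  # carries haplotypes 01 and 10

# Unphased genotypes are the same: one copy of allele 1 at each marker.
same_genotype = genotype(ind_a) == genotype(ind_b)

# The haplotype pairs differ, so phase distinguishes the two individuals.
same_haplotypes = Counter(ind_a) == Counter(ind_b)

print(genotype(ind_a), genotype(ind_b), same_genotype, same_haplotypes)
```

A classifier that sees only genotypes cannot separate these two individuals, whereas one that sees haplotypes can, which is the kind of information loss the hypothesis above refers to.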
