Building a genetic risk model for bipolar disorder from genome-wide association data with random forest algorithm

A genetic risk score could assist clinical diagnosis of complex diseases with high heritability. Using large-scale genome-wide association (GWA) data, the current study constructed a genetic risk model for bipolar disorder (BPD) with a machine learning approach. The BPD GWA dataset from the Genetic Association Information Network was used as the training data for model construction, and the Systematic Treatment Enhancement Program (STEP) GWA data were used as the validation dataset. A random forest algorithm was applied to pre-filtered markers, and variable importance indices were assessed. In total, 289 candidate markers with good discriminability were selected by the random forest procedures; the area under the receiver operating characteristic curve was 0.944 (0.935–0.953) in the training set and 0.702 (0.681–0.723) in the STEP dataset. Using a score cutoff of 184, the sensitivity and specificity for BPD were 0.777 and 0.854, respectively. Pathway analyses revealed important biological pathways for the identified genes. In conclusion, the present study identified informative genetic markers that differentiate patients with BPD from healthy controls with acceptable discriminability in the validation dataset. In the future, diagnostic classification can be further improved by assessing more comprehensive clinical risk factors and jointly analysing them with genetic data in large samples.
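
The metrics reported above (area under the ROC curve, and sensitivity and specificity at a score cutoff) can be illustrated with a minimal sketch. It is not the study's code: the toy scores and labels are invented, the function names are hypothetical, and it assumes that a score at or above the cutoff is classified as a case, a direction the abstract does not state.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a score cutoff.

    Assumption: score >= cutoff is classified as a case.
    labels: 1 = case, 0 = control.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)  # cases called cases
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)   # cases missed
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)   # controls called controls
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)  # controls called cases
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Invented toy data, not taken from the study.
toy_scores = [190, 200, 170, 185, 150, 160, 188, 140]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(toy_scores, toy_labels, cutoff=184)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc(toy_scores, toy_labels):.3f}")
```

The pairwise AUC is quadratic in sample size, which is fine for illustration; a rank-based calculation would suit datasets of GWA scale better.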