Next generation analytic tools for large scale genetic epidemiology studies of complex diseases

Over the past several years, genome‐wide association studies (GWAS) have succeeded in identifying hundreds of genetic markers associated with common diseases. However, most of these markers confer relatively small increments of risk and explain only a small proportion of familial clustering. To identify obstacles to future progress in genetic epidemiology research and provide recommendations to NIH for overcoming these barriers, the National Cancer Institute sponsored a workshop entitled “Next Generation Analytic Tools for Large‐Scale Genetic Epidemiology Studies of Complex Diseases” on September 15–16, 2010. The goal of the workshop was to facilitate discussions on (1) statistical strategies and methods to efficiently identify genetic and environmental factors contributing to the risk of complex disease; and (2) how to develop, apply, and evaluate these strategies for the design, analysis, and interpretation of large‐scale complex disease association studies in order to guide NIH in setting the future agenda in this area of research. The workshop was organized as a series of short presentations covering scientific (gene‐gene and gene‐environment interaction, complex phenotypes, and rare variants and next generation sequencing) and methodological (simulation modeling and computational resources and data management) topic areas. Specific needs to advance the field were identified during each session and are summarized. Genet. Epidemiol. 36:22–35, 2012. © 2011 Wiley Periodicals, Inc.
