Use of pathway information in molecular epidemiology

Candidate gene studies are generally motivated by some form of pathway reasoning in the selection of genes to be studied, but seldom has the logic of the approach been carried through to the analysis. Marginal effects of polymorphisms in the selected genes, and occasionally pairwise gene-gene or gene-environment interactions, are often presented, but a unified approach to modelling the entire pathway has been lacking. In this review, a variety of approaches to this problem are considered, focusing on hypothesis-driven rather than purely exploratory methods. Empirical modelling strategies are based on hierarchical models that allow prior knowledge about the structure of the pathway and the various reactions to be included as 'prior covariates'. By contrast, mechanistic models aim to describe the reactions through a system of differential equations with rate parameters that can vary between individuals, based on their genotypes. Some ways of combining the two approaches are suggested, and Bayesian model averaging methods for dealing with uncertainty about the true model form in either framework are discussed. Biomarker measurements can be incorporated into such analyses, and two-phase sampling designs stratified on some combination of disease, genes and exposures can be an efficient way of obtaining data that would be too expensive or difficult to obtain on a full candidate gene sample. The review concludes with some thoughts about potential uses of pathways in genome-wide association studies.
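As a rough illustration of the two model families described above, the following sketch writes each in a generic form. It is not the specific formulation developed in the review, and all notation is assumed for illustration: $Y_i$ is the disease status of subject $i$, $G_{ij}$ the genotype at polymorphism $j$, $Z_j$ a vector of prior covariates describing the role of gene $j$ in the pathway, $E_i$ an exposure, and $M_i(t)$ the concentration of a single intermediate metabolite. The Michaelis-Menten form of the clearance term is likewise an assumption chosen for concreteness.

```latex
% Empirical (hierarchical) sketch: genotype effects beta_j are shrunk
% towards a mean predicted by the prior covariates Z_j.
\begin{align}
\operatorname{logit} \Pr(Y_i = 1 \mid G_i) &= \beta_0 + \sum_j \beta_j G_{ij}, \\
\beta_j &\sim N\!\left(Z_j^{\top}\pi,\ \tau^2\right).
\end{align}

% Mechanistic sketch: one metabolite produced from exposure E_i at rate k_0
% and cleared by an enzyme whose maximal rate depends on genotype;
% disease risk depends on the steady-state level M_i^*.
\begin{align}
\frac{dM_i(t)}{dt} &= k_0 E_i - \frac{V_{\max}(G_i)\, M_i(t)}{K_m + M_i(t)}, \\
\operatorname{logit} \Pr(Y_i = 1 \mid G_i, E_i) &= \alpha_0 + \alpha_1 M_i^{*}.
\end{align}
```

In the first form, pathway knowledge enters only through the second-level mean $Z_j^{\top}\pi$; in the second, it enters through the structure of the differential equation itself, with genotype acting on a rate parameter.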
