A Bioinformatic Platform for a Bayesian, Multiphased, Multilevel Analysis in Immunogenomics

The accumulation of electronically accessible data and knowledge is posing theoretical and practical challenges for study design and statistical data analysis. A central challenge is the fusion of data and knowledge, which consists of the use of the results of earlier high-throughput measurements of genetic variations, microRNA, and gene expression levels, together with the use of biological knowledge bases. We investigate this fusion in the phases of study design, data analysis, and interpretation; specifically, we present methodologies and bioinformatic tools in the Bayesian framework to deepen, lengthen, and broaden it. First, we give an overview of Bayesian decision support for the design of partial genetic association studies (GASs), incorporating domain literature, knowledge bases, and the results of analyses of earlier studies. Second, we present a Bayesian multilevel analysis (BMLA) for GASs, which performs an integrated analysis at the univariate and multivariate levels and at the level of interactions. Third, we present a Bayesian logic to support interpretation, which integrates the results of data analysis with factual domain knowledge. Finally, we discuss the advantages of the Bayesian framework in coping with small sample sizes, the fusion of data and knowledge, the challenges of multiple testing, meta-analysis, and positive-results bias (i.e., the communication of scientific uncertainty). The genomics of asthma serves as the application domain.
