The Possibilities of Filtering Pairs of SNPs in GWAS Studies - Exploratory Study on Public Protein-interaction and Pathway Data

Genome-wide association studies (GWAS) have become a standard way of discovering novel causative alleles by looking for statistically significant associations in patient genotyping data. The present challenge for these methods is to discover associations involving multiple interacting loci, a phenomenon that is common in disease and often attributed to epistasis. The main problem is that the number of association tests, and with it the necessary computational power, grows exponentially with every additional interacting locus considered. Several approaches have been proposed to manage this problem, including limiting the analysis to interacting pairs and filtering SNPs according to external biological knowledge. Here we explore the possibilities of using public protein interaction data and pathway maps to retain only those pairs of SNPs that are likely to interact, perhaps because of epistatic mechanisms working at the protein level. After filtering all possible pairs of SNPs by whether they map to proteins that take part in a common protein-protein interaction or share a metabolic or signalling pathway, we calculate the possible reduction in computational requirements under different scenarios. We discuss these exploratory results in the context of the so-called "missing heritability" and the usefulness of this approach for similar scenarios.
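
As a rough illustration of the kind of calculation described above, the following Python sketch counts how many SNP pairs survive an interaction-based filter and compares that number with the exhaustive all-pairs count. It is a minimal sketch under simplifying assumptions and not the pipeline used in the study: the function name `count_candidate_pairs`, the one-gene-per-SNP mapping, and the toy SNP and gene identifiers are all hypothetical.

```python
from math import comb


def count_candidate_pairs(snp_to_gene, interactions):
    """Return (all_pairs, retained_pairs) for an interaction-based filter.

    snp_to_gene  -- dict mapping a SNP id to one gene/protein id
                    (assumption: each SNP maps to at most one gene)
    interactions -- iterable of (gene_a, gene_b) pairs taken from a
                    protein-interaction or pathway resource
    """
    # Group SNPs by the gene they map to.
    snps_per_gene = {}
    for snp, gene in snp_to_gene.items():
        snps_per_gene.setdefault(gene, set()).add(snp)

    # Keep a SNP pair only if its two genes are listed as interacting.
    retained = set()
    for gene_a, gene_b in interactions:
        for snp_a in snps_per_gene.get(gene_a, ()):
            for snp_b in snps_per_gene.get(gene_b, ()):
                if snp_a != snp_b:
                    retained.add(frozenset((snp_a, snp_b)))

    # Exhaustive pairwise testing needs n-choose-2 tests.
    all_pairs = comb(len(snp_to_gene), 2)
    return all_pairs, len(retained)


# Toy data (hypothetical identifiers, for illustration only).
snp_to_gene = {"rs1": "A", "rs2": "A", "rs3": "B", "rs4": "C", "rs5": "D"}
interactions = [("A", "B"), ("B", "C")]

all_pairs, retained_pairs = count_candidate_pairs(snp_to_gene, interactions)
reduction = 1 - retained_pairs / all_pairs
print(f"{retained_pairs} of {all_pairs} pairs retained "
      f"({reduction:.0%} fewer tests)")
```

Because the exhaustive count grows quadratically with the number of SNPs while the retained count depends only on the interaction data, the relative saving on genome-wide data would be expected to differ considerably from this toy case.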
