SNPRanker: a tool for identification and scoring of SNPs associated to target genes

The identification of genes and SNPs involved in human diseases remains a challenge. Many public resources, both databases and applications, collect biological data and perform annotations, increasing overall biological knowledge. The need for SNP prioritisation is emerging with the development of new high-throughput genotyping technologies, which allow customised disease-oriented chips to be developed. Therefore, given a list of genes related to a specific biological process or disease as input, a crucial issue is finding the most relevant SNPs to analyse. The selection of these SNPs may rely on relevant a priori knowledge of the biomolecular features characterising the annotated SNPs and genes of the provided list. The bioinformatics approach described here retrieves a ranked list of significant SNPs from a set of input genes, such as candidate genes associated with a specific disease. The system enriches the gene set by including other genes that are associated with the original ones through ontological similarity evaluation. The proposed method relies on the integration of data from public resources in a vertical perspective (from genomics to systems biology data), the evaluation of features from biomolecular knowledge, the computation of partial scores for SNPs and, finally, their ranking based on their global score. Our approach has been implemented in a web-based tool called SNPRanker, which is accessible at the URL http://www.itb.cnr.it/snpranker . An interesting application of the presented system is the prioritisation of SNPs related to genes involved in specific pathologies, in order to produce custom arrays.
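The pipeline outlined above (gene set enrichment by similarity, partial scores per SNP, ranking by global score) can be illustrated with a minimal Python sketch. It is not SNPRanker's implementation: the abstract does not say how the similarity is thresholded or how partial scores are combined, so the similarity cutoff, the weighted-sum combination, the feature names, and all identifiers below are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Snp:
    """A SNP with its gene and per-feature partial scores (hypothetical layout)."""
    snp_id: str
    gene: str
    partial_scores: Dict[str, float]


def expand_gene_set(
    seed_genes: Iterable[str],
    similarity: Dict[Tuple[str, str], float],
    threshold: float,
) -> Set[str]:
    """Add genes whose similarity to any seed gene reaches the threshold.

    `similarity` maps unordered gene pairs to a precomputed ontological
    similarity value; how that value is computed is outside this sketch.
    """
    seeds = set(seed_genes)
    expanded = set(seeds)
    for (gene_a, gene_b), score in similarity.items():
        if score < threshold:
            continue
        if gene_a in seeds:
            expanded.add(gene_b)
        if gene_b in seeds:
            expanded.add(gene_a)
    return expanded


def global_score(snp: Snp, weights: Dict[str, float]) -> float:
    """Combine partial scores as a weighted sum (an assumed combination rule).

    Features missing from a SNP contribute zero.
    """
    return sum(w * snp.partial_scores.get(name, 0.0) for name, w in weights.items())


def rank_snps(snps: Iterable[Snp], genes: Set[str], weights: Dict[str, float]) -> List[Snp]:
    """Keep SNPs located in the given genes and sort them by descending global score."""
    selected = [s for s in snps if s.gene in genes]
    return sorted(selected, key=lambda s: global_score(s, weights), reverse=True)


if __name__ == "__main__":
    # All values below are made-up placeholders, not real annotations.
    genes = expand_gene_set(
        ["GENE_A"],
        {("GENE_A", "GENE_B"): 0.9, ("GENE_A", "GENE_C"): 0.4},
        threshold=0.8,
    )
    weights = {"functional_class": 0.5, "pathway": 0.3, "interaction": 0.2}
    snps = [
        Snp("snp_1", "GENE_A", {"functional_class": 1.0}),
        Snp("snp_2", "GENE_B", {"functional_class": 0.5, "pathway": 1.0, "interaction": 1.0}),
        Snp("snp_3", "GENE_C", {"functional_class": 1.0, "pathway": 1.0}),
    ]
    for snp in rank_snps(snps, genes, weights):
        print(snp.snp_id, snp.gene, round(global_score(snp, weights), 3))
```

A weighted sum keeps each feature's contribution to the final rank easy to inspect; any other monotone combination of the partial scores would fit the same structure.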
