PolyMAPr: Programs for polymorphism database mining, annotation, and functional analysis

Pharmacogenomic and disease‐association studies rely on identifying a comprehensive set of polymorphisms within candidate genes. Public SNP databases are a rich source of polymorphism data, but mining them effectively requires overcoming at least four challenges: ensuring accurate annotations for genes and polymorphisms, eliminating both inter‐ and intra‐database redundancy, integrating data from multiple public sources with data generated locally, and prioritizing the variants for further study. PolyMAPr (Polymorphism Mining and Annotation Programs) was developed to overcome these challenges and to improve the efficiency of database mining and polymorphism annotation. PolyMAPr takes as input a file containing a list of genes to be processed and files containing the annotated sequence of each gene. Polymorphic sequences obtained from public databases (dbSNP, CGAP, and JSNP) or through local SNP discovery efforts, as well as oligonucleotide sequences (e.g., PCR primers), are mapped to the annotated gene sequences and named according to suggested nomenclature guidelines. PolyMAPr then predicts the functional effects of nonsynonymous coding‐region SNPs (cSNPs) and of any variants that might alter exon splicing enhancer (ESE) sites, putative transcription factor binding sites, or intron–exon splice sites. The output files are accessible through a browser interface. The results are also provided in Extensible Markup Language (XML) format to facilitate uploading them into a local relational database. PolyMAPr increases the efficiency of mining public databases for genetic variants within candidate genes and provides a mechanism by which data from multiple sources (both public and private) can be uniformly integrated, thereby significantly reducing the effort required to obtain a comprehensive set of polymorphisms for pharmacogenomic and disease‐association studies. PolyMAPr can be obtained from http://pharmacogenomics.wustl.edu. Hum Mutat 25:110–117, 2005.
© 2005 Wiley‐Liss, Inc.
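The mapping, naming, and cSNP-classification steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not PolyMAPr's implementation, whose alignment method and output schema are not described here. The sketch assumes exact, forward-strand matching of a variant's 5' and 3' flanking sequences; biallelic single-nucleotide variants; uppercase ACGT sequence; a coding sequence (CDS) given as ordered, half-open (start, end) intervals whose total length is a multiple of three; and HGVS-style names (c. numbering from the A of the start codon, g. numbering from the first base of the supplied gene sequence). The function names (`map_variant`, `cds_offset`, `annotate`, `to_xml`), the record fields, and the XML element names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def map_variant(gene_seq, flank5, flank3):
    """Return the 0-based index of the variant base when the flanks match
    exactly once on the forward strand; None if absent or ambiguous."""
    hits = []
    start = gene_seq.find(flank5)
    while start != -1:
        pos = start + len(flank5)
        if pos < len(gene_seq) and gene_seq.startswith(flank3, pos + 1):
            hits.append(pos)
        start = gene_seq.find(flank5, start + 1)
    return hits[0] if len(hits) == 1 else None


def cds_offset(pos, cds_exons):
    """Convert a gene-sequence index to a 0-based CDS offset, or None if the
    index lies outside the ordered, half-open (start, end) coding intervals."""
    offset = 0
    for start, end in cds_exons:
        if start <= pos < end:
            return offset + (pos - start)
        offset += end - start
    return None


def annotate(snp_id, gene_seq, cds_exons, flank5, alleles, flank3):
    """Map one biallelic SNV and return a flat annotation record."""
    pos = map_variant(gene_seq, flank5, flank3)
    if pos is None:
        return {"id": snp_id, "status": "unmapped"}
    ref = gene_seq[pos]
    alts = [a for a in alleles.split("/") if a != ref]
    if len(alts) != 1:
        # Reference base not among the alleles, or more than two alleles.
        return {"id": snp_id, "status": "allele_mismatch"}
    alt = alts[0]
    off = cds_offset(pos, cds_exons)
    if off is None:
        # Outside the CDS: number from the first base of gene_seq (g.1).
        return {"id": snp_id, "status": "mapped",
                "name": f"g.{pos + 1}{ref}>{alt}", "effect": "noncoding"}
    cds = "".join(gene_seq[s:e] for s, e in cds_exons)
    frame = off % 3  # position of the variant within its codon
    ref_codon = cds[off - frame:off - frame + 3]
    alt_codon = ref_codon[:frame] + alt + ref_codon[frame + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return {"id": snp_id, "status": "mapped",
            "name": f"c.{off + 1}{ref}>{alt}",
            "protein": f"p.{ref_aa}{off // 3 + 1}{alt_aa}",
            "effect": "synonymous" if ref_aa == alt_aa else "nonsynonymous"}


def to_xml(records):
    """Serialize records as XML; element and attribute names are hypothetical."""
    root = ET.Element("polymorphisms")
    for rec in records:
        ET.SubElement(root, "variant", {k: str(v) for k, v in rec.items()})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Toy gene: 4 bp upstream, exon 1, intron, exon 2, 4 bp downstream.
    gene = "GGGGATGAAAGTAAGTTTTGGGTAACCCC"
    cds = [(4, 10), (16, 25)]  # CDS = ATG AAA TTT GGG TAA
    records = [
        annotate("snp1", gene, cds, "GATGAA", "A/C", "GTAAGT"),
        annotate("snp2", gene, cds, "AAAGT", "A/G", "AGTTT"),
        annotate("snp3", gene, cds, "TTTGG", "G/A", "TAACC"),
    ]
    print(to_xml(records))
```

For the toy gene, the three variants are named c.6A>C (p.K2N, nonsynonymous), g.13A>G (intronic, noncoding), and c.12G>A (p.G4G, synonymous). Requiring a unique flank match is one simple way to avoid silently placing a variant in the wrong copy of a repeated sequence; a production pipeline would also need to search the reverse strand and tolerate sequence mismatches.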
