SNP mining porcine ESTs with MAVIANT, a novel tool for SNP evaluation and annotation

MOTIVATION: Single nucleotide polymorphism (SNP) analysis is an important means of studying genetic variation. A fast and cost-efficient approach to identifying large numbers of novel candidates is SNP mining of large-scale sequencing projects. The increasing availability of sequence trace data in public repositories makes it feasible to evaluate SNP predictions at the DNA chromatogram level. MAVIANT, a platform-independent Multipurpose Alignment VIewing and Annotation Tool, provides DNA chromatogram and alignment views and facilitates the evaluation of predictions. In addition, it supports direct manual annotation, which is immediately accessible and can easily be shared with external collaborators.

RESULTS: Large-scale SNP mining based on porcine EST sequences yielded more than 7900 candidate SNPs in coding regions (cSNPs), which were annotated relative to the human genome. Non-synonymous SNPs were analyzed for their potential effect on protein structure and function using the PolyPhen and SIFT prediction programs. Predicted SNPs and annotations are stored in a web-based database. Using MAVIANT, SNPs can be verified visually against the DNA sequencing traces. A subset of candidate SNPs was selected for experimental validation by resequencing and genotyping. This study provides a web-based DNA chromatogram and contig browser that facilitates the evaluation and selection of candidate SNPs, which can be applied as genetic markers in genome-wide genetic studies.

AVAILABILITY: The stand-alone version of the MAVIANT program for local use is freely available under GPL license terms at http://snp.agrsci.dk/maviant.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
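The distinction between synonymous and non-synonymous cSNPs mentioned in the results can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function name `classify_csnp`, the 0-based coordinate convention, and the example sequence are assumptions made for illustration, and only the standard genetic code is considered.

```python
from itertools import product

# Standard genetic code, built from the compact NCBI-style layout
# (bases ordered T, C, A, G at each codon position).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_csnp(cds, pos, alt):
    """Classify a substitution in an in-frame coding sequence.

    cds: coding sequence starting at codon position 1 (hypothetical input)
    pos: 0-based position of the SNP within cds
    alt: alternative base observed at that position
    Returns (kind, reference amino acid, alternative amino acid).
    """
    offset = pos % 3                 # position of the SNP inside its codon
    start = pos - offset             # first base of the affected codon
    ref_codon = cds[start:start + 3].upper()
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa


if __name__ == "__main__":
    cds = "ATGGCTGAA"                    # Met-Ala-Glu, an invented example
    print(classify_csnp(cds, 5, "C"))    # GCT -> GCC, third codon position
    print(classify_csnp(cds, 6, "A"))    # GAA -> AAA, first codon position
```

Only substitutions classified as non-synonymous would be candidates for effect prediction with tools such as PolyPhen or SIFT; the sketch does not attempt to reproduce those predictions.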
