PineSAP—sequence alignment and SNP identification pipeline

Summary: The Pine Alignment and SNP Identification Pipeline (PineSAP) provides a high-throughput solution for single nucleotide polymorphism (SNP) prediction from multiple sequence alignments of re-sequencing data. The pipeline combines custom scripts, existing utilities and machine learning to increase the speed and accuracy of SNP calls. It produces significantly improved multiple sequence alignments and SNP identifications compared with existing solutions. The use of machine learning for SNP identification extends the pipeline's application to any eukaryotic species for which full genome sequence information is unavailable.

Availability: All code used for this pipeline is freely available at the Dendrome project website (http://dendrome.ucdavis.edu/adept2/resequencing.html).

Contact: jlwegrzyn@ucdavis.edu
