saskPrimer — An automated pipeline for design of intron-spanning PCR primers in non-model organisms

Robust, automated Polymerase Chain Reaction (PCR) primer design is an important prerequisite for many strategies of large-scale discovery of nucleotide variation, specifically Single Nucleotide Polymorphisms (SNPs). Designing PCR primers that amplify multiple members of gene families in complex genomes is often complicated by the additional goal of amplifying non-coding regions of the target organism's genome. The task is harder still in organisms that lack a fully sequenced genome, where further time-intensive procedures are required. As a result, this phase of SNP discovery is often a bottleneck for the overall process. To increase the efficiency of designing conserved, intron-spanning, gene-family-specific primers, an automated pipeline was developed that streamlines the process by reducing its dependency on human participation. The automated design process was shown to significantly reduce primer design time and human participation compared with the semi-automated approach employed previously. This increase in performance comes with a modest reduction in overall PCR efficiency but does not significantly reduce the total number of amplified PCR products. The pipeline was tested extensively using Brassica napus as the target organism and Arabidopsis thaliana as the reference organism, with an overall amplification success of 80.5% of the reference inputs.
