PDA: a pipeline to explore and estimate polymorphism in large DNA databases

Polymorphism studies are one of the main research areas of this genomic era. To date, however, no available web server or software package has been designed to automate the process of exploring and estimating nucleotide polymorphism in large DNA databases. Here, we introduce a novel software package, PDA (Pipeline Diversity Analysis), that can automatically (i) search for polymorphic sequences in large databases, and (ii) estimate their genetic diversity. PDA is a collection of modules, mainly written in Perl, that work sequentially as follows: unaligned sequences retrieved from a DNA database are automatically classified by organism and gene, and aligned using the ClustalW algorithm. Sequence sets are then regrouped according to their similarity scores. The main diversity parameters, including polymorphism, synonymous and non-synonymous substitutions, linkage disequilibrium and codon bias, are estimated both for the full length of the sequences and for specific functional regions. Program output includes a database with all sequences and estimates, and HTML pages with summary statistics, the alignments performed and a histogram maker tool. PDA is an essential tool to explore polymorphism in large DNA databases for sequences from different genes, populations or species. It has already been successfully applied to create a secondary database. PDA is available on the web at http://pda.uab.es/.
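
To make the diversity-estimation step concrete, the following is a minimal Python sketch of two standard polymorphism statistics computed on an already aligned sequence set: the number of segregating sites and the nucleotide diversity (average pairwise differences per site). It is an illustration only and not PDA's own code, which is mainly written in Perl. The function names are hypothetical, and the choice to drop every alignment column that contains a gap or ambiguous base is an assumption made for this example.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def clean_columns(alignment):
    """Return alignment columns that contain only A, C, G or T.

    Assumption for this sketch: columns with gaps or ambiguous
    symbols are excluded entirely (complete deletion).
    """
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("aligned sequences must have equal length")
    columns = zip(*(seq.upper() for seq in alignment))
    return [col for col in columns if set(col) <= VALID_BASES]


def segregating_sites(alignment):
    """Count clean columns that show more than one nucleotide."""
    return sum(1 for col in clean_columns(alignment) if len(set(col)) > 1)


def nucleotide_diversity(alignment):
    """Average number of pairwise differences per clean site."""
    n = len(alignment)
    columns = clean_columns(alignment)
    if n < 2 or not columns:
        raise ValueError("need at least two sequences and one clean site")
    n_pairs = n * (n - 1) / 2
    # Sum, over all sites, the number of sequence pairs that differ there.
    total_differences = sum(
        1 for col in columns for a, b in combinations(col, 2) if a != b
    )
    return total_differences / n_pairs / len(columns)


# Toy alignment (made-up data, four sequences of length eight).
example = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGT-CGT"]
S = segregating_sites(example)
pi = nucleotide_diversity(example)
```

In this toy alignment the fifth column contains a gap and is skipped, so both statistics are computed over the seven remaining sites.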
