Information theory-based algorithm for in silico prediction of PCR products with whole genomic sequences as templates

Background: A new algorithm for assessing similarity between a primer and a template has been developed, based on the hypothesis that annealing of primer to template is an information transfer process.

Results: The primer sequence is converted to a vector of full potential hydrogen bond numbers (3 for G or C, 2 for A or T), while the template sequence is converted to a vector of the actual hydrogen bond numbers formed after primer annealing. The former is treated as source information and the latter as destination information. An information coefficient is calculated as a measure of the fidelity of this information transfer process, and thus of the similarity between the primer and a potential annealing site on the template.

Conclusion: A computer program based on the algorithm successfully predicted PCR products from whole genomic sequences, demonstrating the algorithm's potential in areas such as in silico PCR and gene finding.
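The two vectors described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formula for the information coefficient, so the sketch does not attempt it; `bond_ratio` is only a hypothetical stand-in score (actual bonds divided by potential bonds). All function names are illustrative. Two further assumptions are made here: a mismatched position forms zero hydrogen bonds, and the template site is supplied already aligned so that position i of the site faces position i of the primer.

```python
# Full potential hydrogen bonds per primer base, as stated in the abstract:
# 3 for G or C, 2 for A or T.
POTENTIAL_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

# Watson-Crick partners, used to decide whether a position actually pairs.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def primer_vector(primer):
    """Source vector: potential hydrogen bonds at each primer position."""
    return [POTENTIAL_BONDS[base] for base in primer.upper()]


def template_vector(primer, site):
    """Destination vector: hydrogen bonds actually formed at each position.

    Assumption (not from the abstract): `site` is written so that site[i]
    faces primer[i], and a mismatch contributes 0 bonds.
    """
    primer, site = primer.upper(), site.upper()
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return [
        POTENTIAL_BONDS[p] if COMPLEMENT[p] == s else 0
        for p, s in zip(primer, site)
    ]


def bond_ratio(primer, site):
    """Hypothetical stand-in score, NOT the paper's information coefficient:
    the fraction of potential hydrogen bonds that are actually formed."""
    return sum(template_vector(primer, site)) / sum(primer_vector(primer))


if __name__ == "__main__":
    print(primer_vector("ATGC"))            # potential bonds per position
    print(template_vector("ATGC", "TACG"))  # fully complementary site
    print(template_vector("ATGC", "TACC"))  # mismatch at the last position
    print(bond_ratio("ATGC", "TACC"))
```

A genome-wide scan would slide a primer-length window along the template and score each window. How the actual information coefficient is computed from the two vectors is left to the full paper.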
