High performance computing of oligopeptides complete backtranslation applied to DNA microarray probe design

Complete backtranslation is the generation of all possible nucleic acid sequences that encode a given protein sequence. It is a time-consuming task that can produce very large quantities of data. Complete backtranslation was recently used to initiate probe design for functional DNA microarrays from conserved peptidic regions, in order to assess the full microbial gene diversity present in complex environments. In this article, we present an efficient parallelization method for computing the complete backtranslation of short peptides in order to select probes for functional microarrays. We implemented software that uses meta-programming and a model-driven engineering approach to automatically generate source code that performs complete backtranslation on different architectures: PCs, symmetric multiprocessor (SMP) servers, computing clusters, or a computing grid. The software filters the generated oligonucleotides with the usual selection criteria for microarray probe design. It uses load balancing and can be easily integrated into probe design software for functional microarrays. We present its performance on both simulated and real biological datasets. The results show a significant computing speedup on the different platforms and a saving of about 40% in disk space when oligonucleotides are filtered. Copyright © 2014 John Wiley & Sons, Ltd.
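
To make the idea of complete backtranslation concrete, the following is a minimal single-process sketch in Python. It is not the generated code described in the article and does not show the parallelization or load balancing. It assumes the standard genetic code, and the function names (`count_backtranslations`, `backtranslate`, `gc_fraction`, `filter_by_gc`) are hypothetical. The abstract does not say which selection criteria are applied, so the GC-content window used here is only an illustrative stand-in for a probe filter.

```python
from itertools import product

# Standard genetic code: amino acid (one-letter code) -> DNA codons.
CODONS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "N": ["AAT", "AAC"],
    "D": ["GAT", "GAC"],
    "C": ["TGT", "TGC"],
    "Q": ["CAA", "CAG"],
    "E": ["GAA", "GAG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "H": ["CAT", "CAC"],
    "I": ["ATT", "ATC", "ATA"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "K": ["AAA", "AAG"],
    "M": ["ATG"],
    "F": ["TTT", "TTC"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
    "W": ["TGG"],
    "Y": ["TAT", "TAC"],
    "V": ["GTT", "GTC", "GTA", "GTG"],
}


def count_backtranslations(peptide):
    """Number of DNA sequences encoding the peptide (product of codon counts)."""
    total = 1
    for residue in peptide:
        total *= len(CODONS[residue])
    return total


def backtranslate(peptide):
    """Lazily yield every DNA sequence that encodes the peptide."""
    for codons in product(*(CODONS[residue] for residue in peptide)):
        yield "".join(codons)


def gc_fraction(sequence):
    """Fraction of G and C bases in a DNA sequence."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def filter_by_gc(peptide, gc_min=0.4, gc_max=0.6):
    """Illustrative filter (an assumption): keep sequences inside a GC window."""
    return (
        seq for seq in backtranslate(peptide)
        if gc_min <= gc_fraction(seq) <= gc_max
    )


if __name__ == "__main__":
    peptide = "LSR"  # three residues with six codons each
    print(count_backtranslations(peptide), "candidate sequences")
    print(sum(1 for _ in filter_by_gc(peptide)), "pass the GC window")
```

The number of candidates is the product of the codon counts of each residue, so it grows exponentially with peptide length. This is why the enumeration is written as a generator, and why filtering during generation, rather than after writing everything to disk, can reduce the amount of data stored.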
