Modeling of shotgun sequencing of DNA plasmids using experimental and theoretical approaches

Background: Without bioinformatics approaches, it is difficult to correctly predict DNA sequencing outcomes when processing and analyzing sequences obtained from next-generation sequencing (NGS). However, algorithms based on NGS perform inefficiently because of the generation of long DNA fragments, the difficulty of assembling them, and the complexity of the genomes used. The Sanger DNA sequencing method, on the other hand, is still considered the most reliable, which makes it a suitable basis for virtual modeling that builds all possible consensus sequences from smaller DNA fragments.

Results: In silico and in vitro experiments were conducted (1) to implement and test our novel sequencing algorithm using standard cloning vectors of different lengths and (2) to experimentally validate virtual shotgun sequencing using PCR with 1 to 9 cycles per reaction.

Conclusions: We applied a novel algorithm based on Sanger methodology to correctly predict the performance of DNA sequencing techniques, with applications in de novo DNA sequencing and, further, in synthetic biology. We demonstrate the statistical significance of our results.
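To make the idea of building a consensus sequence from smaller DNA fragments concrete, the following is a minimal sketch of virtual shotgun sequencing followed by a generic greedy overlap merge. It is not the algorithm described in this work, whose details are not given here. The function names (`shotgun_fragments`, `overlap`, `greedy_assemble`), the 30-base template, the fragment length, and the step size are all hypothetical choices made for illustration. The sketch also assumes error-free reads and a linear template, and ignores plasmid circularity.

```python
def shotgun_fragments(seq, frag_len, step):
    """Cut overlapping fragments of length frag_len every `step` bases (linear template)."""
    frags = [seq[i:i + frag_len] for i in range(0, len(seq) - frag_len + 1, step)]
    # If the windows do not reach the end exactly, add one fragment covering the tail.
    if (len(seq) - frag_len) % step != 0:
        frags.append(seq[-frag_len:])
    return frags


def overlap(a, b, min_len):
    """Longest suffix of `a` equal to a prefix of `b`, if at least min_len long, else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(frags, min_len=6):
    """Repeatedly merge the pair of fragments with the longest suffix-prefix overlap."""
    frags = list(dict.fromkeys(frags))  # drop exact duplicates, keep order
    while len(frags) > 1:
        # Discard fragments fully contained in another fragment or contig.
        frags = [f for f in frags if not any(f != g and f in g for g in frags)]
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags


# Hypothetical 30-base template in which every 6-mer is unique.
template = "ATGGCTAGCTTACGGATCCGTAAGCTTGCA"
reads = shotgun_fragments(template, frag_len=12, step=6)
contigs = greedy_assemble(list(reversed(reads)), min_len=6)
print(contigs)
```

Greedy merging is only reliable here because the toy template contains no repeated 6-mers, so every overlap of at least `min_len` bases is genuine. Real vectors contain repeats and real reads contain errors, so a practical assembler would need alignment-based overlap detection and a consensus step.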
