Systematic Characteristic Exploration of the Chimeras Generated in Multiple Displacement Amplification through Next Generation Sequencing Data Reanalysis

Background: Chimeric sequences (chimeras) produced by phi29 DNA polymerase impair the performance of multiple displacement amplification (MDA) and complicate sequence data processing. Although several articles have reported the existence of chimeric sequences, only one study has focused on the structure and generation mechanism of chimeras, and it was based on only hundreds of chimeras found in sequence data from the E. coli genome.

Method: We mined a set of Next Generation Sequencing (NGS) reads that had been used for whole-genome haplotype assembly in an earlier study. We established a bioinformatics pipeline based on a subsection alignment strategy to discover all the chimeras in the data and to visualize their structure. We then defined two statistical indexes, the chimeric distance and the overlap length, whose regular abundance distributions helped illustrate the structural characteristics of the chimeras. Finally, we analyzed the relationship between chimera type and average insert size to illustrate a way to decrease the proportion of wasted data during DNA library construction.

Results/Conclusion: A total of 131.4 Gb of paired-end (PE) sequence data was reanalyzed for chimeras. Among 650,430,811 read pairs, 40,259,438 (6.19%) showed chimerism. The chimeric sequences consist of two or more parts that are located non-contiguously but close to one another on the chromosome. The chimeric distance between the chromosomal locations of adjacent parts followed an approximately bimodal distribution ranging from 0 to over 5,000 nt, with a peak at about 250 to 300 nt. The overlap length of adjacent parts followed an approximately Poisson distribution with a peak at 6 nt. Moreover, a linear correlation analysis indicated that unmapped chimeras, which were classified as wasted data, could be reduced by appropriately increasing the insert size.

Significance: This study profiled phi29 MDA chimeras using tens of millions of chimeric sequences and helps clarify the amplification mechanism of phi29 DNA polymerase. Our work also illustrates the importance of NGS data reanalysis, not only for improving data utilization efficiency but also for uncovering additional genomic information.
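The two statistical indexes can be made concrete with a minimal Python sketch. The abstract does not give formal definitions, so everything here is an assumption made for illustration: the `Segment` record and its coordinate convention are hypothetical, the overlap length is taken to be the number of read bases covered by both adjacent parts, and the chimeric distance is taken to be the gap on the chromosome between the end of one part and the start of the next, ignoring strand. The last function only re-derives the reported chimeric fraction from the two read-pair counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned part of a chimeric read (hypothetical record, half-open coordinates)."""
    read_start: int  # 0-based start of the part on the read
    read_end: int    # exclusive end of the part on the read
    ref_start: int   # 0-based start of the part on the chromosome
    ref_end: int     # exclusive end of the part on the chromosome


def overlap_length(first: Segment, second: Segment) -> int:
    """Read bases shared by two adjacent parts (assumed definition)."""
    return max(0, first.read_end - second.read_start)


def chimeric_distance(first: Segment, second: Segment) -> int:
    """Chromosomal gap between two adjacent parts (assumed definition, strand ignored)."""
    return abs(second.ref_start - first.ref_end)


def chimeric_fraction(chimeric_pairs: int, total_pairs: int) -> float:
    """Share of read pairs that show chimerism."""
    return chimeric_pairs / total_pairs


# Illustrative coordinates, not taken from the study's data.
first = Segment(read_start=0, read_end=56, ref_start=10_000, ref_end=10_056)
second = Segment(read_start=50, read_end=100, ref_start=10_306, ref_end=10_356)

print(overlap_length(first, second))     # 6
print(chimeric_distance(first, second))  # 250
print(round(100 * chimeric_fraction(40_259_438, 650_430_811), 2))  # 6.19
```

A real pipeline would take these coordinates from the aligner's split-read output and would need to handle strand and inverted parts, which this sketch deliberately leaves out.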
