Summarizing Specific Profiles in Illumina Sequencing from Whole-Genome Amplified DNA

Advances in both high-throughput sequencing and whole-genome amplification (WGA) protocols have allowed genomes to be sequenced from femtograms of DNA, for example from individual cells or from precious clinical and archived samples. Using the highly curated Caenorhabditis elegans genome as a reference, we sequenced amplified and unamplified libraries and identified errors and biases associated with Illumina library construction, library insert size, different WGA methods and genome features such as GC content and simple repeat content. Detailed analysis of the reads from amplified libraries revealed characteristics suggesting that the majority of amplified fragment ends are identical but inverted versions of each other. Read coverage in amplified libraries is correlated with both tandem and inverted repeat content, whereas GC content influences sequencing only in long-insert libraries. Nevertheless, single nucleotide polymorphism (SNP) calls and assembly metrics from reads in amplified libraries are comparable to those from unamplified libraries. To realize the full potential of WGA for addressing questions of biological interest, this article highlights the importance of recognizing the additional sources of error in amplified sequence reads and discusses their potential implications for downstream analyses.
