Is reliance on an inaccurate genome sequence sabotaging your experiments?

Advances in genomics have made whole-genome studies increasingly feasible across the life sciences. However, new technologies and algorithmic advances do not guarantee error-free genome sequences or annotations. Bias, errors, and artifacts can enter at any stage of the process, from library preparation to annotation. When planning an experiment whose design is based on a genome sequence, a few basic checks of that sequence can better inform the design and, ideally, help avoid a failed experiment or an inconclusive result.
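One example of such a basic check is a quick summary of the assembly FASTA file. It shows how fragmented the sequence is and how many unresolved gaps it contains. The sketch below is a minimal Python example, not a procedure prescribed by this text. The function names are hypothetical. It assumes that gaps are encoded as runs of `N`, which may not match how every database or assembler represents them.

```python
import re
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {record_id: uppercase sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks).upper()
            # Record ID is the first whitespace-delimited token of the header
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks).upper()
    return records


def n50(lengths: List[int]) -> int:
    """Largest length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_summary(text: str, min_gap: int = 1) -> Dict[str, float]:
    """Summarize fragmentation and gaps; assumes gaps are runs of >= min_gap 'N'."""
    seqs = parse_fasta(text)
    lengths = [len(s) for s in seqs.values()]
    gap_pattern = re.compile("N{%d,}" % min_gap)
    gap_runs = sum(len(gap_pattern.findall(s)) for s in seqs.values())
    n_bases = sum(s.count("N") for s in seqs.values())  # all ambiguous N bases
    total = sum(lengths)
    return {
        "sequences": len(seqs),
        "total_length": total,
        "n50": n50(lengths),
        "gap_runs": gap_runs,
        "fraction_n": n_bases / total if total else 0.0,
    }


example = """>chr1 hypothetical
ACGTACGTNNNNNACGT
ACGT
>contig_2
ACGTAC
>contig_3
NNACG
"""
print(assembly_summary(example))
```

A sequence split into many short contigs, a low N50, or a large fraction of `N` bases is a signal to inspect the region of interest more closely before designing primers, guides, or probes against it.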
