A novice’s guide to analyzing NGS-derived organelle and metagenome data

Hae Jung Song, JunMo Lee, Louis Graf, Mina Rho, Huan Qiu, Debashish Bhattacharya and Hwan Su Yoon*

Department of Biological Sciences, Sungkyunkwan University, Suwon 16419, Korea

Division of Computer Science & Engineering, Hanyang University, Seoul 04763, Korea

Department of Ecology, Evolution and Natural Resources, Rutgers University, New Brunswick, NJ 08901, USA

Copyright © 2016 The Korean Society of Phycology. http://e-algae.org pISSN: 1226-2617 eISSN: 2093-0860

[1]  P. Martone,et al.  Evolution of Red Algal Plastid Genomes: Ancient Architectures, Introns, Horizontal Gene Transfer, and Taxonomic Utility of Plastid Markers , 2013, PloS one.

[2]  George M. Weinstock,et al.  Genomic approaches to studying the human microbiota , 2012, Nature.

[3]  Donald Sharon,et al.  A single-molecule long-read survey of the human transcriptome , 2013, Nature Biotechnology.

[4]  I-Min A. Chen,et al.  IMG/M: the integrated metagenome data management and comparative analysis system , 2011, Nucleic Acids Res..

[5]  M. Ronaghi,et al.  Real-time DNA sequencing using detection of pyrophosphate release. , 1996, Analytical biochemistry.

[6]  S. Mocali,et al.  Exploring research frontiers in microbiology: the challenge of metagenomics in soil microbiology. , 2010, Research in microbiology.

[7]  Vincent J. Magrini,et al.  Extending assembly of short DNA sequences to handle error , 2007, Bioinform..

[8]  Donald Sharon,et al.  Strain Kaplan of Pseudorabies Virus Genome Sequenced by PacBio Single-Molecule Real-Time Sequencing Technology , 2014, Genome Announcements.

[9]  S. Pääbo,et al.  Mitochondrial genome variation and the origin of modern humans , 2000, Nature.

[10]  J. Gilbert,et al.  Metagenomics - a guide from sampling to data analysis , 2012, Microbial Informatics and Experimentation.

[11]  S. Johnston,et al.  ORF-FINDER: a vector for high-throughput gene identification. , 2002, Gene.

[12]  Michael Maibaum,et al.  Survey of current protein family databases and their application in comparative, structural and functional genomics. , 2005, Journal of chromatography. B, Analytical technologies in the biomedical and life sciences.

[13]  María Martín,et al.  UniProt: A hub for protein information , 2015 .

[14]  The Uniprot Consortium,et al.  UniProt: a hub for protein information , 2014, Nucleic Acids Res..

[15]  J. Rothberg,et al.  The development and impact of 454 sequencing , 2008, Nature Biotechnology.

[16]  C. Jubin,et al.  Plastid genomes of two brown algae, Ectocarpus siliculosus and Fucus vesiculosus: further insights on the evolution of red-algal derived plastids , 2009, BMC Evolutionary Biology.

[17]  Yasubumi Sakakibara,et al.  MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads , 2012, Nucleic acids research.

[18]  S. Salzberg,et al.  Alignment of whole genomes. , 1999, Nucleic acids research.

[19]  Simon Swindell,et al.  Sequence Data Analysis Guidebook , 1996 .

[20]  F. Scholz,et al.  Comparative Analysis of Different DNA Extraction Protocols: A Fast, Universal Maxi-Preparation of High Quality Plant DNA for Genetic Evaluation and Phylogenetic Studies , 1998, Plant Molecular Biology Reporter.

[21]  H. Yoon,et al.  Unique repeat and plasmid sequences in the mitochondrial genome of Gracilaria chilensis (Gracilariales, Rhodophyta) , 2015 .

[22]  Chao Xie,et al.  A poor man’s BLASTX—high-throughput metagenomic protein database search using PAUDA , 2013, Bioinform..

[23]  Dean Laslett,et al.  ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. , 2004, Nucleic acids research.

[24]  Heng Li.  Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM , 2013, arXiv:1303.3997.

[25]  Siu-Ming Yiu,et al.  Meta-IDBA: a de Novo assembler for metagenomic data , 2011, Bioinform..

[26]  S. Y. Kim,et al.  Complete mitochondrial genome of the marine red alga Grateloupia angusta (Halymeniales) , 2014, Mitochondrial DNA.

[27]  Lucian Ilie,et al.  SHRiMP2: Sensitive yet Practical Short Read Mapping , 2011, Bioinform..

[28]  T. Plasterer,et al.  SEQMAN. Contig assembly. , 1997, Methods in molecular biology.

[29]  Cecil M. Lewis,et al.  Ancient human microbiomes. , 2015, Journal of human evolution.

[30]  Thomas Wetter,et al.  Genome Sequence Assembly Using Trace Signals and Additional Sequence Information , 1999, German Conference on Bioinformatics.

[31]  Steven L Salzberg,et al.  Fast gapped-read alignment with Bowtie 2 , 2012, Nature Methods.

[32]  Kenneth H. Wolfe,et al.  GenomeVx: simple web-based creation of editable circular chromosome maps , 2008, Bioinform..

[33]  Mark J. P. Chaisson,et al.  Reconstructing complex regions of genomes using long-read sequencing technology , 2014, Genome research.

[34]  Konrad H. Paszkiewicz,et al.  De novo assembly of short sequence reads , 2010, Briefings Bioinform..

[35]  Alla Lapidus,et al.  A Bioinformatician's Guide to Metagenomics , 2008, Microbiology and Molecular Biology Reviews.

[36]  R Higuchi,et al.  Chelex 100 as a medium for simple extraction of DNA for PCR-based typing from forensic material. , 2013, BioTechniques.

[37]  Xiaojun Guan,et al.  CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences , 2012, BMC Genomics.

[38]  Ralph Bock,et al.  OrganellarGenomeDRAW (OGDRAW): a tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes , 2007, Current Genetics.

[39]  Pelin Yilmaz,et al.  The SILVA ribosomal RNA gene database project: improved data processing and web-based tools , 2012, Nucleic Acids Res..

[40]  F. Raymond,et al.  Ray Meta: scalable de novo metagenome assembly and profiling , 2012 .

[41]  François Laviolette,et al.  Ray: Simultaneous Assembly of Reads from a Mix of High-Throughput Sequencing Technologies , 2010, J. Comput. Biol..

[42]  Luis Pedro Coelho,et al.  Structure and function of the global ocean microbiome , 2015, Science.

[43]  F. Blattner,et al.  Mauve: multiple alignment of conserved genomic sequence with rearrangements. , 2004, Genome research.

[44]  A. von Haeseler,et al.  IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies , 2014, Molecular biology and evolution.

[45]  Kunihiko Sadakane,et al.  MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph , 2014, Bioinform..

[46]  Richard Durbin,et al.  Fast and accurate short read alignment with Burrows–Wheeler transform , 2009 .

[47]  Katherine H. Huang,et al.  Structure, Function and Diversity of the Healthy Human Microbiome , 2012, Nature.

[48]  Peter F. Hallin,et al.  RNAmmer: consistent and rapid annotation of ribosomal RNA genes , 2007, Nucleic acids research.

[49]  Tibor Vellai,et al.  A New Aspect to the Origin and Evolution of Eukaryotes , 1998, Journal of Molecular Evolution.

[50]  Riccardo Velasco,et al.  An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome , 2013, BMC Genomics.

[51]  R. Vaillancourt,et al.  Maternal inheritance of the chloroplast genome in Eucalyptus globulus and interspecific hybrids. , 2001, Genome.

[52]  中尾 光輝,et al.  KEGG (Kyoto Encyclopedia of Genes and Genomes) [in Japanese] (Special issue: The present and future of genomic medicine, basic and clinical) (Databases) , 2000 .

[53]  M. Ludwig,et al.  Transcription Profiling of the Model Cyanobacterium Synechococcus sp. Strain PCC 7002 by Next-Gen (SOLiD™) Sequencing of cDNA , 2011, Front. Microbio..

[54]  P. Bork,et al.  A human gut microbial gene catalogue established by metagenomic sequencing , 2010, Nature.

[55]  J. Handelsman,et al.  Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. , 1998, Chemistry & biology.

[56]  A. Weber,et al.  Adaptation through horizontal gene transfer in the cryptoendolithic red alga Galdieria phlegrea , 2013, Current Biology.

[57]  Mark J. P. Chaisson,et al.  Short read fragment assembly of bacterial genomes. , 2008, Genome research.

[58]  B. Lang,et al.  Mitochondrial introns: a critical view. , 2007, Trends in genetics : TIG.

[59]  Ying Huang,et al.  Identification of ribosomal RNA genes in metagenomic fragments , 2009 .

[60]  S. Eddy,et al.  tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. , 1997, Nucleic acids research.

[61]  Robert C. Edgar,et al.  BIOINFORMATICS APPLICATIONS NOTE , 2001 .

[62]  Caspar Zialor.  DNA sequencing with chain terminating inhibitors , 2014 .

[63]  Martin Goodson,et al.  Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. , 2011, Genome research.

[64]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[65]  Alexandros Stamatakis,et al.  RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models , 2006, Bioinform..

[66]  J. Badge DNA sequencing. , 1998, Methods in molecular biology.

[67]  Robert K. Jansen,et al.  Automatic annotation of organellar genomes with DOGMA , 2004, Bioinform..

[68]  Siu-Ming Yiu,et al.  IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth , 2012, Bioinform..

[69]  Y. Hahn,et al.  Metagenomic Analysis of Kimchi, a Traditional Korean Fermented Food , 2011, Applied and Environmental Microbiology.

[70]  P. Chain,et al.  Next generation sequencing and bioinformatic bottlenecks: the current state of metagenomic data analysis. , 2012, Current opinion in biotechnology.

[71]  Steven J. M. Jones,et al.  ABySS: a parallel assembler for short read sequence data , 2009 .

[72]  P. Stadler,et al.  MITOS: improved de novo metazoan mitochondrial genome annotation. , 2013, Molecular phylogenetics and evolution.

[73]  M. Salimans,et al.  Rapid and simple method for purification of nucleic acids , 1990, Journal of clinical microbiology.

[74]  David Hernández,et al.  De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. , 2008, Genome research.

[75]  P. Green,et al.  Consed: a graphical tool for sequence finishing. , 1998, Genome research.

[76]  J. Humbert,et al.  Metagenomic approach studying the taxonomic and functional diversity of the bacterial community in a mesotrophic lake (Lac du Bourget--France). , 2009, Environmental microbiology.

[77]  S. Y. Kim,et al.  Highly Conserved Mitochondrial Genomes among Multicellular Red Algae of the Florideophyceae , 2015, Genome biology and evolution.

[78]  E. Birney,et al.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs. , 2008, Genome research.

[79]  Yongchao Liu,et al.  Parallelized short read assembly of large genomes using de Bruijn graphs , 2011, BMC Bioinformatics.

[80]  N. Brisson,et al.  Recombination and the maintenance of plant organelle genome stability. , 2010, The New phytologist.

[81]  Andreas Wilke,et al.  Functional analysis of metagenomes and metatranscriptomes using SEED and KEGG , 2011, BMC Bioinformatics.

[82]  Robert Olson,et al.  Accessing the SEED Genome Databases via Web Services API: Tools for Programmers , 2010, BMC Bioinformatics.

[83]  C. Gaillard,et al.  Ethanol precipitation of DNA with linear polyacrylamide as carrier. , 1990, Nucleic acids research.

[84]  B. Gravendeel,et al.  The Complete Chloroplast Genome of 17 Individuals of Pest Species Jacobaea vulgaris: SNPs, Microsatellites and Barcoding Markers for Population and Phylogenetic Studies , 2011, DNA research : an international journal for rapid publication of reports on genes and genomes.

[85]  W. Martin,et al.  The hydrogen hypothesis for the first eukaryote , 1998, Nature.

[86]  Yongan Zhao,et al.  RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data , 2011, Bioinform..

[87]  Sean R. Eddy,et al.  Rfam: annotating non-coding RNAs in complete genomes , 2004, Nucleic Acids Res..

[88]  E. Yang,et al.  Reconstructing the complex evolutionary history of mobile plasmids in red algal genomes , 2016, Scientific Reports.

[89]  Sergey I. Nikolenko,et al.  SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing , 2012, J. Comput. Biol..

[90]  Shulei Sun,et al.  Prokaryotic Genomes and Diversity in Surface Ocean Waters: Interrogating the Global Ocean Sampling Metagenome , 2009, Applied and Environmental Microbiology.

[91]  R. E. Lacey,et al.  Algal cell rupture using high pressure homogenization as a prelude to oil extraction , 2012 .

[92]  Alexander F. Auch,et al.  MEGAN analysis of metagenomic data. , 2007, Genome research.

[93]  Daniel H Huson,et al.  Microbial community analysis using MEGAN. , 2013, Methods in enzymology.

[94]  Andreas Wilke,et al.  phylogenetic and functional analysis of metagenomes , 2022 .

[95]  Lior Pachter,et al.  Sequence Analysis , 2020, Definitions.

[96]  Bertil Schmidt,et al.  A fast hybrid short read fragment assembly algorithm , 2009, Bioinform..

[97]  D. Antonopoulos,et al.  Using the metagenomics RAST server (MG-RAST) for analyzing shotgun metagenomes. , 2010, Cold Spring Harbor protocols.

[98]  H. Oh,et al.  Comparison of several methods for effective lipid extraction from microalgae. , 2010, Bioresource technology.

[99]  Davide Heller,et al.  eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences , 2015, Nucleic Acids Res..

[100]  Tatiana Tatusova,et al.  NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins , 2004, Nucleic Acids Res..

[101]  Bahlul Haider,et al.  Omega: an Overlap-graph de novo Assembler for Metagenomics , 2014, Bioinform..

[102]  Sarah McCalmon,et al.  Sequencing the unsequenceable: Expanded CGG-repeat alleles of the fragile X gene , 2013, Genome research.

[103]  A. Sherwood,et al.  A molecular method for identification of the morphologically plastic invasive algal genera Eucheuma and Kappaphycus (Rhodophyta, Gigartinales) in Hawaii , 2009, Journal of Applied Phycology.

[104]  E. Yang,et al.  Evidence of ancient genome reduction in red algae (Rhodophyta) , 2015, Journal of phycology.

[105]  Debashish Bhattacharya,et al.  Applications of next-generation sequencing to unravelling the evolutionary history of algae. , 2014, International journal of systematic and evolutionary microbiology.

[106]  Björn Canbäck,et al.  ARWEN: a program to detect tRNA genes in metazoan mitochondrial nucleotide sequences , 2008, Bioinform..