MetacodeR: An R package for manipulation and heat tree visualization of community taxonomic data from metabarcoding

Community-level data, the type generated by an increasing number of metabarcoding studies, is often graphed as stacked bar charts or pie graphs; these graph types do not convey the hierarchical structure of taxonomic classifications and are limited by their reliance on color to distinguish categories. We developed MetacodeR, an R package for easily parsing, manipulating, and plotting hierarchical data. To accomplish this, MetacodeR provides a function to parse most text-based formats that contain taxonomic classifications, taxon names, taxon IDs, or sequence IDs. The parsed data can then be subset, sampled, and ordered using a set of intuitive functions that take into account the hierarchical nature of the data. Finally, an extremely flexible plotting function allows for the quantitative representation of up to four arbitrary statistics simultaneously in a tree format by mapping statistics to the color and size of tree nodes and edges. MetacodeR also allows exploration of barcode primer bias by integrating functions to run in silico ("digital") PCR. MetacodeR has been designed for data from metabarcoding research, but it can easily be applied to any data that has a hierarchical component, such as gene ontology, gene expression data, or geographic location data. Our package complements currently available tools for community analysis and is provided open source with extensive online user manuals.
