ExTaxsI: an exploration tool of biodiversity molecular data

Background: The increasing availability of multi-omics data is driving continual revision of estimates of existing biodiversity. In particular, molecular data make it possible to characterize previously unknown species and to enrich the information on already observed species with new genomic data. For this reason, managing and visualizing existing molecular data and their related metadata through easy-to-use IT tools has become key to the development of future research. The more users can access biodiversity-related information, the greater the scientific community's ability to expand knowledge in this area.

Results: We developed ExTaxsI (Exploring Taxonomies Information), an IT tool that retrieves biodiversity data stored in NCBI databases and provides a simple, explorable visualization. Through the three case studies presented here, we show how efficiently organizing data that are already available can yield new information that serves as a starting point for new research. Our approach also highlighted the limits in the availability of distribution data, a key factor to consider in the experimental design phase of broad-spectrum studies such as metagenomics.

Conclusions: ExTaxsI can easily produce explorable visualizations of molecular data and their metadata, helping researchers improve experimental designs and highlight the main gaps in the coverage of available data.
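To make the retrieval step described in the Results more concrete, the following is a minimal sketch of how a taxon-based query against NCBI could be assembled and how retrieved lineages could be summarized before plotting. It is not ExTaxsI's own code: the function names `build_esearch_url` and `count_by_rank`, the choice of the `nuccore` database, the `[Organism]` and `[Gene]` field tags, and the semicolon-separated lineage format are all assumptions made for illustration. The sketch only builds the request URL for the NCBI E-utilities ESearch endpoint and does not send it.

```python
from collections import Counter
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(taxon, gene=None, db="nuccore", retmax=20):
    """Build an ESearch URL for a taxon, optionally restricted to a gene.

    Hypothetical helper: the field tags used here are an assumption
    about how such a query might be phrased, not ExTaxsI's query.
    """
    term = f"{taxon}[Organism]"
    if gene:
        term += f" AND {gene}[Gene]"
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def count_by_rank(lineages, rank_index):
    """Count records per taxon at a given depth of each lineage.

    Each lineage is assumed to be a semicolon-separated string, from
    the broadest group to the narrowest. Lineages shorter than
    rank_index + 1 entries are skipped.
    """
    counts = Counter()
    for lineage in lineages:
        ranks = [r.strip() for r in lineage.split(";")]
        if rank_index < len(ranks):
            counts[ranks[rank_index]] += 1
    return counts


if __name__ == "__main__":
    # Illustrative input only; these lineages are made up for the example.
    sample_lineages = [
        "Eukaryota; Metazoa; Chordata; Gadidae",
        "Eukaryota; Metazoa; Chordata; Salmonidae",
        "Eukaryota; Metazoa; Chordata; Gadidae",
    ]
    print(build_esearch_url("Gadus morhua", gene="COI"))
    print(count_by_rank(sample_lineages, 3))
```

Counting records per taxon in this way is one simple route to spotting the coverage gaps mentioned in the Conclusions: a taxon with few or no records at a given rank stands out before any sequencing experiment is designed.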
