Bio-TDS: bioscience query tool discovery system

Abstract

Bioinformatics and computational biology play a critical role in bioscience and biomedical research. As researchers design their experimental projects, one major challenge is finding the bioinformatics toolkits most relevant to discovering new knowledge from their data. The Bio-TDS (Bioscience Query Tool Discovery System, http://biotds.org/) was developed to help researchers retrieve the most applicable analytic tools by letting them formulate their questions as free text. The Bio-TDS is a flexible retrieval system that lets users from multiple bioscience domains (e.g. genomics, proteomics, bio-imaging) query over 15 000 analytic tool descriptions integrated from well-established community repositories. One of its primary components is an ontology and natural language processing workflow for annotation, curation, query processing, and evaluation. The scientific impact of the Bio-TDS was evaluated using sample researcher questions retrieved from Biostars, a site focusing on biological data analysis. The Bio-TDS was compared with five similar bioscience analytic tool retrieval systems and outperformed them in relevance and completeness. The Bio-TDS enables researchers to associate their bioscience questions with the computational toolsets most relevant to the data analysis in their knowledge discovery process.

[18]  A. Nekrutenko,et al.  Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences , 2010, Genome Biology.

[19]  Matthew R. Hanlon,et al.  Araport: the Arabidopsis Information Portal , 2014, Nucleic Acids Res..

[20]  Michael R. Speicher,et al.  A survey of tools for variant analysis of next-generation genome sequencing data , 2013, Briefings Bioinform..

[21]  Maristella Agosti Research and Advanced Technology for Digital Libraries, 13th European Conference, ECDL 2009, Corfu, Greece, September 27 - October 2, 2009. Proceedings , 2009, ECDL.

[22]  Gen-Tao Chiang,et al.  Implementing a genomic data management system using iRODS in the Wellcome Trust Sanger Institute , 2011, BMC Bioinformatics.

[23]  Vincent J. Henry,et al.  OMICtools: an informative directory for multi-omic data analysis , 2014, Database J. Biol. Databases Curation.

[24]  et al.,et al.  NCBO Technology: Powering semantically aware applications , 2013, Journal of Biomedical Semantics.

[25]  Michelle D. Brazas,et al.  A decade of web server updates at the bioinformatics links directory: 2003–2012 , 2012, Nucleic Acids Res..

[26]  Brent S. Pedersen,et al.  BioStar: An Online Question & Answer Resource for the Bioinformatics Community , 2011, PLoS Comput. Biol..

[27]  Damian Smedley,et al.  Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome , 2014, Science Translational Medicine.

[28]  Sheng-Yuan Yang An Ontology-Supported and Fully-Automatic Annotation Technology for Semantic Portals , 2007, IEA/AIE.

[29]  Lior Pachter,et al.  Sequence Analysis , 2020, Definitions.

[30]  Robert Stevens,et al.  The Software Ontology (SWO): a resource for reproducibility in biomedical data analysis, curation and digital preservation , 2014, Journal of Biomedical Semantics.

[31]  Masao Nagasaki,et al.  XiP: a computational environment to create, extend and share workflows , 2013, Bioinform..

[32]  Rion Dooley,et al.  Software-as-a-Service: The iPlant Foundation API , 2012 .

[33]  Rion Dooley,et al.  Life science data analysis workflow development using the bioextract server leveraging the iPlant collaborative cyberinfrastructure , 2015, Concurr. Comput. Pract. Exp..

[34]  Felix Krueger,et al.  Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications , 2011, Bioinform..

[35]  D. Stanzione,et al.  Your Data , Your Way The iPlant Foundation API Data Services , 2012 .

[36]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[37]  Reagan Moore,et al.  iRODS Primer: Integrated Rule-Oriented Data System , 2010, iRODS Primer.

[38]  Nicolas Le Novère,et al.  Identifiers.org and MIRIAM Registry: community resources to provide persistent identification , 2011, Nucleic Acids Res..