search GenBank: interactive orchestration and ad-hoc choreography of Web services in the exploration of the biomedical resources of the National Center for Biotechnology Information

Background
Because the number of biomedical entries in the data repositories of the National Center for Biotechnology Information (NCBI) keeps growing, third-party software developers find it difficult to collect, manage and process all of these entries in one place without significant investment in hardware and software infrastructure and in its maintenance and administration. Web services allow the development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves or the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools oriented towards biomedical data processing.

Results
We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. The dedicated search GenBank system uses NCBI Web services and the Entrez Programming Utilities (eUtils) package to provide extended searching capabilities in NCBI data repositories. In search GenBank, users can follow one of three exploration paths: simple data searching based on a user-specified query, advanced data searching based on a user-specified query, and advanced data exploration with the use of macros. search GenBank orchestrates calls to the particular tools available through the NCBI Web service that provide the requested functionality, while users interactively browse selected records and traverse between NCBI databases using the available links. By contrast, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases.

Conclusions
search GenBank extends the standard capabilities of the NCBI Entrez search engine in querying biomedical databases. The ability to create and save macros in search GenBank is a unique feature with great potential, which will grow further as the networks of relationships between data stored in particular databases become denser. search GenBank is available for public use at http://sgb.biotools.pl/.
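
To make the idea of a macro as a chain of eUtils calls more concrete, the following is a minimal sketch of how such a chain might be assembled. It is not the search GenBank implementation: it assumes the URL-based E-utilities interface rather than the Web service interface described above, the helper names (`eutils_url`, `search_step`, `link_step`, `summary_step`) are hypothetical, and the query term and record IDs are placeholders. It only builds request URLs and performs no network access; a real macro would parse the IDs returned by each step and feed them into the next.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (URL interface).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(tool, **params):
    """Build the request URL for one E-utility, e.g. 'esearch' or 'elink'."""
    return f"{EUTILS_BASE}{tool}.fcgi?{urlencode(params)}"


def search_step(db, term, retmax=20):
    """Step 1 (ESearch): find record IDs in `db` that match `term`."""
    return eutils_url("esearch", db=db, term=term, retmax=retmax)


def link_step(dbfrom, db, ids):
    """Step 2 (ELink): follow links from records in `dbfrom` to records in `db`."""
    return eutils_url("elink", dbfrom=dbfrom, db=db, id=",".join(ids))


def summary_step(db, ids):
    """Step 3 (ESummary): fetch document summaries for the linked records."""
    return eutils_url("esummary", db=db, id=",".join(ids))


if __name__ == "__main__":
    # Placeholder values for illustration only; real IDs would come from
    # parsing the ESearch and ELink responses.
    found_ids = ["1", "2"]
    linked_ids = ["3"]
    for url in (
        search_step("nucleotide", "HNF1A"),
        link_step("nucleotide", "protein", found_ids),
        summary_step("protein", linked_ids),
    ):
        print(url)
```

Keeping each step as a separate function mirrors the way a macro composes independent calls: the sequence of steps, and the databases they traverse, can be rearranged without changing any single step.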
