The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows

The DBCLS BioHackathon Consortium

Web services have become a key technology for bioinformatics: life science databases are globally decentralized, and the exponential increase in the amount of available data demands efficient systems that do not require transferring entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, and the BioMoby project, along with researchers in emerging areas where a standard exchange data format is not well established, to an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and the Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security, are discussed. As a result, we improved the interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies.

Akira R. Kinjo | Kiyoko F. Aoki-Kinoshita | Kiyoshi Asai | Christian M. Zmasek | Yoshinobu Kano | Matthew R. Pocock | Hideaki Sugawara | Yasunori Yamamoto | Atsuko Yamaguchi | Hiroyuki Nakamura | Toshihisa Takagi | Yasukazu Nakamura | Tamotsu Noguchi | René Ranzinger | Mark D. Wilkinson | Michael Kuhn | Pjotr Prins | Keiichiro Ono | Thomas M. Oinn | Shuichi Kawashima | Hong-Woo Chun | Paul M. K. Gordon | William S. York | Rutger A. Vos | Toshiaki Katayama | Daron M. Standley | Florian Reisinger | Kazuharu Arakawa | Richard M. Bruskiewich | Oswaldo Trelles | Raoul Jean Pierre Bonnal | Eri Kibukawa | Chikashi Nobata | Evangelos Pafilis | José María Fernández | Stuart Owen | Akira Funahashi | Arnaud Kerhornou | Shinobu Okamoto | Mark J. Schreiber | Jan Aerts | Bruno Aranda | Tatsuya Nishizawa | Lukasz Salwínski | Hilmar Lapp | Yasumasa Shigemoto | Naohisa Goto | Jan Christian Bryne | Edward A. Kawas | Mitsuteru Nakao | Andreas Groscurth | Martin Senger | Richard C. G. Holland | Toshiyuki Tashiro | Alex Gutteridge | Lord H. Barboza | Heikki Lehväslaiho
