Biosphere: the interoperation of web services in microarray cluster analysis.

The growing use of DNA microarrays in biomedical research has led to a proliferation of analysis tools. These programs address different aspects of analysis (e.g. normalisation and clustering within and across individual arrays) as well as extended analysis methods (e.g. clustering, annotation and mining of multiple datasets). Microarray data analysis therefore typically requires multiple software programs, covering different analysis types and methods, to interoperate. Such interoperation is often hampered by the heterogeneity of the tools, which may expose different interfaces and be written in different programming languages. To address this problem, we adopted a web service approach based on the Simple Object Access Protocol (SOAP), which provides a uniform programmatic interface to these heterogeneous software components. To demonstrate this approach in the microarray context, we created a web server application, Biosphere, which interoperates a number of geographically distributed web services. These include a clustering web service, which offers a suite of clustering algorithms for analysing microarray data; XEMBL, developed at the European Bioinformatics Institute (EBI) for retrieving sequence data from the EMBL Nucleotide Sequence Database; and three gene annotation web services: GetGO, GetHAPI and GetUMLS. GetGO retrieves Gene Ontology (GO) annotation, and the other two retrieve annotation from biomedical literature indexed by Medical Subject Headings (MeSH) terms. With these web services, Biosphere allows users to: (i) cluster gene expression data using seven different algorithms; (ii) visualise the statistically grouped clustering results in colour; and (iii) retrieve sequence, annotation and citation data for genes of interest.
AVAILABILITY: Biosphere and its web services, described in Web Services Description Language (WSDL), can be accessed at http://rook.cecid.hku.hk:8280/BiosphereServer.
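
As an illustration of the uniform interface that SOAP provides, the following minimal sketch builds and parses a SOAP 1.1 message using only the Python standard library. It is not taken from Biosphere: the operation name `cluster`, the parameters `algorithm` and `k`, and the namespace `urn:example:clustering` are hypothetical placeholders, and the actual operations would be those defined in the services' WSDL descriptions.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (standard).
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used here for illustration only.
SERVICE_NS = "urn:example:clustering"


def build_soap_request(operation, params):
    """Return a SOAP 1.1 envelope (as a string) that calls `operation`."""
    ET.register_namespace("soapenv", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    # The operation element lives in the (assumed) service namespace.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(call, name)
        child.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_soap_body(xml_text):
    """Return (operation, {param: text}) from the first element in the Body."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_ENV_NS}}}Body")
    payload = body[0]
    operation = payload.tag.split("}")[-1]  # drop the "{namespace}" prefix
    return operation, {child.tag.split("}")[-1]: child.text for child in payload}


if __name__ == "__main__":
    # Hypothetical request: cluster with an assumed algorithm name and k.
    message = build_soap_request("cluster", {"algorithm": "kmeans", "k": 3})
    print(message)
    print(parse_soap_body(message))
```

Because every service exchanges messages in the same envelope format, a client written this way needs no knowledge of the language or platform in which a given service is implemented; only the operation names and parameters differ from service to service.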
