Knowledge Extraction from Web Services Repositories

With the increasing use of the web and service-oriented systems, web services have become a widely adopted technology. Web service repositories are growing fast, creating a need for advanced tools to organize and index them. Clustering web services, which are usually represented by Web Service Description Language (WSDL) documents, enables web service search engines and users to organize and process large repositories in groups with similar functionality and characteristics. In this paper, we propose a novel technique for clustering WSDL documents. The proposed method treats web services as categorical data: each service is described by a set of values extracted from the content and structure of its description file, and the quality of a clustering is measured by the mutual information between the clusters and their values. We describe how to represent web services as categorical data and how to cluster them with the LIMBO algorithm while minimizing the information loss in the feature values. In the experimental evaluation, our approach outperforms, in terms of F-measure, approaches that use alternative similarity measures and methods for clustering WSDL documents.
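To make the information-loss criterion concrete, the following is a minimal sketch of the merge cost used in agglomerative information bottleneck clustering, on which LIMBO builds. It is not the LIMBO algorithm itself (the summarization phase is omitted), and the service values and function names (`value_distribution`, `merge`, `information_loss`) are hypothetical, chosen only for illustration. Each service is assumed to be a set of categorical values with a uniform distribution over them; the cost of merging two clusters is the resulting drop in mutual information between clusters and values.

```python
import math


def value_distribution(values):
    """Uniform conditional distribution p(v | service) over a set of values."""
    values = set(values)
    return {v: 1.0 / len(values) for v in values}


def merge(p1, d1, p2, d2):
    """Merge two clusters: add their masses and mix their value distributions."""
    p = p1 + p2
    merged = {}
    for weight, dist in ((p1 / p, d1), (p2 / p, d2)):
        for v, q in dist.items():
            merged[v] = merged.get(v, 0.0) + weight * q
    return p, merged


def kl_divergence(d, m):
    """KL divergence D(d || m) in bits; m must cover the support of d."""
    return sum(q * math.log2(q / m[v]) for v, q in d.items() if q > 0)


def information_loss(p1, d1, p2, d2):
    """Drop in I(C; V) caused by merging clusters (p1, d1) and (p2, d2).

    Equals (p1 + p2) times the weighted Jensen-Shannon divergence of d1, d2.
    """
    _, m = merge(p1, d1, p2, d2)
    return p1 * kl_divergence(d1, m) + p2 * kl_divergence(d2, m)


if __name__ == "__main__":
    # Hypothetical values extracted from three WSDL documents.
    weather_a = value_distribution({"getWeather", "city", "temperature"})
    weather_b = value_distribution({"getForecast", "city", "temperature"})
    payment = value_distribution({"chargeCard", "amount", "currency"})
    p = 1.0 / 3  # each service starts as its own equally weighted cluster
    print("weather_a + weather_b:", information_loss(p, weather_a, p, weather_b))
    print("weather_a + payment:  ", information_loss(p, weather_a, p, payment))
```

Under these assumptions, a greedy clustering would repeatedly merge the pair of clusters with the smallest loss, so the two weather services, which share values, would be merged before either is merged with the payment service.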
