Probabilistic Topic Models for Web Services Clustering and Discovery

Probabilistic topic models were originally developed in information retrieval for topic extraction and document modeling. In this paper, we explore three probabilistic topic models, Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA) and the Correlated Topic Model (CTM), to extract latent factors from web service descriptions. The extracted latent factors are then used to group the services into clusters. In our approach, topic models serve as efficient dimension reduction techniques that capture word-topic and topic-service semantic relationships, expressed as probability distributions. To address the limitations of keyword-based queries, we represent web service descriptions in a vector space and introduce a new approach for discovering web services using latent factors. In our experiments, we compared the accuracy of the three probabilistic clustering algorithms (PLSA, LDA and CTM) with that of a classical clustering algorithm. We also evaluated our service discovery approach by calculating the precision (P@n) and the normalized discounted cumulative gain (NDCGn). The results show that the approaches based on CTM and LDA both perform better than the other search methods.
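As an illustration of the clustering step and the two evaluation measures named above, the following is a minimal sketch, not the implementation used in the paper. It assumes that a fitted topic model already provides, for each service, a list of topic probabilities (the hypothetical `theta` argument), that a service is placed in the cluster of its most probable topic, and that NDCG uses the common `rel / log2(rank + 1)` discount; the abstract does not specify which DCG variant was used. All function names are invented for this example.

```python
import math


def assign_clusters(theta):
    """Map each service to the index of its most probable topic.

    theta: dict of service name -> list of P(topic | service) values
    (assumed to come from a fitted PLSA, LDA or CTM model).
    """
    return {
        service: max(range(len(probs)), key=probs.__getitem__)
        for service, probs in theta.items()
    }


def precision_at_n(relevance, n):
    """P@n: fraction of the top-n ranked results that are relevant.

    relevance: graded relevance of the ranked results, best rank first;
    any value greater than zero counts as relevant.
    """
    return sum(1 for r in relevance[:n] if r > 0) / n


def dcg_at_n(relevance, n):
    """Discounted cumulative gain over the top-n results.

    Assumed variant: the result at 1-based rank k contributes rel / log2(k + 1).
    """
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevance[:n]))


def ndcg_at_n(relevance, n):
    """NDCG at n: DCG divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_n(sorted(relevance, reverse=True), n)
    return dcg_at_n(relevance, n) / ideal if ideal > 0 else 0.0
```

Here `relevance` would hold the judged relevance of the services returned for one query, in ranked order, so that both measures can be averaged over a set of test queries.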
