WE-LDA: A Word Embeddings Augmented LDA Model for Web Services Clustering

Due to the rapid growth in both the number and diversity of Web services, it has become increasingly difficult to find appropriate Web services. Clustering Web services according to their functionalities is an efficient way to facilitate both Web service discovery and service management. Existing Web service clustering methods mostly rely directly on key features extracted from WSDL documents, e.g., input/output parameters and keywords from the description text. The probabilistic topic model Latent Dirichlet Allocation (LDA) has also been adopted to extract latent topic features of WSDL documents that represent Web services, improving clustering accuracy. However, the clustering power of the basic LDA model is limited to some extent, and auxiliary features can be exploited to enhance it. Since the word vectors obtained by Word2vec are of higher quality than those obtained by the LDA model, in this paper we propose an augmented LDA model, named WE-LDA, which leverages these high-quality word vectors to improve the performance of Web services clustering. In WE-LDA, the Word2vec word vectors are grouped into word clusters by the K-means++ algorithm, and these word clusters are incorporated to semi-supervise the LDA training process, so that the model can elicit better distributed representations of Web services. A comprehensive experiment on a ground-truth dataset crawled from ProgrammableWeb validates the performance of the proposed method. Compared with state-of-the-art methods, our approach improves clustering accuracy by 5.3% on average across various metrics.
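The word-clustering step (K-means++ over word vectors) can be sketched as follows. This is a minimal illustration and not the authors' implementation. The 2-D vectors are hypothetical stand-ins for real Word2vec embeddings. Unit-normalising them first, so that Euclidean distance tracks cosine similarity, is also our assumption. Seeding follows the standard k-means++ D² rule, and Lloyd iterations follow it.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (assumption: cosine-style geometry)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: first centre uniform, each later centre drawn with
    probability proportional to D(x)^2, the squared distance to the nearest
    centre chosen so far."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:
                centers.append(list(p))
                break
        else:  # all distances zero: fall back to the last point
            centers.append(list(points[-1]))
    return centers


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm started from k-means++ seeds; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical toy "word vectors" standing in for Word2vec output.
vectors = {
    "weather": [0.9, 0.1], "forecast": [0.85, 0.15], "temperature": [0.95, 0.05],
    "payment": [0.1, 0.9], "credit": [0.12, 0.88], "invoice": [0.05, 0.95],
}
words = list(vectors)
labels, centers = kmeans([normalize(vectors[w]) for w in words], k=2)
groups = {}
for word, label in zip(words, labels):
    groups.setdefault(label, []).append(word)
print(sorted(sorted(g) for g in groups.values()))
# [['credit', 'invoice', 'payment'], ['forecast', 'temperature', 'weather']]
```

In practice the number of word clusters and the embedding dimension would be far larger. A library implementation such as scikit-learn's `KMeans(init="k-means++")` would replace this hand-written loop.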
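The abstract does not specify how the word clusters semi-supervise LDA training. The sketch below uses one plausible scheme, assumed purely for illustration and not taken from the paper: a seeded asymmetric prior. Each word cluster is tied to one topic, and words in that cluster receive an extra pseudo-count `boost` in that topic's Dirichlet prior β_kw. Collapsed Gibbs sampling then draws each token's topic from p(z = k | rest) ∝ (n_dk + α)(n_kw + β_kw) / (n_k + Σ_w β_kw). The sampler returns per-service topic proportions θ_d, which could serve as the service representations to be clustered. The function name `seeded_lda`, the `boost` parameter, and the toy corpus are all hypothetical.

```python
import random


def seeded_lda(docs, vocab_size, num_topics, word_cluster,
               alpha=0.1, beta=0.01, boost=1.0, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA with a cluster-seeded topic-word prior.

    docs         -- list of services, each a list of word ids in [0, vocab_size)
    word_cluster -- dict word id -> topic id (hypothetical seeding: a word in
                    cluster k gets `boost` extra prior mass in topic k)
    Returns theta, one topic-proportion vector per service.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    # Asymmetric Dirichlet prior beta_kw: symmetric beta plus the cluster boost.
    prior = [[beta + (boost if word_cluster.get(w) == k else 0.0)
              for w in range(V)] for k in range(K)]
    prior_sum = [sum(row) for row in prior]

    n_dk = [[0] * K for _ in docs]      # topic counts per service
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens per topic

    def update(d, w, k, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Random initial topic assignment for every token.
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # exclude the current token
                # p(z=k | rest) ∝ (n_dk + alpha)(n_kw + beta_kw) / (n_k + sum_w beta_kw)
                weights = [(n_dk[d][k] + alpha) * (n_kw[k][w] + prior[k][w])
                           / (n_k[k] + prior_sum[k]) for k in range(K)]
                z[d][i] = rng.choices(range(K), weights=weights)[0]
                update(d, w, z[d][i], +1)

    return [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
            for d, doc in enumerate(docs)]


# Toy vocabulary: 0 weather, 1 forecast, 2 temperature, 3 payment, 4 credit, 5 invoice
word_cluster = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
services = [[0, 1, 2, 0, 1], [2, 0, 1, 1], [3, 4, 5, 3], [5, 4, 3, 4, 5]]
theta = seeded_lda(services, vocab_size=6, num_topics=2, word_cluster=word_cluster)
dominant = [max(range(2), key=lambda k: row[k]) for row in theta]
print(dominant)
```

Because the boost biases but does not fix topic assignments, word co-occurrence in the corpus can still override the seed clusters. This soft behaviour is the usual motivation for treating such side information as semi-supervision rather than hard labels.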
