An improved Latent Dirichlet Allocation method for service topic detection

Service topic detection is one of the most important techniques in service information extraction, clustering and recommendation. Compared with short-text corpora from social networks, service description corpora have higher dimensionality and greater diversity, which makes it difficult to detect topics from a large number of service descriptions. To address these challenges, we propose a new topic detection method based on the LDA (Latent Dirichlet Allocation) model, referred to as CV-LDA (Context sensitive word Vector based LDA). It uses a word embedding based method that generates context-sensitive vectors, which are then used to cluster the words and thereby reduce dimensionality. Perplexity analysis on a real-world dataset shows that the topics detected by our method have lower perplexity than those obtained with vectors based on word frequency weighting.
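The perplexity comparison mentioned above can be illustrated with a minimal sketch. It uses the standard definition of topic-model perplexity, the exponential of the negative average per-token log-likelihood, and is not taken from the paper itself. The names `perplexity`, `docs`, `theta` and `phi` are assumptions introduced for illustration, and the toy numbers are not results from the paper.

```python
import math


def perplexity(docs, theta, phi):
    """Standard topic-model perplexity (illustrative, not the paper's code).

    docs  : list of documents, each a list of integer word ids
    theta : theta[d][k] = probability of topic k in document d
    phi   : phi[k][w]   = probability of word w under topic k
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Marginal word probability: sum over topics of p(k | d) * p(w | k)
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    # Lower values mean the model predicts the tokens better
    return math.exp(-log_likelihood / n_tokens)


# Toy corpus: 4-word vocabulary, 2 documents, 2 topics (hypothetical values)
docs = [[0, 1], [2, 3]]

# A model whose topics match the documents
theta_good = [[1.0, 0.0], [0.0, 1.0]]
phi_good = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]

# An uninformative model: uniform topics over the whole vocabulary
theta_flat = [[0.5, 0.5], [0.5, 0.5]]
phi_flat = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]

good = perplexity(docs, theta_good, phi_good)
flat = perplexity(docs, theta_flat, phi_flat)
print(good < flat)
```

In this toy setting the model whose topics match the documents yields the lower perplexity, which is the sense in which lower perplexity is read as better topic quality.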
