ULW-DMM: An Effective Topic Modeling Method for Microblog Short Text

With the growing popularity of social media such as microblogs, mining useful information from short texts has become increasingly important. However, short texts are sparse and high-dimensional and arrive in large volumes, which makes this task challenging. In this paper, we propose ULW-DMM, a method that extends the Dirichlet multinomial mixture (DMM) topic model by combining the user-LDA topic model, which relies on internal data expansion, with latent feature vector representations of words trained on a very large external corpus. Experimental results show that ULW-DMM yields a relatively large improvement in topic coherence and in classification tasks when modeling topics in microblog short texts.
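As background for the DMM component that the method extends, the following is a minimal sketch of a collapsed Gibbs sampler for a plain DMM, in which each short document is assigned exactly one topic. It is not the ULW-DMM model: the user-LDA expansion and the latent feature word vectors are not included. The function name `gsdmm`, the hyperparameter values, and the toy documents are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=4, alpha=0.1, beta=0.1, iters=20, seed=0):
    """Collapsed Gibbs sampling for a plain DMM (one topic per document).

    docs is a list of token lists. Returns (z, m, nw): the topic of each
    document, the number of documents per topic, and per-topic word counts.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})   # vocabulary size
    m = [0] * K                             # documents per topic
    n = [0] * K                             # tokens per topic
    nw = [Counter() for _ in range(K)]      # word counts per topic
    z = []

    def move(d, k, sign):
        # Add (sign=+1) or remove (sign=-1) document d from topic k.
        m[k] += sign
        n[k] += sign * len(d)
        for w in d:
            nw[k][w] += sign

    # Random initial assignment.
    for d in docs:
        k = rng.randrange(K)
        z.append(k)
        move(d, k, +1)

    for _ in range(iters):
        for i, d in enumerate(docs):
            move(d, z[i], -1)               # exclude the current document
            counts = Counter(d)
            logp = []
            for k in range(K):
                # Topic popularity term (constant denominator dropped).
                lp = math.log(m[k] + alpha)
                # Word-likelihood term, handling repeated words.
                for w, c in counts.items():
                    for j in range(c):
                        lp += math.log(nw[k][w] + beta + j)
                for t in range(len(d)):
                    lp -= math.log(n[k] + V * beta + t)
                logp.append(lp)
            mx = max(logp)                  # stabilise before exponentiating
            weights = [math.exp(lp - mx) for lp in logp]
            z[i] = rng.choices(range(K), weights=weights)[0]
            move(d, z[i], +1)
    return z, m, nw


# Toy microblog-like documents (hypothetical data).
toy_docs = [
    ["rain", "storm", "wind"],
    ["storm", "flood", "rain"],
    ["goal", "match", "team"],
    ["team", "coach", "goal", "goal"],
]
assignments, docs_per_topic, word_counts = gsdmm(toy_docs, K=3)
```

Because every document receives a single topic, the sampler avoids estimating a per-document topic mixture from only a handful of words, which is the usual motivation for using DMM rather than LDA on short texts.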
