Discovering author interest evolution in order-sensitive and semantic-aware topic modeling

Abstract: Modeling how authors' interests change over time from their documents has applications in areas such as recommendation systems, authorship identification, and opinion extraction. In this paper, we propose an Ordering-sensitive and Semantic-aware Dynamic Author Topic Model (OSDATM), which tracks the evolution of author interest in time-stamped documents. The model also uses the discovered author interest information to learn better topics. Unlike traditional topic models, OSDATM is sensitive to word order and therefore extracts more information from the semantic meaning of the context. Experimental results show that OSDATM learns better topics than state-of-the-art topic models. In addition, the dynamic author interests that OSDATM discovers are interpretable and consistent with the truth.
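To make the notion of "author interest evolution" concrete, the sketch below averages per-document topic proportions for each author and year. It is only an illustration and not the OSDATM inference procedure, which the abstract does not specify. It assumes the topic proportions have already been produced by some topic model, and the function name `author_interest_by_year`, the record layout, and the toy data are all hypothetical.

```python
from collections import defaultdict


def author_interest_by_year(docs):
    """Average topic proportions per (author, year).

    Assumption: each doc is a dict with 'authors' (list of str),
    'year' (int), and 'topics' (list of float, one weight per topic).
    This is a plain aggregation, not the OSDATM model itself.
    """
    totals = {}
    counts = defaultdict(int)
    for doc in docs:
        for author in doc["authors"]:
            key = (author, doc["year"])
            # Start a running sum of topic weights for this author and year.
            running = totals.setdefault(key, [0.0] * len(doc["topics"]))
            for k, weight in enumerate(doc["topics"]):
                running[k] += weight
            counts[key] += 1
    # Divide each running sum by the number of documents that contributed.
    return {
        key: [s / counts[key] for s in sums]
        for key, sums in totals.items()
    }


# Hypothetical toy corpus with two topics.
docs = [
    {"authors": ["alice"], "year": 2014, "topics": [0.75, 0.25]},
    {"authors": ["alice", "bob"], "year": 2014, "topics": [0.25, 0.75]},
    {"authors": ["alice"], "year": 2015, "topics": [0.0, 1.0]},
]
interest = author_interest_by_year(docs)
```

Reading `interest[("alice", 2014)]` and then `interest[("alice", 2015)]` shows how one author's topic mix shifts between time slices, which is the kind of trajectory a dynamic author topic model is meant to expose.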
