Activity-based topic discovery

We develop a topic model that assigns word pairs to associated topics in order to explore people's activities. Because verb-led word pairs express people's activities more effectively than separate words, we incorporate a word-connection model into smoothed Latent Dirichlet Allocation (LDA) so that words are properly paired and the pairs are assigned to their associated topics. To evaluate the proposed model quantitatively and qualitatively, we built two datasets from Twitter posts: a wish-related dataset and a geographical-information-related dataset. Experiments on the wish-related dataset indicate that word relatedness plays a key role in forming reasonable pairs and that the proposed model, word-pair generative Latent Dirichlet Allocation (wpLDA), performs well in clustering. Results on the geographical-information-related dataset demonstrate that the proposed model works well for discovering people's activities, which it represents in an intuitive and understandable form.
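The abstract does not give wpLDA's sampling equations, so the following is only a rough illustration of the smoothed-LDA backbone it builds on. The sketch runs standard collapsed Gibbs sampling with symmetric Dirichlet priors `alpha` (document-topic) and `beta` (topic-token), treating each pre-extracted verb-led pair as a single token. It deliberately omits the word-connection model that wpLDA uses to decide which words pair up. The function name `gibbs_lda_pairs`, the hyperparameter values, and the toy documents are all hypothetical.

```python
import random


def gibbs_lda_pairs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for smoothed LDA over word-pair tokens.

    Simplifying assumption (not wpLDA itself): each token is a ready-made
    (verb, object) tuple, so pairing is fixed before sampling starts.
    Returns (phi, vocab) where phi[k][v] estimates P(vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = list(dict.fromkeys(tok for doc in docs for tok in doc))
    index = {tok: i for i, tok in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-token, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kv = [[0] * V for _ in range(K)]
    n_k = [0] * K

    def update(d, k, v, delta):
        n_dk[d][k] += delta
        n_kv[k][v] += delta
        n_k[k] += delta

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for tok in doc:
            k = rng.randrange(K)
            update(d, k, index[tok], +1)
            z_d.append(k)
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, tok in enumerate(doc):
                v = index[tok]
                # Remove the token's current assignment from the counts.
                update(d, z[d][i], v, -1)
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kv[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                update(d, k, v, +1)
                z[d][i] = k

    # Smoothed (posterior mean) estimate of each topic's pair distribution.
    phi = [[(n_kv[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return phi, vocab


# Hypothetical toy corpus: each post reduced to verb-led pairs.
docs = [
    [("eat", "sushi"), ("eat", "ramen"), ("drink", "sake")],
    [("watch", "movie"), ("watch", "game"), ("play", "game")],
    [("eat", "sushi"), ("drink", "sake")],
    [("play", "game"), ("watch", "movie")],
]
phi, vocab = gibbs_lda_pairs(docs, num_topics=2, iters=100)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
    print(k, [vocab[v] for v in top])
```

Because `beta` smooths every topic-token cell, unseen pairs keep a small nonzero probability in every topic. That smoothing is what distinguishes smoothed LDA from the unsmoothed variant. Which pairs land together in the printed top lists depends on the random seed and the tiny corpus.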
