Estimation of user's activity from tweets through tri-layer clustering model

We propose a topic model to better estimate users' activities from tweets. The estimation process consists of two phases: cluster generation and activity estimation. In the first phase, we obtain tri-layer clusters made up of a topic layer, an activity layer, and a word layer. In the second phase, we use the activity-specific word distribution derived from the training results to estimate the activities of the test tweets. To assess the feasibility of the model, we evaluate the precision of activity estimation, using 35 activities to extract 23,988 tweets for training and 350 for testing. The experimental results indicate that a reasonable topic-specific activity distribution contributes to cluster generation, and that the proposed model performs better in activity estimation.
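The second phase can be illustrated with a minimal sketch. The abstract does not give the inference procedure, so everything here is an assumption made for illustration: the activity-specific word distribution is approximated by smoothed word counts from labeled tweets (a stand-in for the distribution the tri-layer model would learn), activities get a uniform prior, and a test tweet is assigned the activity with the highest log-likelihood. The function names `estimate_word_distributions` and `estimate_activity` and the toy data are hypothetical.

```python
import math
from collections import Counter


def estimate_word_distributions(labeled_tweets, smoothing=1.0):
    """Approximate p(word | activity) from (activity, tokens) pairs.

    Assumption: additive (Laplace) smoothing over a shared vocabulary;
    this replaces the distribution learned by the tri-layer model.
    """
    counts = {}
    vocab = set()
    for activity, tokens in labeled_tweets:
        counts.setdefault(activity, Counter()).update(tokens)
        vocab.update(tokens)

    dists = {}
    for activity, word_counts in counts.items():
        total = sum(word_counts.values()) + smoothing * len(vocab)
        dists[activity] = {
            word: (word_counts[word] + smoothing) / total for word in vocab
        }
    return dists


def estimate_activity(tokens, dists):
    """Return the activity maximizing the summed log p(word | activity).

    Assumptions: uniform prior over activities; out-of-vocabulary
    words are ignored; ties go to the first activity encountered.
    """
    best_activity, best_score = None, -math.inf
    for activity, dist in dists.items():
        score = sum(math.log(dist[w]) for w in tokens if w in dist)
        if score > best_score:
            best_activity, best_score = activity, score
    return best_activity


# Toy usage with made-up tweets (not data from the paper).
train = [
    ("cooking", ["boil", "pasta", "sauce"]),
    ("running", ["jog", "park", "shoes"]),
    ("cooking", ["chop", "onion", "sauce"]),
]
dists = estimate_word_distributions(train)
predicted = estimate_activity(["pasta", "sauce"], dists)
```

Summing log probabilities rather than multiplying raw probabilities avoids numerical underflow on longer tweets; the full model would presumably also draw on the topic layer, which this sketch leaves out.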
