Exploiting Language Models to Classify Events from Twitter

Classifying events on Twitter is challenging because tweet texts form a large volume of temporal data that is noisy and spans many kinds of topics. In this paper, we propose a method to classify events from Twitter. We first find the distinguishing terms between tweets in events and measure their similarity using learned language models, namely ConceptNet and a latent Dirichlet allocation method for selectional preferences (LDA-SP), both of which have been widely studied on large text corpora for computational linguistic relations. The relationships among terms in tweets are discovered by checking them under each model. We then propose a method to compute the similarity between tweets based on their features, including common terms and the relationships among their distinguishing terms. The resulting similarity measure is explicit and convenient to apply in k-nearest neighbor classification. Experiments on the Edinburgh Twitter Corpus show that our method achieves competitive results for classifying events.
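
The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `RELATED` table is a hypothetical stand-in for relatedness scores that would come from ConceptNet or LDA-SP, and the weighted combination controlled by `alpha`, the function names, and the toy scores are all assumptions made for illustration. The sketch scores two tweets by their share of common terms plus the average relatedness of their distinguishing terms, then classifies a tweet by majority vote among its k most similar labeled tweets.

```python
from collections import Counter

# Hypothetical relatedness scores; in the paper's setting these would be
# derived from ConceptNet or LDA-SP rather than hard-coded.
RELATED = {
    frozenset(("quake", "tremor")): 0.9,
    frozenset(("goal", "match")): 0.8,
}


def relatedness(a, b):
    """Look up an (assumed) relatedness score for two terms, 0.0 if unknown."""
    return RELATED.get(frozenset((a, b)), 0.0)


def similarity(t1, t2, alpha=0.5):
    """Combine common-term overlap with relatedness of distinguishing terms.

    alpha is an assumed mixing weight, not a value taken from the paper.
    """
    s1, s2 = set(t1), set(t2)
    common = s1 & s2
    union = s1 | s2
    # Distinguishing terms: those appearing in only one of the two tweets.
    d1, d2 = s1 - common, s2 - common
    overlap = len(common) / len(union) if union else 0.0
    pairs = [relatedness(a, b) for a in d1 for b in d2]
    rel = sum(pairs) / len(pairs) if pairs else 0.0
    return alpha * overlap + (1 - alpha) * rel


def knn_classify(query, labeled, k=3):
    """Label a tweet by majority vote among its k most similar labeled tweets."""
    ranked = sorted(labeled, key=lambda item: similarity(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, given labeled tweets such as `(["tremor", "felt", "tokyo"], "disaster")` and `(["match", "win", "cup"], "sports")`, calling `knn_classify(["quake", "tokyo"], labeled)` ranks the disaster tweets higher because "quake" and "tremor" are related in the toy table even though the two tweets share only one literal term.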
