Entity-based Classification of Twitter Messages

Twitter is a popular micro-blogging service on the Web, where people can post short messages that then become visible to some other users of the service. While the topics of these messages vary, in many of them users express opinions about companies or their products. These messages are a rich source of information for companies for sentiment analysis or opinion mining. There is, however, a major obstacle to analyzing the messages directly: because company names are often ambiguous (e.g. apple, the fruit vs. Apple Inc.), one first needs to identify which messages are related to the company. In this paper we address this question. We present various techniques for classifying tweets that contain a given keyword according to whether or not they are related to a particular company with that name. We first present simple techniques that make use of company profiles, which we created semi-automatically from external Web sources. Our advanced techniques take ambiguity estimations into account and also automatically extend the company profiles from the Twitter stream itself. We demonstrate the effectiveness of our methods through an extensive set of experiments. Moreover, we extensively analyze the sources of errors in the classification. The analysis not only brings further improvement, but also makes it possible to use human input more efficiently.
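
To make the idea of profile-based classification concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes that a company profile can be reduced to two hand-picked keyword sets, one suggesting the company and one suggesting other senses of the name. The function names and the example keywords are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_related(tweet, positive_profile, negative_profile):
    """Return True if the tweet shares more tokens with the positive
    (company) profile than with the negative (other senses) profile.

    Ties, including tweets with no profile evidence at all, are treated
    as unrelated. This tie-breaking rule is an assumption of the sketch.
    """
    tokens = tokenize(tweet)
    positive_hits = len(tokens & positive_profile)
    negative_hits = len(tokens & negative_profile)
    return positive_hits > negative_hits


# Hypothetical keyword profiles for the ambiguous name "apple".
apple_positive = {"iphone", "ipad", "mac", "ios", "store"}
apple_negative = {"pie", "juice", "fruit", "tree", "eat"}

tweets = [
    "Just bought the new iPhone at the Apple store",
    "Baking an apple pie with fresh fruit",
]
labels = [is_related(t, apple_positive, apple_negative) for t in tweets]
```

In this sketch the first tweet is labeled as related to the company and the second is not. A tweet with no matching keywords falls back to "unrelated", which is exactly the situation where ambiguity estimates of the kind mentioned above could supply a better default.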
