What have fruits to do with technology?: the case of Orange, Blackberry and Apple

Twitter is a micro-blogging service on the Web where people post short messages that then become visible to other users of the service. While the topics of these messages vary, in many of them users express their opinions about companies or products. Since the Twitter service is very popular, the messages form a rich source of information for companies: with the help of data mining and sentiment analysis techniques, they can learn how their customers like their products or how the company is generally perceived. There is, however, a major obstacle to analyzing the data directly: because company names are often ambiguous, one first needs to identify which messages are related to the company. In this paper we address this question. We present various techniques to classify tweets as related or unrelated to a given company, for example, whether a message containing the keyword "apple" is about the company Apple Inc. Our simple techniques make use of company profiles, which we created semi-automatically from external Web sources. Our advanced techniques take ambiguity estimations into account and also automatically extend the company profiles from the Twitter stream itself. We demonstrate the effectiveness of our methods through an extensive set of experiments.
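To make the idea of profile-based classification concrete, here is a minimal sketch in Python. It is not the method evaluated in the paper: the function names, the two term sets, and the simple overlap rule are all illustrative assumptions. It only shows how a company profile, here reduced to a set of terms suggesting the company and a set suggesting other senses of the name, could be used to decide whether a tweet mentioning "apple" refers to Apple Inc.

```python
import re

# Hypothetical profile for the keyword "apple" (illustrative terms only,
# not taken from the paper's semi-automatically built profiles).
APPLE_POSITIVE = {"iphone", "ipad", "mac", "ios", "itunes"}
APPLE_NEGATIVE = {"pie", "juice", "fruit", "tree", "cider"}


def tokenize(text):
    """Lowercase a message and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_related(tweet, positive_terms, negative_terms):
    """Return True if the tweet shares more terms with the positive
    profile than with the negative one (ties count as unrelated)."""
    tokens = tokenize(tweet)
    positive_hits = len(tokens & positive_terms)
    negative_hits = len(tokens & negative_terms)
    return positive_hits > negative_hits


if __name__ == "__main__":
    for message in (
        "Just bought an apple iPhone, love it",
        "Baking an apple pie with fresh fruit",
    ):
        print(message, "->", is_related(message, APPLE_POSITIVE, APPLE_NEGATIVE))
```

A rule this simple cannot decide tweets that match neither term set, which is one reason a profile might be extended with further terms or combined with an estimate of how ambiguous the company name is.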
