Trouble information extraction based on a bootstrap approach from Twitter

In this paper, we propose a method for extracting trouble information from Twitter. One common approach relies on machine learning techniques such as SVMs. However, trouble information accounts for only a fraction of a percent of all tweets, and such a heavily imbalanced class distribution makes it difficult to train an effective classifier. Another approach is to extract trouble information with handwritten rules, but constructing high-coverage rules by hand is costly. We first verify both problems in a preliminary experiment. Then, to address them, we apply a bootstrapping method to the trouble information extraction task, introducing three characteristics and a scoring method into the bootstrapping process. As a result, the iterative bootstrapping process dramatically increased the number of tweets and patterns acquired for trouble information.
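The general shape of pattern-based bootstrapping can be illustrated with a minimal Python sketch. It is not the method of this paper: the abstract does not specify the three characteristics or the scoring method, so the sketch assumes word bigrams as patterns and uses a hypothetical score (precision among matched tweets multiplied by match count). The names `bootstrap`, `seed_patterns`, and `top_k` are illustrative only.

```python
from collections import Counter


def bigrams_of(tweet):
    """Return the set of word bigrams in a tweet (illustrative pattern type)."""
    words = tweet.split()
    return {" ".join(words[i:i + 2]) for i in range(len(words) - 1)}


def bootstrap(tweets, seed_patterns, iterations=1, top_k=2):
    """Alternate between matching tweets with known patterns and
    promoting the best-scoring new patterns found in those tweets."""
    patterns = set(seed_patterns)
    for _ in range(iterations):
        # Step 1: collect tweets that contain any current pattern.
        matched = {t for t in tweets if any(p in t for p in patterns)}
        # Step 2: count each bigram over all tweets and over matched tweets.
        overall, inside = Counter(), Counter()
        for t in tweets:
            grams = bigrams_of(t)
            overall.update(grams)
            if t in matched:
                inside.update(grams)
        # Step 3: hypothetical score = precision * frequency in matched tweets.
        scores = {
            g: (inside[g] / overall[g]) * inside[g]
            for g in inside
            if g not in patterns
        }
        # Ties are broken alphabetically so the result is deterministic.
        ranked = sorted(scores, key=lambda g: (-scores[g], g))
        new_patterns = ranked[:top_k]
        if not new_patterns:
            break
        patterns.update(new_patterns)
    # Final pass: collect tweets matched by the enlarged pattern set.
    matched = {t for t in tweets if any(p in t for p in patterns)}
    return patterns, matched


tweets = [
    "my phone broke down so annoying",
    "the printer broke down so annoying",
    "train delayed again so annoying",
    "lovely weather today",
    "nice lunch today",
]
patterns, matched = bootstrap(tweets, {"broke down"})
```

Starting from a single seed pattern, each iteration can enlarge both the pattern set and the set of matched tweets, which is the kind of growth the abstract reports. Without a well-chosen scoring method, such loops tend to admit generic patterns and drift away from the target concept.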
