Paper information - Trouble information extraction based on a bootstrap approach from Twitter


In this paper, we propose a method for extracting trouble information from Twitter. One useful approach is based on machine learning techniques such as SVMs. However, trouble information accounts for only a fraction of a percent of all tweets, and such an imbalanced class distribution generally makes it difficult to train a classifier. Another approach is to extract trouble information with handwritten rules, but constructing high-coverage rules by hand is costly. We first verify these problems in a preliminary experiment. To solve them, we then apply a bootstrapping method to the trouble information extraction task, introducing three characteristics and a scoring method into the bootstrapping process. As a result, the iterative bootstrapping process dramatically increased the number of tweets and patterns obtained for trouble information.
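The abstract does not spell out the three characteristics or the scoring method, so the following is only a minimal sketch of a generic pattern/instance bootstrapping loop, not the authors' implementation. It assumes that patterns are word bigrams and that a candidate pattern is scored by the fraction of its occurrences that fall in already-matched tweets; the names `ngrams`, `bootstrap`, `top_k`, `min_score`, `min_support`, and the sample tweets are all hypothetical.

```python
from collections import Counter


def ngrams(tweet, n=2):
    """Return the set of word n-grams in a tweet (assumed pattern form)."""
    words = tweet.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def bootstrap(tweets, seed_patterns, iterations=5, top_k=2,
              min_score=0.5, min_support=2):
    """Alternate between matching tweets and harvesting new patterns.

    The score below is a hypothetical precision-like ratio, not the
    scoring method of the paper.
    """
    patterns = set(seed_patterns)
    # How many tweets in the whole collection contain each n-gram.
    in_all = Counter(g for t in tweets for g in ngrams(t))

    for _ in range(iterations):
        # Step 1: tweets that contain at least one known pattern.
        matched = [t for t in tweets if patterns & ngrams(t)]
        # Step 2: how many matched tweets contain each n-gram.
        in_matched = Counter(g for t in matched for g in ngrams(t))
        # Step 3: score unseen candidates with enough support.
        scored = {
            g: in_matched[g] / in_all[g]
            for g in in_matched
            if g not in patterns and in_matched[g] >= min_support
        }
        # Step 4: keep the best candidates above the threshold.
        best = sorted(scored, key=lambda g: (-scored[g], g))[:top_k]
        new_patterns = [g for g in best if scored[g] >= min_score]
        if not new_patterns:
            break  # nothing new was learned, so stop iterating
        patterns.update(new_patterns)

    matched = {t for t in tweets if patterns & ngrams(t)}
    return patterns, matched


# Hypothetical toy data, for illustration only.
SAMPLE_TWEETS = [
    "wifi does not work and keeps crashing",
    "the app does not work it keeps crashing",
    "my browser keeps crashing again",
    "lovely weather in the park today",
    "the park looks great today",
]

if __name__ == "__main__":
    found_patterns, found_tweets = bootstrap(SAMPLE_TWEETS, {"not work"})
    print(sorted(found_patterns))
    print(sorted(found_tweets))
```

Starting from the single seed pattern, the loop picks up a second pattern that co-occurs with it and through that pattern reaches a tweet the seed alone would miss. The support and score thresholds are a crude guard against the semantic drift that such loops are prone to; real systems need a more careful scoring scheme.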

Kazutaka Shimada | Kohei Kurihara
