Mining personal experiences and opinions from Web documents

This paper proposes a new UGC-oriented language technology application, which we call experience mining. Experience mining aims to automatically collect instances of personal experiences and opinions from vast amounts of user-generated content (UGC), such as weblog and forum posts, and to store them in an experience database with semantically rich indices. After discussing the technical issues raised by this new task, we focus on the central problem of factuality analysis, formulate a task definition, and propose a machine learning-based solution. Our empirical evaluation indicates that our definition of factuality analysis is sufficiently well-defined to achieve high inter-annotator agreement, and that our Factorial CRF-based model considerably outperforms the baseline. We also present an application system that currently stores over 50M experience instances, extracted from 150M Japanese blog posts and indexed semantically, and that powers an experience search engine open to unrestricted users. Finally, we report on our empirical evaluation of the system's accuracy.
