Extracting Opinions and Facts for Business Intelligence

Finding information about companies from multiple sources on the Web has become increasingly important for business analysts. In particular, since the emergence of Web 2.0, opinions about companies and their services or products need to be found and distilled in order to create an accurate picture of a business entity. Without appropriate text mining tools, company analysts would have to read hundreds of textual reports, newspaper articles, and forum postings and manually dig out factual as well as subjective information. This paper describes a series of experiments to assess the value of a number of lexical, morpho-syntactic, and sentiment-based features, derived from linguistic processing and from an existing lexical database, for the classification of evaluative texts. The experiments were carried out with two different web sources: one contains positive and negative opinions, while the other contains fine-grained classifications on a 5-point qualitative scale. The results obtained are positive and in line with current research in the area. Our aim is to use the results of classification in a practical application that will combine factual and opinionated information in order to build a picture of the reputation of a business entity.
