Extraction and Approximation of Numerical Attributes from the Web

We present a novel framework for the automated extraction and approximation of numerical object attributes, such as height and weight, from the Web. Given an object-attribute pair, we discover and analyze attribute information for a set of comparable objects in order to infer the desired value. This allows us to approximate the desired numerical value even when no exact value can be found in the text. Our framework makes use of relation-defining patterns and WordNet similarity information. First, we obtain from the Web and WordNet a list of terms similar to the given object. Then, for each term in this list, we retrieve attribute values, together with information that allows us to compare the objects in the list and to infer the range of the attribute value. Finally, we combine the retrieved data for all terms in the list to select or approximate the requested value. We evaluate our method using automated question answering, WordNet enrichment, and comparison with answers given in Wikipedia and by leading search engines. In all three evaluations, our framework provides a significant improvement.
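
The retrieve-then-combine steps can be illustrated with a toy sketch. It is not the framework's actual implementation. It assumes the list of similar terms (here dog, wolf, cat) has already been obtained, and it uses two hypothetical regular expressions, VALUE_PAT and CMP_PAT, as stand-ins for relation-defining patterns for weight in kilograms. The helper names extract and approximate_value and the combination rules are illustrative assumptions:
- the median of values stated for the object itself;
- otherwise the midpoint of the range implied by "heavier/lighter than" facts;
- otherwise the median of comparable values that respect any known bound.

```python
from __future__ import annotations

import re
from statistics import median
from typing import Optional

# Hypothetical relation-defining patterns for one attribute (weight, in kg).
VALUE_PAT = re.compile(
    r"\ban? (?P<obj>\w+) weighs (?:about )?(?P<num>\d+(?:\.\d+)?) kg\b"
)
CMP_PAT = re.compile(
    r"\ban? (?P<a>\w+) is (?P<rel>heavier|lighter) than an? (?P<b>\w+)"
)


def extract(snippets: list[str], target: str):
    """Collect attribute values and comparison facts from text snippets."""
    direct: list[float] = []          # values stated for the target itself
    similar: dict[str, float] = {}    # comparable object -> value (last seen wins)
    lower_refs: list[str] = []        # objects the target is heavier than
    upper_refs: list[str] = []        # objects the target is lighter than
    for text in snippets:
        text = text.lower()
        for m in VALUE_PAT.finditer(text):
            value = float(m["num"])
            if m["obj"] == target:
                direct.append(value)
            else:
                similar[m["obj"]] = value
        for m in CMP_PAT.finditer(text):
            a, b, heavier = m["a"], m["b"], m["rel"] == "heavier"
            if a == target:
                (lower_refs if heavier else upper_refs).append(b)
            elif b == target:
                (upper_refs if heavier else lower_refs).append(a)
    return direct, similar, lower_refs, upper_refs


def approximate_value(
    direct: list[float],
    similar: dict[str, float],
    lower_refs: list[str],
    upper_refs: list[str],
) -> Optional[float]:
    """Select an exact value if one exists, else approximate from comparables."""
    if direct:
        # Exact values were found: take their median to damp outliers.
        return median(direct)
    lows = [similar[o] for o in lower_refs if o in similar]
    highs = [similar[o] for o in upper_refs if o in similar]
    lo = max(lows) if lows else None
    hi = min(highs) if highs else None
    if lo is not None and hi is not None and lo <= hi:
        # Both bounds known and consistent: use the midpoint of the range.
        return (lo + hi) / 2
    # Otherwise use comparable values that respect whatever bound is known.
    candidates = [
        v for v in similar.values()
        if (lo is None or v >= lo) and (hi is None or v <= hi)
    ]
    if candidates:
        return median(candidates)
    # Fall back to whichever bound exists (the lower one if the two conflict),
    # or None when nothing usable was found.
    return lo if lo is not None else hi


snippets = [
    "A dog weighs 30 kg.",
    "A wolf weighs about 40 kg.",
    "A cat weighs 4 kg.",
    "A fox is heavier than a cat.",
    "A wolf is heavier than a fox.",
]
print(approximate_value(*extract(snippets, "fox")))  # 22.0
```

Here no weight is stated for "fox", so the estimate comes from the range between the cat (4 kg) and the wolf (40 kg). The arithmetic midpoint is only one possible choice; a geometric mean or a learned model could be substituted for quantities that span several orders of magnitude.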
