OpenEval: Web Information Query Evaluation

In this paper, we investigate information validation tasks that are initiated as queries from either automated agents or humans. We introduce OpenEval, a new online information validation technique that uses information on the web to automatically evaluate the truth of queries stated as multi-argument predicate instances (e.g., DrugHasSideEffect(Aspirin, GI Bleeding)). OpenEval takes a small number of instances of a predicate as seed positive examples and automatically learns how to evaluate the truth of a new predicate instance by querying the web and processing the retrieved unstructured web pages. We show that OpenEval is able to respond to queries within a limited amount of time while also achieving a high F1 score. In addition, we show that the accuracy of OpenEval's responses increases as more time is given for evaluation. We test our model extensively and report empirical results that illustrate the effectiveness of our approach compared to related techniques.
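
As a rough illustration of the query format and of evaluation under a budget, the sketch below represents a predicate instance and scores it against retrieved text snippets. It is a toy under stated assumptions, not the OpenEval implementation: `PredicateInstance`, `evaluate`, `fetch`, `score`, the snippet-count budget (a deterministic stand-in for a time limit), the 0.5 threshold, and the in-memory `corpus` (a stand-in for web search results) are all hypothetical names and choices introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class PredicateInstance:
    """A query such as DrugHasSideEffect(Aspirin, GI Bleeding)."""
    predicate: str
    args: Tuple[str, ...]

    def keywords(self) -> str:
        # Hypothetical choice: build the search query from the arguments only.
        return " ".join(self.args)


def evaluate(
    instance: PredicateInstance,
    fetch_snippets: Callable[[str], Iterable[str]],
    score_snippet: Callable[[str, str], float],
    max_snippets: int,
) -> Tuple[bool, float]:
    """Average per-snippet scores over at most max_snippets snippets.

    A larger budget lets the evaluator examine more evidence before answering.
    """
    total = 0.0
    seen = 0
    for snippet in fetch_snippets(instance.keywords()):
        if seen >= max_snippets:
            break
        total += score_snippet(instance.predicate, snippet)
        seen += 1
    confidence = total / seen if seen else 0.0
    # Assumed decision rule: accept when the mean score reaches 0.5.
    return confidence >= 0.5, confidence


# Stand-in for retrieved web text (assumption: no real search engine is called).
corpus = [
    "Aspirin may cause GI bleeding as a side effect.",
    "Aspirin is sold over the counter.",
    "GI bleeding is a known side effect of aspirin.",
]


def fetch(query: str) -> Iterable[str]:
    # A real system would issue the query to a search engine here.
    return iter(corpus)


def score(predicate: str, snippet: str) -> float:
    # Stand-in for a classifier learned from seed examples:
    # a single hand-picked cue phrase.
    return 1.0 if "side effect" in snippet.lower() else 0.0


instance = PredicateInstance("DrugHasSideEffect", ("Aspirin", "GI Bleeding"))
verdict, confidence = evaluate(instance, fetch, score, max_snippets=3)
```

In this toy, the hand-written `score` function takes the place of a classifier trained from the seed positive examples, and raising `max_snippets` mirrors giving the evaluator more time.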
