In this work, we contribute a method that leverages the Web as a large corpus to automatically evaluate the truth of propositions stated as multi-argument instantiated predicates, e.g., CityInCountry(Beijing, China). Our approach, OpenEval, automatically converts a given instantiated predicate into a Web search query and then extracts a corresponding set of features from the returned Web pages. Initially, OpenEval trains a classifier on a list of predicates using a set of seed positive examples for each predicate; each such set also provides negative examples for the other predicates. To evaluate a new query, OpenEval again converts it into a set of features extracted from the Web and uses these features as input to the learned classifier. The classifier's output is then used to calculate the probability that the input instantiated predicate is correct. We show experimentally that OpenEval significantly outperforms previous related techniques, in particular Pointwise Mutual Information (PMI) and the Never-Ending Language Learner (NELL).
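The train-then-evaluate loop described above can be sketched end to end. This is a minimal illustration under several assumptions, and it is not OpenEval's actual implementation. A tiny in-memory corpus (`PAGES`) and a hypothetical `search` function stand in for a real Web search engine. The query is simply the quoted predicate arguments. Features are bag-of-words context tokens. The classifier is a multinomial Naive Bayes, which may differ from the one the paper uses. The correctness probability is assumed to be the fraction of retrieved pages that the classifier assigns to the queried predicate. The predicate names and seed instances are illustrative only.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for a Web search engine: a tiny in-memory corpus.
PAGES = [
    "Beijing is the capital city of China and its largest city after Shanghai",
    "Paris is the capital and most populous city of France",
    "Lyon is a city in France located on the Rhone",
    "Tokyo is the capital city of Japan",
    "Shanghai is a city in China on the coast",
    "Lionel Messi plays football for the club Barcelona as a forward",
    "Messi scored for Barcelona in the league match",
    "Cristiano Ronaldo played for the club Madrid as a forward striker",
    "Ronaldo signed with Madrid the football club",
]


def to_query(args):
    """Assumed query form: the predicate's arguments as quoted phrases."""
    return " ".join(f'"{a}"' for a in args)


def search(query):
    """Return every page containing all quoted phrases (case-insensitive)."""
    phrases = [p.lower() for p in re.findall(r'"([^"]+)"', query)]
    return [page for page in PAGES if all(p in page.lower() for p in phrases)]


def extract_features(page, args):
    """Bag-of-words context features, dropping the argument words themselves."""
    arg_words = {w for a in args for w in re.findall(r"[a-z]+", a.lower())}
    return Counter(t for t in re.findall(r"[a-z]+", page.lower()) if t not in arg_words)


def train(seeds):
    """Multinomial Naive Bayes with one class per predicate, so each
    predicate's seed pages act as negatives for every other predicate."""
    word_counts, doc_counts, vocab = defaultdict(Counter), Counter(), set()
    for pred, instances in seeds.items():
        for args in instances:
            for page in search(to_query(args)):
                feats = extract_features(page, args)
                word_counts[pred].update(feats)
                doc_counts[pred] += 1
                vocab.update(feats)
    return word_counts, doc_counts, vocab


def classify(model, feats):
    """Return the predicate with the highest log-posterior for one page."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for pred in doc_counts:
        total_words = sum(word_counts[pred].values())
        score = math.log(doc_counts[pred] / total_docs)
        for w, n in feats.items():
            if w in vocab:  # ignore words never seen during training
                # Laplace (add-one) smoothing over the training vocabulary.
                score += n * math.log((word_counts[pred][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = pred, score
    return best


def evaluate(model, pred, args):
    """Assumed scoring rule: the fraction of retrieved pages classified as pred."""
    pages = search(to_query(args))
    if not pages:
        return 0.0
    hits = sum(classify(model, extract_features(p, args)) == pred for p in pages)
    return hits / len(pages)


seeds = {
    "CityInCountry": [("Paris", "France"), ("Lyon", "France"), ("Tokyo", "Japan")],
    "AthletePlaysForTeam": [("Messi", "Barcelona"), ("Ronaldo", "Madrid")],
}
model = train(seeds)
print(evaluate(model, "CityInCountry", ("Beijing", "China")))  # prints 1.0
```

Treating all predicates as classes of one multi-class model is a simple way to realize "each seed set provides negatives for the other predicates." An instance that retrieves no pages is scored 0.0 here. A real system would need to choose how to handle that case.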