Question Answering Towards Automatic Augmentations of Ontology Instances

Ontology instances are typically stored as triples, each of which associates two named entities through a pre-defined relation. Such triples are sometimes incomplete: one entity is known but the other is missing. Automatically discovering the missing values is closely related to relation extraction, in which systems extract binary relations between two identified entities. Relation extraction systems depend on accurately labelled named entities, since mislabelled entities reduce the number of relations correctly identified. Although recent results demonstrate over 80% accuracy for named entity recognition, performance decreases rapidly when the input texts follow less consistent patterns. This paper presents OntotripleQA, which applies question-answering techniques to relation extraction in order to reduce the reliance on named entities and to take other assessments into account when evaluating potential relations. This not only increases the number of relations extracted but also improves extraction accuracy by considering features that cannot be obtained by comparing named entities alone. A small dataset was collected to test the proposed approach, and the experiment demonstrates that it is effective on sentences from Web documents, with an average accuracy of 68%.
