Concept relation extraction using Naïve Bayes classifier for ontology-based question answering systems

Domain ontologies serve as a reliable source of knowledge in information retrieval systems such as question answering systems. An ontology can be constructed automatically by extracting concept relations from large-scale unstructured text. In this paper, we propose a methodology that extracts concept relations from unstructured text using a Naïve Bayes classifier based on syntactic and semantic probabilities. We propose an algorithm that iteratively extracts a list of attributes and associations for a given seed concept, from which a rough schema is conceptualized. A set of hand-coded dependency parsing pattern rules and a binary decision tree-based rule engine were developed for this purpose. The ontology construction process is initiated through a question answering process: for each new query submitted, the required concept is dynamically constructed and the ontology is updated. The proposed relation extraction method was evaluated on benchmark data sets. The constructed ontology was evaluated against a gold standard and compared with similar well-performing methods. The experimental results show that the proposed approach can effectively construct a generic domain ontology with higher accuracy than the compared methods. Furthermore, the ontology construction method was integrated into the question answering framework, which was evaluated using the entailment method.
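To make the classification step concrete, the following is a minimal sketch of a Naïve Bayes classifier that labels a candidate concept relation as an attribute or an association from one syntactic and one semantic feature. It is an illustration under stated assumptions, not the implementation used in the paper: the feature names (`dep_pattern`, `semantic_class`), their values, the toy training pairs, and the add-one smoothing are all hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesRelationClassifier:
    """Categorical Naive Bayes with additive (Laplace) smoothing.

    Hypothetical sketch: features are name -> value dictionaries,
    labels are relation types such as "attribute" or "association".
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.label_counts = Counter()
        # (label, feature name) -> counts of each observed feature value
        self.feature_counts = defaultdict(Counter)
        # feature name -> set of values seen anywhere in training
        self.feature_values = defaultdict(set)

    def fit(self, examples):
        for features, label in examples:
            self.label_counts[label] += 1
            for name, value in features.items():
                self.feature_counts[(label, name)][value] += 1
                self.feature_values[name].add(value)
        return self

    def log_scores(self, features):
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            for name, value in features.items():
                counts = self.feature_counts[(label, name)]
                seen = sum(counts.values())
                # One extra slot reserves probability mass for unseen values.
                vocab = len(self.feature_values[name]) + 1
                score += math.log(
                    (counts[value] + self.alpha) / (seen + self.alpha * vocab)
                )
            scores[label] = score
        return scores

    def predict(self, features):
        scores = self.log_scores(features)
        return max(scores, key=scores.get)


# Toy training data (invented for illustration only).
TRAIN = [
    ({"dep_pattern": "nsubj-have-dobj", "semantic_class": "part"}, "attribute"),
    ({"dep_pattern": "nsubj-have-dobj", "semantic_class": "property"}, "attribute"),
    ({"dep_pattern": "nsubj-verb-dobj", "semantic_class": "agent"}, "association"),
    ({"dep_pattern": "nsubj-verb-prep", "semantic_class": "agent"}, "association"),
]

clf = NaiveBayesRelationClassifier().fit(TRAIN)
attribute_like = clf.predict(
    {"dep_pattern": "nsubj-have-dobj", "semantic_class": "part"}
)
association_like = clf.predict(
    {"dep_pattern": "nsubj-verb-prep", "semantic_class": "agent"}
)
print(attribute_like, association_like)
```

In this sketch the syntactic and semantic evidence are combined by summing log-probabilities under the usual conditional independence assumption; a real system would derive the feature values from a dependency parser and a lexical resource rather than from hand-written strings.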
