Improving Robustness and Flexibility of Concept Taxonomy Learning from Text

The spread and abundance of electronic documents require automatic techniques for extracting useful information from the text they contain. Conceptual taxonomies can be of great help, but building them manually is a complex and costly task. Building on previous work, we propose a technique to automatically extract conceptual graphs from text and reason with them. Automated taxonomy learning must be robust to missing or partial knowledge and flexible with respect to noise, and this work proposes ways to deal with both problems. Poor data and sparse concepts are handled by finding generalizations among disjoint pieces of knowledge. Noise is handled by introducing soft, rather than hard, relationships among concepts and by applying a probabilistic inferential setting. In particular, we propose to reason on the extracted graph using different kinds of relationships among concepts, where each arc (relationship) is associated with a weight representing its likelihood over all possible worlds, and to address sparse knowledge by using generalizations among distant concepts as bridges between disjoint portions of knowledge.
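To make the idea of weighted arcs and "likelihood over all possible worlds" concrete, here is a minimal sketch. It assumes that each arc is an independent probabilistic fact, as in ProbLog-style distribution semantics. A possible world is one choice of which arcs hold, and the probability of a query is the total probability of the worlds in which the query is true. The concept names, weights, and function names (`reachable`, `query_probability`) are hypothetical illustrations and are not taken from the paper. Exhaustive enumeration is exponential in the number of arcs and is shown only for clarity. Practical engines such as ProbLog avoid it by using knowledge compilation.

```python
from itertools import product

# Hypothetical soft relationships: (subject, object) -> probability that the arc holds.
# Each arc is treated as an independent probabilistic fact (an assumption of this sketch).
ARCS = {
    ("dog", "animal"): 0.9,
    ("animal", "living_being"): 0.8,
    ("dog", "pet"): 0.7,
    ("pet", "living_being"): 0.6,
}


def reachable(edges, src, dst):
    """Return True if dst can be reached from src following the given directed arcs."""
    frontier, seen = [src], {src}
    while frontier:
        node = frontier.pop()
        if node == dst:
            return True
        for a, b in edges:
            if a == node and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False


def query_probability(arcs, src, dst):
    """Sum the probabilities of all possible worlds in which dst is reachable from src."""
    items = list(arcs.items())
    total = 0.0
    # Each world keeps (True) or drops (False) every arc independently.
    for choice in product([True, False], repeat=len(items)):
        world_prob = 1.0
        present = []
        for (arc, p), keep in zip(items, choice):
            world_prob *= p if keep else 1.0 - p
            if keep:
                present.append(arc)
        if reachable(present, src, dst):
            total += world_prob
    return total


print(round(query_probability(ARCS, "dog", "living_being"), 4))
```

The two independent paths from `dog` to `living_being` have probabilities 0.9 × 0.8 = 0.72 and 0.7 × 0.6 = 0.42. The query therefore holds with probability 1 − (1 − 0.72)(1 − 0.42) = 0.8376. A single noisy or weak arc lowers the query probability without invalidating the conclusion, which is the benefit that soft relationships have over hard ones.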
