Text analysis for ontology and terminology engineering

Since a breakthrough in the early 1990s, text analysis has been acknowledged as one of the promising ways to rapidly build better-grounded semantic resources such as terminologies and ontologies. The field has recently evolved significantly, relying heavily on machine learning algorithms and information extraction techniques alongside linguistic and statistical natural language processing. This position paper promotes three main ideas: (i) that highly domain-specific or task-specific, even idiosyncratic, ontologies are very useful, especially when they are linked to broader consensual schemes, and that they can be built with reasonable effort; (ii) that corpus-based ontologies can capture the perspective of a domain; and (iii) that supervised ontology learning from text makes it feasible to develop specialized ontologies adapted to specific uses. We propose establishing an inventory of tools for building ontologies from text, give a first classification of such tools, and present an initial review of some recent methods and tools.
