Building domain-specific lexical hierarchies from corpora

In this article, we present a new algorithm for building domain-specific lexical hierarchies from texts. The basic elements of such a hierarchy are normalized terms, both single-word and multi-word, extracted from a large corpus by a term extractor. The algorithm relies on collocations to represent the meaning of these terms, to find hierarchical relations between them and, finally, to organize them into a hierarchy. It also takes the polysemy of terms into account while building the hierarchy. We present the results of applying it to part of the corpus designed for the ARC A3 of the Francil network and discuss its possible applications.
