A symbolic and surgical acquisition of terms through variation

Terminological acquisition is an important issue in learning for Natural Language Processing (NLP) due to the constant terminological renewal through technological changes. Terms play a key role in several NLP-activities such as machine translation, automatic indexing or text understanding. In opposition to classical once-and-for-all approaches, this paper proposes an incremental process for terminological enrichment which operates on existing reference lists and large corpora. Candidate terms are acquired by extracting variants of reference terms through FASTR (FAst Syntactic Term Recognizer), a unification-based partial parser. As acquisition is performed within specific morphosyntactic contexts (coordinations, insertions or permutations of complex nominals), rich conceptual links are learned together with candidate terms. A clustering of terms related through coordinations yields classes of conceptually close terms while graphs resulting from insertions denote generic/specific relations. A graceful degradation of the volume of acquisition on partial initial lists confirms the robustness of the method to incomplete data.