NLP for Term Variant Extraction: Synergy Between Morphology, Lexicon, and Syntax

We present a natural language processing (NLP) approach to automatic indexing over a controlled vocabulary that accounts for term variation. The approach combines a part-of-speech tagger, a generator of morphologically related forms, and a shallow transformational parser. The system is applied to French; it is trained on newspaper articles and tested on scientific literature.
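
To make the three-stage combination concrete, here is a minimal toy sketch in Python. It is not the system described above: the lexicon, the morphological family table, the function names (`tag`, `family`, `find_variants`), and the `max_gap` parameter are all hypothetical, and a simple gap-tolerant match over content words stands in for the shallow transformational parser. It only illustrates how tagging, morphological relatedness, and a loose syntactic window might cooperate to link a controlled term to its variants in text.

```python
import re

# Hypothetical toy lexicon: surface form -> (part of speech, lemma).
# A real tagger would be trained on a corpus; this table is an assumption.
LEXICON = {
    "mesure": ("N", "mesure"),
    "mesures": ("N", "mesure"),
    "mesurer": ("V", "mesurer"),
    "débit": ("N", "débit"),
    "débits": ("N", "débit"),
    "pression": ("N", "pression"),
    "de": ("P", "de"),
    "le": ("D", "le"),
    "et": ("C", "et"),
}

# Hypothetical morphological families: lemma -> shared family key.
FAMILY = {"mesure": "mesur", "mesurer": "mesur"}

CONTENT_POS = {"N", "V", "A"}


def tag(text):
    """Tokenize and look up each token; unknown words get the tag 'X'."""
    tokens = re.findall(r"\w+", text.lower())
    return [(tok, *LEXICON.get(tok, ("X", tok))) for tok in tokens]


def family(lemma):
    """Map a lemma to its morphological family key (itself if unknown)."""
    return FAMILY.get(lemma, lemma)


def find_variants(term, text, max_gap=4):
    """Return text spans whose content words match the term's content
    words, in order, up to morphological family, allowing at most
    max_gap intervening tokens between consecutive matches."""
    term_keys = [family(lemma) for _, pos, lemma in tag(term)
                 if pos in CONTENT_POS]
    if not term_keys:
        return []
    tagged = tag(text)
    hits = []
    for start in range(len(tagged)):
        cursor, matched = start, []
        for key in term_keys:
            # The first content word must sit exactly at `start`;
            # later ones may follow after a bounded gap.
            limit = cursor + (1 if not matched else max_gap + 1)
            found = None
            for j in range(cursor, min(limit, len(tagged))):
                _, pos, lemma = tagged[j]
                if pos in CONTENT_POS and family(lemma) == key:
                    found = j
                    break
            if found is None:
                break
            matched.append(found)
            cursor = found + 1
        else:
            span = tagged[matched[0]:matched[-1] + 1]
            hits.append(" ".join(tok for tok, _, _ in span))
    return hits
```

With the controlled term `"mesure de débit"`, this sketch would link an inflectional variant (`"mesures de débits"`), a coordination variant (`"mesure de pression et de débit"`), and a noun-to-verb variant (`"mesurer le débit"`), while rejecting the reversed order `"débit de mesure"`. A fixed window is much cruder than rule-based transformations, so it should be read only as an illustration of the general idea.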
