A Multi-Word Term Extraction Program for Arabic Language

Terminology extraction commonly involves two steps: identifying term-like units in texts, mostly multi-word phrases, and ranking the extracted units according to their domain representativity. In this paper, we design a multi-word term extraction program for the Arabic language. The linguistic filtering performs a morphosyntactic analysis and takes several types of variation into account. Domain representativity is measured with statistical scores. We evaluate several association measures and show that the results we obtained are consistent with those obtained for Romance languages.
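To make the ranking step concrete, the following is a minimal sketch of how two-word candidates might be scored with an association measure. The abstract does not name the measures that were evaluated; pointwise mutual information and the log-likelihood ratio are used here only as two widely known examples. All function names, the candidate format, and the toy counts are assumptions made for illustration, and the linguistic filtering step is not modelled.

```python
import math

def contingency(pair_count, w1_count, w2_count, total):
    """Build the 2x2 contingency table (a, b, c, d) for a candidate pair.

    a: w1 followed by w2, b: w1 without w2,
    c: w2 without w1,     d: neither word.
    """
    a = pair_count
    b = w1_count - pair_count
    c = w2_count - pair_count
    d = total - w1_count - w2_count + pair_count
    return a, b, c, d

def pmi(a, b, c, d):
    """Pointwise mutual information (base 2) of the pair; requires a > 0."""
    n = a + b + c + d
    return math.log2(a * n / ((a + b) * (a + c)))

def _xlogx(x):
    # By convention 0 * log(0) is taken as 0.
    return x * math.log(x) if x > 0 else 0.0

def log_likelihood(a, b, c, d):
    """Log-likelihood ratio (G^2) of the 2x2 table."""
    n = a + b + c + d
    return 2.0 * (
        _xlogx(a) + _xlogx(b) + _xlogx(c) + _xlogx(d)
        - _xlogx(a + b) - _xlogx(c + d)   # row totals
        - _xlogx(a + c) - _xlogx(b + d)   # column totals
        + _xlogx(n)
    )

def rank_candidates(candidates, total, measure=log_likelihood):
    """Sort candidate pairs by decreasing association score.

    candidates: dict mapping (w1, w2) -> (pair_count, w1_count, w2_count)
    total: number of bigram positions in the corpus
    """
    scored = [
        (pair, measure(*contingency(*counts, total)))
        for pair, counts in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy counts, invented for illustration only.
toy = {
    ("w1", "w2"): (20, 20, 20),   # the two words always occur together
    ("w3", "w4"): (5, 100, 50),   # co-occur about as often as chance predicts
}
ranking = rank_candidates(toy, total=1000)
```

In this sketch the pair whose words always occur together is ranked first, while the pair that co-occurs at roughly the chance rate scores close to zero. Passing `measure=pmi` ranks the same candidates by pointwise mutual information instead.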
