Paper information - A lexicon for Vietnamese language processing

A lexicon for Vietnamese language processing

Vietnamese researchers have only very recently become involved in Natural Language Processing (NLP). Because there is no published work in formal linguistics and no recognized standard for Vietnamese word definition and word categories, fundamental tasks in automatic Vietnamese language processing, such as part-of-speech tagging and parsing, are very difficult for computer scientists. The fact that each research team has to build all necessary linguistic resources from scratch is a real obstacle to the development of Vietnamese language processing. The aim of our projects is thus to build a common linguistic database that is freely and easily exploitable for the automatic processing of Vietnamese. In this paper, we present our work on creating a Vietnamese lexicon for NLP applications. We emphasize the standardization aspect of the lexicon representation. In particular, we propose an extensible set of Vietnamese syntactic descriptors that can be used for tagset definition and morphosyntactic analysis. These descriptors are designed to serve as a proposed reference set for Vietnamese in the context of ISO subcommittee TC 37/SC 4 (Language Resource Management).
