Language Independent Morphological Analysis

This paper proposes a framework for language independent morphological analysis and concentrates mainly on tokenization, the first step of morphological analysis. Although tokenization is usually not regarded as a difficult task in most segmented languages such as English, a number of problems arise in achieving precise treatment of lexical entries. We first introduce the concept of morpho-fragments, which are intermediate units between characters and lexical entries. We then describe our approach to resolving the problems that arise in tokenization, with the aim of building a language independent morphological analyzer.

Yuji Matsumoto | Tatsuo Yamashita
