A Japanese Sentence Analyzer

This paper presents the design of a broad-coverage Japanese sentence analyzer that can serve as a component of various Japanese processing systems. The sentence analyzer comprises two components: the lexical analyzer and the syntactic analyzer. Lexical analysis, i.e., segmenting a sentence into words, is a formidable problem for a language like Japanese because written Japanese has no explicit delimiters (blanks) between words. In practical applications, the task is made more difficult by words that are not listed in the dictionary. We have developed a five-layered knowledge source and used it successfully in the lexical analyzer, resulting in very accurate segmentation even when the input contains unknown words. The syntactic analyzer has two modules: one consists of an augmented context-free grammar and the PLNLP parser; the other is the dependency structure constructor, which converts phrase structures to dependency structures. The dependency structures represent various key linguistic relations more directly. They also carry semantically important information such as tense, aspect, and modality, as well as preference scores that reflect the relative ranking of parse acceptability.
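To make the segmentation problem concrete, the following is a minimal sketch of dictionary-based word segmentation with a fallback for unknown words. It is not the five-layered knowledge source described above, whose details are not given here; it is a generic minimum-cost dynamic-programming segmenter. The toy lexicon, the cost constants, and the function name `segment` are all hypothetical choices made for illustration.

```python
from typing import List

# Hypothetical toy lexicon; a real analyzer would use a large dictionary.
LEXICON = {"私", "は", "東京", "東", "に", "行く", "京都"}

# Assumed cost model (illustrative only):
WORD_COST = 1           # cost of one dictionary word
UNKNOWN_BASE = 2        # fixed cost of opening an unknown-word span
UNKNOWN_PER_CHAR = 2    # additional cost per character of an unknown span
MAX_WORD_LEN = 4        # longest candidate span considered


def span_cost(piece: str) -> int:
    """Return the cost of treating `piece` as a single word."""
    if piece in LEXICON:
        return WORD_COST
    return UNKNOWN_BASE + UNKNOWN_PER_CHAR * len(piece)


def segment(sentence: str) -> List[str]:
    """Split an undelimited sentence into words with minimum total cost."""
    n = len(sentence)
    best = [float("inf")] * (n + 1)  # best[i]: cheapest cost of sentence[:i]
    back = [0] * (n + 1)             # back[i]: start index of the last word
    best[0] = 0

    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            cost = best[start] + span_cost(sentence[start:end])
            if cost < best[end]:
                best[end] = cost
                back[end] = start

    # Walk the back-pointers from the end of the sentence to recover words.
    words: List[str] = []
    pos = n
    while pos > 0:
        words.append(sentence[back[pos]:pos])
        pos = back[pos]
    words.reverse()
    return words


if __name__ == "__main__":
    print(segment("私は東京に行く"))  # all words are in the toy lexicon
    print(segment("私はパリに行く"))  # "パリ" is not in the toy lexicon
```

Under this cost model, an out-of-dictionary string such as "パリ" is kept together as one unknown word rather than split into single characters, because each unknown span pays the fixed `UNKNOWN_BASE` cost. The known words around it are still matched against the lexicon.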
