In Japanese, case structure analysis is very important to handle several troublesome characteristics of Japanese such as scrambling, omission of case components, and disappearance of case markers. However, for lack of a wide-coverage case frame dictionary, it has been difficult to perform case structure analysis accurately. Although several methods to construct a case frame dictionary from analyzed corpora have been proposed, they cannot avoid the data sparseness problem. This paper proposes an unsupervised method of constructing a case frame dictionary from an enormous raw corpus by using a robust and accurate parser. It also provides a case structure analysis method based on the constructed dictionary.

1 Introduction

Syntactic analysis, or parsing, has been a main objective in Natural Language Processing. In the case of Japanese, however, syntactic analysis cannot clarify relations between words in sentences because of several troublesome characteristics of Japanese such as scrambling, omission of case components, and disappearance of case markers. Therefore, in Japanese sentence analysis, case structure analysis is an important issue, and a case frame dictionary is necessary for the analysis.

Some research institutes have constructed Japanese case frame dictionaries manually (Ikehara et al., 1997; Information-Technology Promotion Agency, Japan, 1987). However, it is quite expensive, or almost impossible, to construct a wide-coverage case frame dictionary by hand.

Others have tried to construct a case frame dictionary automatically from analyzed corpora (Utsuro et al., 1998). However, existing syntactically analyzed corpora are too small to learn a dictionary, since case frame information consists of relations between nouns and verbs, which multiplies to millions of combinations. Based on such a consideration, we took the following unsupervised learning strategy to the Japanese case structure analysis:

1. At first, a robust and accurate parser is developed, which does not utilize a case frame dictionary,
2. a very large corpus is parsed by the parser,
3. reliable noun-verb relations are extracted from the parse results, and a case frame dictionary is constructed from them, and
4. the dictionary is utilized for case structure analysis.

2 Characteristics of Japanese language and necessity of case structure analysis

In Japanese, postpositions function as case markers (CMs) and a verb is final in a sentence. The basic structure of a Japanese sentence is as follows:

(1) kare ga coat wo kiru.
    he nominative-CM coat accusative-CM wear
    (He wears a coat)

A clause modifier is left to the modified noun as follows:

(2) kare ga kiteiru coat
    he nom-CM wear coat
    (the coat he wears)

The modified noun followed by a postposition then becomes a case component of a matrix verb. The typical structure of a Japanese complex sentence is as follows:
[1] Makoto Nagao, et al. A Syntactic Analysis Method of Long Japanese Sentences Based on the Detection of Conjunctive Structures. 1994. CL.
[2] Yuji Matsumoto, et al. General-to-Specific Model Selection for Subcategorization Preference. 1998. COLING-ACL.
[3] Ted Briscoe, et al. Automatic Extraction of Subcategorization from Corpora. 1997. ANLP.
[4] Makoto Nagao, et al. Building a Japanese parsed corpus while improving the parsing system. 1997.
[5] Christopher D. Manning. Automatic Acquisition of a Large Subcategorization Dictionary from Corpora. 1993. ACL.