Corpus-based acquisition of head noun countability features

In recent years, significant advances have been made in the use of corpora as tools for language processing. Lexical acquisition techniques have been somewhat successful in learning verb subcategorization information, yet much of the other information available from corpora has not been harnessed. Noun countability is one property that would be useful to acquire. Such information could help in word sense disambiguation, in choosing appropriate determiners during generation (especially in machine translation), and as a lexicographic resource during dictionary construction. Existing lexical resources that include countability features of nouns have been created largely by hand. Manual tagging of noun countability is costly in time and labor, and such resources are difficult to extend as new terminology emerges. This thesis presents a method for automatically acquiring the countability properties of head nouns. The information is gathered from a part-of-speech-tagged corpus, specifically the British National Corpus (BNC). Basic noun phrase chunking is performed on the corpus to obtain head nouns and their accompanying determiner, if any. High-reliability grammatical cues are then used to automatically tag head noun tokens as either count or non-count; the method relies heavily on the grammatical role that determiners play in the countability of head nouns. This thesis demonstrates that the method is both grammatically sound and successful, showing an improvement over the baseline: the automatic countability tagger assigns the correct countability in up to 87% of noun phrases.
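The chunk-then-tag pipeline described above can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes input already tagged with CLAWS C5 part-of-speech tags (the tagset used in the BNC), uses a naive determiner, modifier, noun chunker that takes the last noun as the head, and the cue tables `COUNT_SINGULAR_DETS`, `COUNT_PLURAL_DETS` and `NONCOUNT_SINGULAR_DETS` are hypothetical examples of high-reliability determiner cues, not the thesis's actual cue list.

```python
from typing import List, Optional, Tuple

# HYPOTHETICAL cue tables: illustrative determiner cues, not the thesis's list.
COUNT_SINGULAR_DETS = {"a", "an", "each", "every", "another"}
COUNT_PLURAL_DETS = {"many", "few", "several", "these", "those", "both"}
NONCOUNT_SINGULAR_DETS = {"much"}

# Subset of CLAWS C5 tags (assumed input tagset, as in the BNC).
DET_TAGS = {"AT0", "DT0", "DPS"}                   # articles, determiners, possessives
MOD_TAGS = {"AJ0", "AJC", "AJS", "ORD", "CRD"}     # adjectives, ordinals, numerals
NOUN_TAGS = {"NN0", "NN1", "NN2"}                  # common nouns (neutral, sg, pl)

Token = Tuple[str, str]  # (word, C5 tag)


def chunk_nps(tokens: List[Token]) -> List[Tuple[Optional[str], str, str]]:
    """Naive basic-NP chunker: optional determiner, modifiers, then a run of
    nouns. Returns (lowercased determiner or None, head word, head tag),
    where the head is the last noun in the run."""
    nps = []
    i, n = 0, len(tokens)
    while i < n:
        det = None
        j = i
        if tokens[j][1] in DET_TAGS:
            det = tokens[j][0].lower()
            j += 1
        while j < n and tokens[j][1] in MOD_TAGS:
            j += 1
        k = j
        while k < n and tokens[k][1] in NOUN_TAGS:
            k += 1
        if k > j:                      # at least one noun found
            word, tag = tokens[k - 1]  # last noun is the head
            nps.append((det, word, tag))
            i = k
        else:
            i += 1                     # no NP starts here; move on
    return nps


def tag_countability(det: Optional[str], head_tag: str) -> Optional[str]:
    """Apply determiner cues; return None when no reliable cue applies."""
    if det is None:
        return None
    if head_tag == "NN1":
        if det in COUNT_SINGULAR_DETS:
            return "count"
        if det in NONCOUNT_SINGULAR_DETS:
            return "non-count"
    elif head_tag == "NN2" and det in COUNT_PLURAL_DETS:
        return "count"
    return None


def tag_corpus(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return (head noun, label) for every NP token that received a tag."""
    tagged = []
    for det, head, tag in chunk_nps(tokens):
        label = tag_countability(det, tag)
        if label is not None:
            tagged.append((head, label))
    return tagged


if __name__ == "__main__":
    sentence = [("Every", "AT0"), ("student", "NN1"), ("drank", "VVD"),
                ("much", "DT0"), ("cold", "AJ0"), ("water", "NN1"),
                ("and", "CJC"), ("ate", "VVD"), ("several", "DT0"),
                ("apples", "NN2"), (".", "PUN"), ("The", "AT0"),
                ("rice", "NN1"), ("was", "VBD"), ("good", "AJ0")]
    print(tag_corpus(sentence))
```

In this sketch, noun phrases with no determiner, or with a determiner such as *the* that combines freely with both count and non-count nouns, are left untagged: relying only on high-reliability cues trades coverage for precision.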

Lane Schwartz
