Automatic Extraction of Polish Verb Subcategorization: An Evaluation of Common Statistics


This article compares and evaluates common statistics used to filter hypotheses in automatic valence extraction. The range of statistics compared is broader than is usual in the literature and includes the Binomial Miscue Probability, the Likelihood Ratio, the t-test, and various simpler statistics. All experiments are performed on morphosyntactically annotated but very noisy Polish data. Despite a different experimental methodology, the results confirm Korhonen’s findings: statistics based solely on the number of occurrences of a given verb and the number of co-occurrences of that verb with a given frame generally fare much better than statistics that compare this conditional frame frequency with the unconditional frame frequency.

Adam Przepiórkowski
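As a rough illustration of the contrast drawn in the abstract, the sketch below computes three filtering statistics for a single verb-frame pair. It assumes textbook formulations: a relative-frequency (maximum-likelihood) threshold, a Brent-style binomial test with an assumed per-frame miscue probability p_e, and Dunning's G² log-likelihood ratio. The function names, thresholds, and counts are hypothetical and not taken from the article, whose exact definitions may differ. Note that the relative frequency uses only c(v) and c(v, f), whereas G² also depends on the frame's overall count c(f) and the corpus size, which is the kind of conditional-versus-unconditional comparison the abstract refers to.

```python
import math

# All names, thresholds, and counts below are hypothetical illustrations.
# c_v    : occurrences of verb v
# c_vf   : co-occurrences of verb v with frame f
# c_f    : occurrences of frame f with any verb
# n_total: total number of verb occurrences in the corpus


def relative_frequency(c_vf: int, c_v: int) -> float:
    """Maximum-likelihood estimate of p(f | v); uses only c(v) and c(v, f)."""
    return c_vf / c_v


def keep_by_mle(c_vf: int, c_v: int, theta: float) -> bool:
    """Accept frame f for verb v if p(f | v) reaches the threshold theta."""
    return relative_frequency(c_vf, c_v) >= theta


def binomial_tail(m: int, n: int, p_e: float) -> float:
    """P(X >= m) for X ~ Binomial(n, p_e), summed in log space.

    In Brent-style filtering, p_e is the assumed probability that a cue
    for frame f occurs with a verb that does not actually take f.
    """
    if not 0.0 < p_e < 1.0:
        raise ValueError("p_e must lie strictly between 0 and 1")
    if m <= 0:
        return 1.0
    if m > n:
        return 0.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p_e) + (n - k) * math.log1p(-p_e)
        for k in range(m, n + 1)
    ]
    peak = max(log_terms)  # factor out the largest term to avoid underflow
    return math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)


def keep_by_bht(c_vf: int, c_v: int, p_e: float, alpha: float = 0.05) -> bool:
    """Accept f if c(v, f) cues are unlikely to arise from miscues alone."""
    return binomial_tail(c_vf, c_v, p_e) < alpha


def log_likelihood_ratio(c_vf: int, c_v: int, c_f: int, n_total: int) -> float:
    """Dunning's G^2 for the 2x2 table of verb v versus frame f.

    The table compares p(f | v) with p(f | not v), so the statistic also
    depends on the frame's overall count c(f). G^2 is non-negative and does
    not show the direction of the association; a filter built on it would
    normally also check that p(f | v) exceeds p(f | not v).
    """
    observed = [
        [c_vf, c_v - c_vf],
        [c_f - c_vf, n_total - c_v - c_f + c_vf],
    ]
    rows = [sum(row) for row in observed]
    cols = [observed[0][j] + observed[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # cells with zero observations contribute nothing
                expected = rows[i] * cols[j] / n_total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


if __name__ == "__main__":
    c_v, c_vf, c_f, n_total = 200, 30, 1000, 100_000  # hypothetical counts
    print("p(f|v) =", relative_frequency(c_vf, c_v))
    print("keep (MLE, theta=0.1):", keep_by_mle(c_vf, c_v, theta=0.1))
    print("keep (BHT, p_e=0.05):", keep_by_bht(c_vf, c_v, p_e=0.05))
    print("G^2 =", log_likelihood_ratio(c_vf, c_v, c_f, n_total))
```

Whether a given frame survives depends entirely on the chosen theta, p_e, and alpha; the sketch only shows which counts each statistic consumes, not which one performs better.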

[1] Adam Przepiórkowski, et al. The Unbearable Lightness of Tagging: A Case Study in Morphosyntactic Tagging of Polish, 2003, LINC@EACL.

[2] Anna Korhonen, et al. Statistical Filtering and Subcategorization Frame Acquisition, 2000, EMNLP.

[3] Michael R. Brent, et al. From Grammar to Lexicon: Unsupervised Learning of Lexical Syntax, 1993, Comput. Linguistics.

[4] Manolis Maragoudakis, et al. Learning Subcategorization Frames from Corpora: A Case Study for Modern Greek, 2000.

[5] Ralph Grishman, et al. Comlex Syntax: Building a Computational Lexicon, 1994, COLING.

[6] Anoop Sarkar, et al. Learning Verb Argument Structure from Minimally Annotated Corpora, 2002, COLING.

[7] Anoop Sarkar, et al. Automatic Extraction of Subcategorization Frames for Czech, 2000, COLING.

[8] Lukasz Debowski. Trigram Morphosyntactic Tagger for Polish, 2004, Intelligent Information Systems.

[9] Maria Lapata, et al. Acquiring Lexical Generalizations from Corpora: A Case Study for Diathesis Alternations, 1999, ACL.

[10] Anoop Sarkar, et al. Learning Verb Subcategorization from Corpora: Counting Frame Subsets, 2000, LREC.

[11] Ted Briscoe, et al. Automatic Extraction of Subcategorization from Corpora, 1997, ANLP.