Application of finite-state transducers to the acquisition of verb subcategorization information

This paper presents the design and implementation of a finite-state syntactic grammar of Basque, used to extract instances of verb subcategorization from newspaper texts. After a partial parser has built basic syntactic units such as noun phrases, prepositional phrases, and sentential complements, a finite-state parser performs syntactic disambiguation, determines clause boundaries, and filters the results, yielding each verb occurrence together with its associated syntactic components, whether complements or adjuncts. The set of occurrences for each verb is then filtered by statistical measures that distinguish arguments from adjuncts.
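The final filtering step can be illustrated with a minimal sketch. The abstract does not name the statistical measures used, so the example below assumes pointwise mutual information (PMI) between a verb and the case marking of a co-occurring component, purely for illustration. The function names, the threshold, and the toy occurrences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Invented toy data: one entry per verb occurrence, pairing the verb with the
# case markings of the syntactic components found in its clause.
occurrences = [
    ("eman", ["ergative", "dative", "inessive"]),
    ("eman", ["ergative", "dative"]),
    ("etorri", ["inessive"]),
    ("etorri", ["ablative", "inessive"]),
]

def pmi_scores(occurrences):
    """Return PMI (base 2) for every (verb, case) pair seen in the data."""
    verb_counts = Counter()
    case_counts = Counter()
    pair_counts = Counter()
    total = len(occurrences)
    for verb, cases in occurrences:
        verb_counts[verb] += 1
        # Count each case at most once per occurrence.
        for case in set(cases):
            case_counts[case] += 1
            pair_counts[(verb, case)] += 1
    scores = {}
    for (verb, case), n in pair_counts.items():
        p_pair = n / total
        p_verb = verb_counts[verb] / total
        p_case = case_counts[case] / total
        scores[(verb, case)] = math.log2(p_pair / (p_verb * p_case))
    return scores

def likely_arguments(occurrences, threshold=0.0):
    """Keep (verb, case) pairs whose PMI exceeds the (assumed) threshold."""
    return {pair for pair, s in pmi_scores(occurrences).items() if s > threshold}
```

In this toy data the inessive case appears with both verbs, so its association with "eman" falls below the threshold and it is treated as adjunct-like, while the dative, which occurs only with "eman", is kept as a likely argument. A realistic setting would need far more occurrences per verb and a measure chosen and tuned against annotated data.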
