Verb SCF extraction for Spanish with dependency parsing

In this paper we present the results of our experiments in the automatic production of verb subcategorization frame (SCF) lexica for Spanish. The work was carried out within a project aiming at the automatic acquisition of lexical information with as little human intervention as possible. Our experiments used a chain of tools: domain-focused web crawling, automatic cleaning, segmentation and tokenization, PoS tagging, dependency parsing, and finally SCF extraction. The results depend heavily on the output quality of the intervening components, in particular the dependency parser, which is the focus of this paper. Nevertheless, the results achieved are in line with the state of the art reported for other languages in similar experiments.
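To illustrate the last step of the chain, the following is a minimal sketch of how SCF candidates might be read off dependency-parsed sentences. It is not the authors' implementation: the token fields, the `VERB` tag, the argument labels in `ARGUMENT_LABELS`, and the relative-frequency cutoff `min_rel_freq` are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical set of dependency labels treated as verb arguments.
ARGUMENT_LABELS = {"SUBJ", "DO", "IO", "PP"}


def extract_frames(sentences):
    """Count candidate frames per verb lemma.

    Each sentence is a list of token dicts with the (assumed) keys
    "id", "lemma", "pos", "head" and "deprel".
    """
    counts = defaultdict(Counter)
    for sent in sentences:
        for tok in sent:
            if tok["pos"] != "VERB":
                continue
            # A frame is the sorted tuple of argument labels
            # attached directly to this verb.
            labels = sorted(
                dep["deprel"]
                for dep in sent
                if dep["head"] == tok["id"] and dep["deprel"] in ARGUMENT_LABELS
            )
            counts[tok["lemma"]][tuple(labels)] += 1
    return counts


def filter_frames(counts, min_rel_freq=0.1):
    """Keep frames whose relative frequency for the verb reaches the cutoff."""
    lexicon = {}
    for lemma, frames in counts.items():
        total = sum(frames.values())
        lexicon[lemma] = {
            frame: n / total
            for frame, n in frames.items()
            if n / total >= min_rel_freq
        }
    return lexicon


if __name__ == "__main__":
    # Two toy parses: "Juan come manzanas" and "Juan come".
    parsed = [
        [
            {"id": 1, "lemma": "Juan", "pos": "NOUN", "head": 2, "deprel": "SUBJ"},
            {"id": 2, "lemma": "comer", "pos": "VERB", "head": 0, "deprel": "ROOT"},
            {"id": 3, "lemma": "manzana", "pos": "NOUN", "head": 2, "deprel": "DO"},
        ],
        [
            {"id": 1, "lemma": "Juan", "pos": "NOUN", "head": 2, "deprel": "SUBJ"},
            {"id": 2, "lemma": "comer", "pos": "VERB", "head": 0, "deprel": "ROOT"},
        ],
    ]
    print(filter_frames(extract_frames(parsed)))
```

Because frames are built directly from head and label assignments, any attachment or labeling error made by the parser changes the frame that is counted, which is one way to see why the extracted lexicon is so sensitive to parsing quality.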
