An environment for morphosyntactic processing of unrestricted Spanish text

We present in this paper a fast, broad-coverage, accurate morphological analyzer for Spanish words, MACO+, which is an extended and improved version of the one described in (Acebo et al., 1994). The earlier version had two main flaws: it was not transportable, and it was too slow to enable massive text processing. The presented system not only overcomes those two flaws, but also offers improved coverage and accuracy. We also present two general part-of-speech taggers, which can be used to disambiguate the output of the morphological analyzer. All modules run on any Unix/Linux machine as a pipeline process, and they may also be used inside the GATE environment for NLP (Cunningham et al., 1996). The system is currently being used to annotate the LexEsp corpus, a 5.5 million word corpus of Spanish, in a bootstrapping refining procedure. Initial evaluation and results are reported.

[1]  Horacio Rodríguez,et al.  Part-of-Speech Tagging Using Decision Trees , 1998, ECML.

[2]  David L. Waltz,et al.  Understanding Line drawings of Scenes with Shadows , 1975 .

[3]  Lluís Padró,et al.  A Flexible POS Tagger Using an Automatically Acquired Language Model , 1997, ACL.

[4]  Claire Cardie,et al.  Domain-specific knowledge acquisition for conceptual sentence analysis , 1995 .

[5]  Javier Larrosa,et al.  Constraint Satisfaction as Global Optimization , 1995, IJCAI.

[6]  Wendy G. Lehnert,et al.  Using Decision Trees for Coreference Resolution , 1995, IJCAI.

[7]  Lluís Padró.  POS Tagging Using Relaxation Labelling , 1996, COLING.

[8]  Walter Daelemans,et al.  MBT: A Memory-Based Part of Speech Tagger-Generator , 1996, VLC@COLING.

[9]  Raymond J. Mooney,et al.  Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning , 1996, EMNLP.

[10]  Atro Voutilainen,et al.  A language-independent system for parsing unrestricted text , 1995 .

[11]  David M. Magerman,et al.  Learning grammatical structure using statistical decision-trees , 1996, ICGI.

[12]  Helmut Schmid,et al.  Probabilistic part-of-speech tagging using decision trees , 1994 .

[13]  Lluís Padró,et al.  A Hybrid Environment for Syntax-Semantic Tagging , 1998, ArXiv.

[14]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[15]  Lluís Padró,et al.  Developing a hybrid NP parser , 1997, ANLP.

[16]  Hamish Cunningham,et al.  GATE - a General Architecture for Text Engineering , 1996, COLING.

[17]  Lluís Padró,et al.  MACO: morphological analyzer corpus-oriented , 1994 .

[18]  Marcello Pelillo,et al.  Learning Compatibility Coefficients for Relaxation Labeling Processes , 1994, IEEE Trans. Pattern Anal. Mach. Intell.

[19]  Horacio Rodríguez,et al.  Automatically acquiring a language model for POS tagging using decision trees , 2000 .