A Finite-State Morphological Processor For Spanish

A finite transducer that processes Spanish inflectional and derivational morphology is presented. The system handles both generation and analysis of tens of millions of inflected forms. Lexical and surface (orthographic) representations of the words are linked by a program that interprets a finite directed graph whose arcs are labelled by n-tuples of strings. Each of about 55,000 base forms requires at least one arc in the graph. Representing the inflectional and derivational possibilities for these forms imposed an overhead of only about 3000 additional arcs, of which about 2500 represent (phonologically predictable) stem allomorphy, so that we pay a storage price of about 5% for compiling these forms offline. A simple interpreter for the resulting automaton processes several hundred words per second on a Sun4.
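
To make the idea of interpreting a graph with string-labelled arcs concrete, the following is a minimal sketch in Python, not the system described here. It assumes arcs labelled by pairs (n = 2) of a lexical string and a surface string; the arc list, the state numbers, the tag names (`+Inf`, `+Pres+1Sg`, and so on) and the function names `run`, `analyse` and `generate` are all hypothetical, chosen only for illustration. The two arcs for `contar` show how a stem allomorph (`cont` / `cuent`) can be stored as an extra arc rather than computed at run time.

```python
from collections import defaultdict

# Toy arc list: (source state, lexical string, surface string, target state).
# Invented data for illustration only. State 1 takes endings that follow an
# unstressed stem, state 2 takes endings that follow a stressed stem.
ARCS = [
    (0, "cantar", "cant", 1),
    (0, "cantar", "cant", 2),
    (0, "contar", "cont", 1),    # unstressed stem
    (0, "contar", "cuent", 2),   # stressed stem allomorph (o -> ue)
    (1, "+Inf", "ar", 3),
    (1, "+Pres+1Pl", "amos", 3),
    (2, "+Pres+1Sg", "o", 3),
    (2, "+Pres+3Sg", "a", 3),
]
START, FINAL = 0, {3}

# Index the arcs by source state so the interpreter only scans relevant arcs.
BY_STATE = defaultdict(list)
for arc in ARCS:
    BY_STATE[arc[0]].append(arc)


def run(text, read, write):
    """Follow every path whose `read`-side labels spell out `text`.

    `read` and `write` are positions in the arc tuple (1 = lexical,
    2 = surface). Returns the sorted list of strings collected from the
    `write` side. Assumes no arc has an empty label on the `read` side.
    """
    results = []
    stack = [(START, 0, "")]          # (state, position in text, output so far)
    while stack:
        state, pos, out = stack.pop()
        if state in FINAL and pos == len(text):
            results.append(out)
        for arc in BY_STATE[state]:
            label = arc[read]
            if text.startswith(label, pos):
                stack.append((arc[3], pos + len(label), out + arc[write]))
    return sorted(results)


def analyse(surface):
    """Surface form -> lexical analyses."""
    return run(surface, read=2, write=1)


def generate(lexical):
    """Lexical form -> surface forms."""
    return run(lexical, read=1, write=2)


print(analyse("cuento"))             # ['contar+Pres+1Sg']
print(generate("contar+Pres+1Pl"))   # ['contamos']
print(analyse("conto"))              # []
```

The same graph serves both directions: analysis reads the surface side of each arc and writes the lexical side, while generation swaps the two. The ill-formed `conto` is rejected because no arc leaving state 1 carries the ending `o`.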