Introduction to Finite-State Devices in Natural Language Processing

The theory of finite-state automata (FSA) is rich, and finite-state techniques have been used in a wide range of domains, such as switching theory, pattern matching, pattern recognition, speech processing, handwriting recognition, optical character recognition, encryption algorithms, data compression, indexing, and operating system analysis (Petri nets). Finite-state devices such as finite-state automata, graphs, and finite-state transducers have been known since the emergence of computer science and are extensively used in areas as varied as program compilation, hardware modeling, and database management. In computational linguistics, although these devices have long been known, more powerful formalisms such as context-free grammars and unification grammars have been preferred. However, recent mathematical and algorithmic results in finite-state technology have had a great impact on the representation of electronic dictionaries and on natural language processing. As a result, a new language technology is emerging from both industrial and academic research. This book presents fundamental finite-state algorithms and approaches from the perspective of natural language processing.

In this chapter, we describe the basic notions of finite-state automata and finite-state transducers. We also describe the fundamental properties of these machines and illustrate their use, with simple formal-language examples as well as natural-language examples. Finally, we illustrate some of the main algorithms used with finite-state automata and transducers.

MERL-TR-96-13, June 1996

This work may not be copied or reproduced in whole or in part for any commercial purpose.
Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories of Cambridge, Massachusetts; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories. All rights reserved.

Copyright © Mitsubishi Electric Research Laboratories, 1996
201 Broadway, Cambridge, Massachusetts 02139

This introduction is to appear in "Finite-State Devices for Natural Language Processing", Roche and Schabes (editors), MIT Press.