Book Review: Memory-Based Language Processing, by Walter Daelemans and Antal van den Bosch∗

∗ A shorter version of this review will be published in German in the journal Linguistische Berichte.

As a contrast to memory-based language processing (MBLP), rule induction is introduced. Since MBLP does not abstract over the training data, it is called a lazy learning approach. Rule induction, in contrast, learns rules and does not go back to the actual training data during classification.

The book consists of seven chapters. Chapter 1 situates memory-based language processing firmly in the domain of empirical approaches to NLP. Empirical approaches became attractive in the early 1990s, largely replacing knowledge-based approaches. Daelemans and van den Bosch argue that, within the range of empirical approaches, memory-based learning offers the advantage over statistical approaches that it does not abstract over low-frequency events. Such low-frequency events are important in processing natural language problems because they often describe exceptions or subregularities. The chapter also introduces the major concepts of MBLP and provides an intuitive example from linguistics: PP attachment.

Chapter 2 locates the central concepts of MBLP in neighboring areas of research: in linguistics, the idea of processing by analogy to previous experience is a well-known concept, and psycholinguistics often uses exemplar-based approaches or, more recently, hybrid approaches that combine rules with exceptions. Applications of memory-based principles can be found in example-based machine translation (Nagao 1984) and data-oriented parsing (Bod 1998).

Chapter 3 gives a simultaneous introduction to memory-based learning and TiMBL, the Tilburg implementation of the method. This strategy of combining theory and practice gives the reader a sense of how important it is to select optimal parameter settings for different problems. The application of TiMBL is demonstrated on the example of plural formation in German. The chapter ends with an introduction to evaluation methodology and TiMBL's built-in evaluation functions.

Chapter 4 describes the application of TiMBL to two more complex linguistic examples: grapheme-to-phoneme conversion and morphological analysis. In order to find optimal solutions for these problems, two algorithms that deviate from the standard memory-based learning algorithm are introduced: IGTREE and TRIBL. IGTREE is a decision-tree approximation, which bases the comparison of an example to others on a small number of feature comparisons. TRIBL is a hybrid between the standard memory-based learning algorithm, IB1, and IGTREE. Both modifications reduce memory requirements and processing time during classification, but they may also affect classification accuracy. Unfortunately, the presentation of the first example suffers from unreadable phonetic transcriptions throughout the chapter.

Whereas Chapter 4 analyzes linguistic problems that are easily described in terms of classification, Chapter 5 approaches a problem of sequence learning: partial parsing. For this task, phrase and clause boundaries must be found. In order to apply classification methods to sequence learning, the problem must be redefined as assigning tags to words or word combinations, so-called IOB tagging (Ramshaw and Marcus 1995). These tags indicate whether a word begins a phrase, lies inside one, or falls outside any phrase. One advantage of using MBLP for such problems lies in the fact that different types of information, including long-distance information, can be included without modification of the original algorithm.
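To make the chunking-as-classification setup concrete, the following is a minimal sketch, not taken from the book or from TiMBL, of a memory-based IOB tagger: training instances are simply stored in memory, and a new instance receives the class of its nearest stored neighbors under a plain feature-overlap metric. The feature window, the toy data, and the names (MemoryBasedTagger, overlap) are invented for illustration.

from collections import Counter

def overlap(a, b):
    """Count matching feature values (a plain overlap similarity)."""
    return sum(1 for x, y in zip(a, b) if x == y)

class MemoryBasedTagger:
    """Lazy learner: stores all training instances verbatim and defers
    all work to classification time."""

    def __init__(self):
        self.memory = []  # list of (feature_tuple, iob_tag) pairs

    def train(self, instances):
        self.memory.extend(instances)  # no abstraction over the data

    def classify(self, features, k=1):
        # Rank stored instances by similarity to the new instance and
        # take the majority class among the k nearest neighbours.
        nearest = sorted(self.memory,
                         key=lambda ex: overlap(features, ex[0]),
                         reverse=True)[:k]
        return Counter(tag for _, tag in nearest).most_common(1)[0][0]

# Toy training data: each word is represented by a window of
# (previous word, focus word, next word) and labelled with an IOB tag.
training = [(("<s>", "The", "cat"), "B-NP"),
            (("The", "cat", "sat"), "I-NP"),
            (("cat", "sat", "on"), "O"),
            (("sat", "on", "the"), "B-PP"),
            (("on", "the", "mat"), "B-NP"),
            (("the", "mat", "</s>"), "I-NP")]

tagger = MemoryBasedTagger()
tagger.train(training)
print(tagger.classify(("The", "dog", "sat")))  # -> I-NP (nearest: "The cat sat")

With k = 1 and the overlap metric, this roughly corresponds to the IB1 baseline discussed in the book, without the feature weighting and refined distance metrics that TiMBL adds on top.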
In Chapter 6, Daelemans and van den Bosch investigate the difference between lazy and eager learning. As noted earlier, TiMBL is a typical example of lazy learning since it does not abstract from the training data. RIPPER (Cohen 1995), the other classifier used in this chapter, is a typical eager learner: a rule-induction algorithm that displays the opposite behavior to TiMBL, namely a complex learning strategy combined with simple, efficient classification. The results presented in this chapter show that deleting examples from the training data is harmful to classification, supporting the hypothesis that lazy learning has a bias that fits natural language problems. However, this conclusion seems a little too straightforward. Here, one would expect a reference to the findings of Daelemans and Hoste (2002), which show that parameter and feature