to specialized software. For tree-based methods, the chapter describes CART and C5.0, plus a variation known as the patient rule induction method, or bump hunting. A common example is used. The handling of missing data and the computational effort required by each method are also considered.
Chapter 10, “Boosting and Additive Trees,” describes boosting as “one of the most powerful learning ideas introduced in the last ten years” (p. 299). The book develops boosting methods for classifiers and extends them to regression applications. The AdaBoost method is described and used for additive models; boosting is “a way of fitting an additive expansion in a set of elementary basis functions” (p. 310). Though this material is very advanced, the chapter devotes a couple of sections to creating “off-the-shelf” (p. 312) data-mining procedures using predictive learning methods. The authors note that “requirements of speed, interpretability and the messy nature of the data sharply limit the usefulness of most learning procedures as off-the-shelf methods for data mining” (p. 313). Decision trees are judged the best tool available, and boosting methods are recommended to improve their accuracy via multiple additive regression trees (MART). MART is illustrated on two large public-domain datasets.
The statistician’s approach to neural networks (NNs) is the subject of Chapter 11. Projection pursuit regression (PPR) provides the starting point for the presentation, and the single-layer perceptron is the NN selected for discussion. The authors note that NNs “are just nonlinear statistical models, much like the PPR” (p. 350). It was gratifying to see what easy work the authors could make of the entire process of configuring an NN, given the background of the book’s first 350 pages, although they still note that “there is quite an art in training neural networks” (p. 355). Several pages of guidance, along with two extensive examples, are provided.
Chapter 12 offers generalizations of the use of linear decision boundaries for classification. Techniques discussed include support vector machines (SVMs) and flexible discriminant analysis, the latter also covering penalized discriminant analysis and mixture discriminant analysis. Applications of SVMs include regression analysis. This material is very complex, but some nice graphics and basic examples aid understanding. Readers are directed to S-PLUS programs.
Chapter 13 continues in a similar vein with other methods for classification and pattern recognition. These model-free methods are touted as “black box prediction engines” (p. 411); they include prototype methods, such as K-means clustering and k-nearest-neighbor classifiers, as well as adaptive methods.
A chapter on unsupervised learning concludes the book. Here the link to all of the supervised methods that precede it is very enlightening. The chapter discusses association rules (including market basket analysis), cluster analysis, self-organizing maps, principal components, independent component analysis (ICA), and multidimensional scaling. The 40 pages on cluster analysis cover many algorithms, including combinatorial algorithms, K-means, vector quantization, K-medoids, and hierarchical (agglomerative and divisive) clustering. Readers interested in ICA should investigate the recent book by Roberts and Everson (2001), reviewed by Rayens (2003).
The Elements of Statistical Learning is a vast and complex book.
Generally, it concentrates on explaining why and how the methods work rather than on how to use them. The examples, and especially the visualizations, are principal features, but little guidance is available to the reader who wants to reproduce these results. As a resource for the methods of statistical learning, however, it will probably be a long time before there is a competitor to this book.