The Elements of Statistical Learning: Data Mining, Inference, and Prediction

In the words of the authors, the goal of this book was to “bring together many of the important new ideas in learning, and explain them in a statistical framework.” The authors have been quite successful in achieving this objective, and their work is a welcome addition to the statistics and learning literatures. Statistics has always been interdisciplinary, borrowing ideas from diverse fields and repaying the debt with contributions, both theoretical and practical, to the other intellectual disciplines. For statistical learning, this cross-fertilization is especially noticeable. This book is a valuable resource, both for the statistician needing an introduction to machine learning and related fields and for the computer scientist wishing to learn more about statistics. Statisticians will especially appreciate that it is written in their own language. The level of the book is roughly that of a second-year doctoral student in statistics, and it will be useful as a textbook for such students.

In a stimulating article, Breiman (2001) argued that statistics has been focused too much on a “data modeling culture,” where the model is paramount. Breiman argued instead for an “algorithmic modeling culture,” with emphasis on black-box types of prediction. Breiman’s article is controversial, and in his discussion, Efron objects that “prediction is certainly an interesting subject, but Leo’s paper overstates both its role and our profession’s lack of interest in it.” Although I mostly agree with Efron, I worry that the courses offered by most statistics departments include little, if any, treatment of statistical learning and prediction. (Stanford, where Efron and the authors of this book teach, is an exception.) Graduate students in statistics certainly need to know more than they do now about prediction, machine learning, statistical learning, and data mining (not disjoint subjects).
I hope that graduate courses covering the topics of this book will become more common in statistics curricula.

Most of the book is focused on supervised learning, where one has inputs and outputs from some system and wishes to predict unknown outputs corresponding to known inputs. The methods discussed for supervised learning include linear and logistic regression; basis expansion, such as splines and wavelets; kernel techniques, such as local regression, local likelihood, and radial basis functions; neural networks; additive models; decision trees based on recursive partitioning, such as CART; and support vector machines. There is a final chapter on unsupervised learning, including association rules, cluster analysis, self-organizing maps, principal components and curves, and independent component analysis. Many statisticians will be unfamiliar with at least some of these algorithms. Association rules are popular for mining commercial data in what is called “market basket analysis.” The aim is to discover types of products often purchased together. Such knowledge can be used to develop marketing strategies, such as store or catalog layouts. Self-organizing maps (SOMs) are essentially a constrained form of k-means clustering, where prototypes are mapped to a two-dimensional curved coordinate system. Independent component analysis is similar to principal components analysis and factor analysis, but it uses higher-order moments to achieve independence among the components, not merely zero correlation.

A strength of the book is the attempt to organize a plethora of methods into a coherent whole. The relationships among the methods are emphasized. I know of no other book that covers so much ground. Of course, with such broad coverage, it is not possible to cover any single topic in great depth, so this book will encourage further reading. Fortunately, each chapter includes bibliographic notes surveying the recent literature.
These notes and the extensive references provide a good introduction to the learning literature, including much outside of statistics. The book might be more suitable as a textbook if less material were covered in greater depth; however, such a change would compromise the book’s usefulness as a reference, and so I am happier with the book as it was written.