Book Reviews: Introduction to Multivariate Statistical Analysis in Chemometrics

Introduction to Multivariate Statistical Analysis in Chemometrics. Kurt Varmuza and Peter Filzmoser. CRC Press, Boca Raton, FL, 2009. Pp: 321. Price: USD$119.95. ISBN 13 978-14200-5947-2.

The new book by Kurt Varmuza, a chemometrician, and Peter Filzmoser, a statistician, takes the reader on a brisk walk through modern data analysis. The collaboration gives a somewhat unconventional, integrated perspective on the treatment of multivariate data. Like the book’s cover, a panorama of the Monument Valley area in the American Southwest, the book provides a wide-ranging overview of many “monuments” of chemometrics and modern multivariate statistics, without a tight focus on any one. The tone is introductory throughout.

After a first chapter with extensive, up-to-date lists of other texts in chemometrics and multivariate statistics, the authors begin with introductory examples from near-infrared spectroscopy and archaeological glasses, followed by a brief examination of univariate statistical methods. In most cases, only a short discussion is provided, along with a brief mention of the R language object or R command that performs the test. The use of R, while now common in statistical texts, is less frequent in chemometrics.

Moving on from this univariate treatment, the authors take on multivariate data pretreatment and outlier detection. Unlike many introductory texts in chemometrics, they cover data transformations and robust methodology here as well as the more conventional centering and scaling. As with the univariate methods, the authors briefly mention the R object for performing each task. This chapter shows their focus on graphical depiction of the methods, another attribute that sets the book apart from others available.

The usual topics, common to most texts in chemometrics, follow: principal components analysis, calibration, classification, and cluster analysis.
What is unusual is the inclusion of robust methods throughout, the extensive use of graphical representations of the methods to aid in understanding, and the software examples implementing the methods in R. Readers comfortable with R can follow along, as I did, by obtaining the libraries and datasets for the examples from the Comprehensive R Archive Network (CRAN, at http://cran.r-project.org). They are all free, as is the R software to run them.

While many of the topics covered in calibration and classification are standard, some are unconventional, possibly reflecting field-based differences in the authors’ perspectives. For example, in calibration, ridge regression gets more than the usual attention, along with the lasso method, as well as ordinary (classical) least squares (OLS) and canonical correlation. Given the recent resurgence of interest in OLS methods among spectroscopists, detailed discussion of OLS is both timely and highly useful. The treatment of the more conventional partial least squares (PLS) and allied methods is both substantial and up-to-date: even the recent controversy over orthogonal-scores-based and orthogonal-loadings-based calculation of latent variables in PLS is covered here. When needed, the authors delve into some of the mathematical details, but most of the treatment is not heavily oriented to theory, a feature that those just starting out will surely appreciate.

As noted above, classification is considered from a slightly different perspective. The linear discriminant is discussed in detail, but Gaussian mixture models, decision trees, support vector machines, and logistic regression share space with the more commonly presented k-NN and SIMCA classifiers that are routine in commercial chemometric software packages. Similarly, the chapter on clustering covers the usual hierarchical methods present in the commercial packages but also briefly examines fuzzy clustering.

A chapter on data preprocessing ends the book.
This chapter provides very brief coverage of basic methods such as differentiation, multiplicative scatter correction, and mass spectral normalization methods. It is somewhat less up to date than the other chapters. The authors mention but do not discuss wavelet processing, for example.

With all of these methods receiving attention in the span of about 300 pages, readers should not expect comprehensive coverage of all of the popular topics. Some aspects of modern chemometrics don’t make the list: multivariate curve resolution is not covered here, nor are other recent preprocessing methods that are implemented in some software packages, such as orthogonal signal correction. Higher-order methods receive only the briefest of treatments. The appendices covering matrix algebra and the R language are also quite short; those needing help will likely find them too brief, but a good amount of web-based help is available on these topics.

The spectroscopist will find this a useful reference to modern chemometrics. The integration of chemometric methods with modern multivariate statistical methods should prove particularly useful. The motivated student seeking to gain experience in data analysis will find this a balanced, insightful, and accessible introduction to the field. Given the use of R software to implement the methods discussed in the text, and the high overlap of multivariate statistics and chemometrics in recent literature, this is a text that offers a good deal. It offers even more to the reader who invests a bit of time learning enough R to put the accompanying software library to use. There is a great deal of useful chemometrics available here for the price of a textbook. I highly recommend the book.