Applied Chemometrics for Scientists
“Which test should I apply?” are the first words to appear in the Preface of this book. The authors have taken on the task of teaching ecologists to think (more) statistically. The book is aimed at three types of readers: ecologists who wish to develop their own statistical skills, quantitative ecologists who want to use more advanced techniques, and statistical scientists seeking more experience analyzing ecological data. A distinctive feature of the book is the 18 richly detailed case studies, the product of an additional 41 contributors. The methods chapters (Chaps. 4–19) refer extensively to the case studies (Chaps. 20–37).

The 16 methods chapters, under the heading “Applied Statistical Theory,” may be grouped as data exploration and modeling, multivariate methods, time series and trends, and methods for lattice and spatially continuous data. Following Chapter 4 on data exploration, Chapters 5–9 cover regression methods, including generalized linear models (GLMs), additive and generalized additive models (GAMs), generalized least squares (GLS) and mixed modeling, and classification and regression tree models. Chapters 10–15 on multivariate methods include principal component/redundancy analysis, correspondence analysis, discriminant analysis, principal coordinate analysis, and nonmetric multidimensional scaling. Time series and trend analysis are covered in Chapters 16 and 17, lattice data analysis is covered in Chapter 18 (by two contributors), and analysis of spatially continuous data is covered in Chapter 19.

The 18 case study chapters cover various topics in fishery and marine sciences, forestry, and wildlife (e.g., birds, grasslands, honeybees, salt marshes, turtles). Data for the case studies are available (for the most part) on the book’s website. Output from software comes primarily from Brodgar (www.brodgar.com, written by one of the authors) and R (www.r-project.org).
For multivariate techniques, programs used include Genstat, CANOCO, PRIMER, and PC–ORD. SAS was not considered, because none of the authors had worked with it. Table 2.5 lists the various statistical modeling methods, multivariate techniques, and spatial statistics methods, providing an indication of what is available in Brodgar, Genstat, CANOCO, PRIMER, R, and PC–ORD.

The book’s website has links to data sets for the case study chapters. As of January 2008, data sets were not yet available for Chapters 23–26 (case studies involving mixed modeling, classification trees, neural network analysis, GLS, and nonmetric multidimensional scaling), and the link for the data for Chapter 29 (harbor porpoise multivariate data) was not functioning. Because different types of software have been used to produce the output presented in the book, no code is printed in the book itself. The intent is to have R code for Chapters 4–37 available on the book’s website. As of January 2008, only Chapters 19 (spatially continuous data) and 37 (spatial modeling of forest community data) had R code. Until the rest of the R code is available, a scientist with no guide to producing these graphics could be frustrated. In their defense, the authors point out the difficulty of trying to teach statistical methods using only R to students who lack experience in R and prefer to use software that is completely menu-based. Those who wish to learn such topics as linear models, GLM, GAM, and GLS using R might have an easier time with the books of Faraway (2005, 2006), which have R code directly in the text.

When this book is used as a course textbook, the ideal students are those with some experience with linear regression. Chapter 5 covers simple, multiple, and partial linear regression, all in 30 pages; the authors describe the chapter as a “brief refresher” (p. 49). In keeping with the authors’ stated objectives, there are few formulas.
Chapter 6 devotes 18 pages to GLM, concentrating on Poisson regression and logistic regression. Some GLM references are given; the authors note which ones are for the more mathematically inclined reader. There is no mention (apart from quasi-Poisson) of the usual gamut of other probability distributions and link functions. I suspect that rather than try various combinations of link functions and probability distributions to fit a set of data, the authors would rather encourage increased use of additive modeling on ecological data.

Chapters 7 (“Additive and Generalised Additive Modeling”) and 8 (“Introduction to Mixed Modeling”) are places where this book really shines. The explanations are nontechnical, with more of a “how-to” approach, including interpretation of R output, a nonmathematical explanation of the LOESS algorithm, and how to move the analysis forward. Chapter 7 uses data on species richness and grain size from the case study on a Dutch sandy beach community. Chapter 8 refers the reader to five case studies (honeybee data, aquatic birds, grasslands, salt marshes, and a forest community) that require the use of mixed modeling. This topic can be difficult to include in a single-term course, yet many biologists and ecologists need to know it and use it. Each chapter contains a useful section on model validation and selection.

In the section on multivariate methods, the concept of ordination is introduced in a short chapter on Bray–Curtis ordination, a rarely used but useful method for conveying the general idea. The reader is then ready to tackle subsequent chapters on principal component analysis, correspondence analysis, and discriminant analysis. For some topics, rather than reproduce material from other texts, the authors recommend that the user consult a particular reference. For example, Chapter 10, on multivariate measures of association, refers heavily to chapter 7 of Legendre and Legendre (1998).
This particular reference is also relied on for material in Chapters 12–15. Similarly, Chapter 16 (“Time Series Analysis: Introduction”) often references the book of Makridakis, Wheelwright, and Hyndman (1998).

About half of the case study chapters are associated with doctoral dissertation projects. The introduction to each case study details which techniques are used and what one may expect to learn in terms of analysis and interpretation. The reader gets a good feel for the actual development of the analysis process, including wrong turns in some cases. For example, in Chapter 23 (“Investigating the Effects of Rice Farming on Aquatic Birds With Mixed Modeling”), the lead author states precisely how using mixed modeling has improved the original analyses appearing in two earlier works and describes what was lacking in the original analyses.

Enjoyable aspects of the book include good graphical outputs, with interpretations, in the text. There are reminders referring to previous chapters when a term or concept is encountered for a second time (e.g., AIC, p. 111). Chapter 3 (“Advice for Teachers”) contains decision analysis flowcharts that, when used in conjunction with a case study, can reinforce the analysis process for the student.

It would have been useful to have had a list of the data sets in the Index. It can be frustrating when a data set not associated with a case study (e.g., squid data) is described in detail and used to illustrate points about analysis, but no indication is provided as to how to obtain the data. Thus, it is not possible to reproduce the graphics and other R output in the text for some data sets.

Overall, this book is worth the purchase price based on the rich case studies alone. No other book combines as many good ecological data sets with such thoughtfully written analyses. I give this book two enthusiastic thumbs up! Data sets and R code are available at www.highstat.com/books.htm.