Statistical data mining using SAS applications

The book can be viewed as a specialised tool for SAS data analysis. Divided into seven main sections, it addresses a wide range of analytical topics, from an introduction to data mining to core unsupervised and supervised learning techniques. Its key features include case studies throughout the sections, downloadable macros and instructions on how to run them. A working knowledge of SAS is expected, but there is no requirement for either mathematical or SAS programming maturity. The step-by-step instructions and the graphical representations of data make it particularly useful to those wishing to communicate complex and technical data to a largely non-specialist audience. As a regular SAS user, I have noted over the years a general confusion between the conventional SAS application and SAS Enterprise Guide (SAS EG), especially among some members of the non-SAS community. In Appendix II, the author highlights the incompatibility of some of the book’s accompanying macros with SAS EG and refers the reader to SAS EG-compatible macros. The macro files, macro-call files and sample data sets used in the examples must all be downloaded from the book’s website. Although this is a good feature in that future macro updates may be uploaded to the site, including these files on a CD attached to the book would probably have greatly enhanced its scope. Although the book may be viewed as software-specific, with the potential risk of sliding into obsolescence as new SAS versions are released, its examples help to develop a software-independent understanding of data mining. While as a regular SAS user I could easily read and follow the examples, those new to the SAS and SAS EG environments are likely to find navigation through the book somewhat awkward.
Furthermore, the lack of explicit statistical computing examples and the exclusion of procedural routines amount to carrying out a background demonstration of data mining without showing the reader how to do it. This style seems to have obscured the relevance of Chapter 7. It is probably fair to say that, in its current form, the book can only be useful as a reference for people who routinely use SAS applications or as a supplement to a statistics or data mining course with a significant SAS component. Future editions may need to sharpen some of the faintly visible graphics, such as the screenshots on pages 235 and 238, and correct typos such as the repeated “p-value value” on page 160.