A Handbook of Statistical Analyses Using Stata
This book had an interesting origin. The authors previously wrote Berry and Linoff (1997), reviewed for Technometrics by Ziegel (2001). As they note in the introduction to this new book (p. xvii), much has changed in the technology since then. In addition, they note, “We want to address the needs of practitioners on both the business and the technical sides” (ibid.). The book arrived as I was departing for an international conference and a site visit concerning data-mining projects, so I read most of it on the transatlantic crossing. It is an easy read. Technical details are not eliminated, but everything is done to make the book fully accessible to anyone in an organization who might need background on the subject or take any part in a data-mining project.

The book has three parts. The first part gives the background on the subject. The first of its four chapters lays out both the business and the technical basis for data mining. The second chapter presents the premise that organizations should treat data mining as a core competency; it should be noted that this perspective is set forth by two authors who work in a consulting business. The third chapter gives an overview of the data-mining process, and the fourth is about customers. The book's subtitle is The Art and Science of Customer Relationship Management, a perspective that shaped the choice of illustrations and case studies but otherwise does not detract from the book's value for applications that do not involve customers.

The second part has four chapters that give a readable overview of data-mining methodology. The first of these discusses three major techniques: k-means clustering, decision trees, and neural networks. Each technique is presented through the most common approach to applying it. The next chapter covers the collection, organization, and management of data. A chapter on building predictive models follows.
The two most interesting topics in this chapter are the division of the data into training, validation, and test sets and the development of multiple models for a single application. The last chapter in this part presents four case studies on setting up data-mining environments.

The last part of the book has seven case studies, all but one devoted to customer-relationship applications. The exception is a chapter on improving manufacturing processes that discusses data-mining projects at R. R. Donnelley and Time-Warner; both applications involve solving problems in printing plants. These case studies are moderately useful in helping the novice understand how to be a useful participant in a data-mining project.

The book deals in passing with the relationship of both statistics and statisticians to the data-mining process, and it certainly promotes a role for the statistically knowledgeable participant in the data-mining effort. Despite its lack of references of any kind, this is actually a good background book for statisticians. They will learn the process and find direct and indirect insights into their possible roles in data-mining efforts.