Data mining in the real world: What do we need and what do we have?

Historically, data mining has been in the hands of small teams of expert statisticians who produce a few models per year. Recently, however, companies have invested heavily in building huge data warehouses (from a few terabytes to petabytes) that contain millions of records and thousands of variables; for example, 5,000 variables on 150 million customers and prospects. This has changed the economics of data mining. Businesses now want a return on that investment and are looking well beyond reporting and basic statistics [1]. When they review their business activities, they see the need for hundreds or thousands of predictive models per year. Very few companies can produce that many today, owing to a lack of expert staff and appropriate tools. Some, however, do generate that many models, and we will provide examples, including:

[1] Vladimir N. Vapnik. The Nature of Statistical Learning Theory. 1995.

[2] T. Davenport. Competing on analytics. Harvard Business Review, 2006.

[3] Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Statistics for Engineering and Information Science, 2000.