The Million Book Digital Library Project: Research Problems in Data Mining and Discovery

Data Mining is still as much an art as a science, and fancy new tools make it easy to do the wrong things with one’s data even faster. We’ll examine the major “cracks in the crystal ball” through case studies, both simple and complex, of (often personal) errors drawn from real-world consulting engagements. Best Practices for Data Mining will be (accidentally) illuminated by their (rarely described) opposites. These common errors range from allowing anachronistic variables, whose values would not yet be known at prediction time, into the pool of candidate inputs to subtly inflating results by up-sampling before the data are split for evaluation. You’ll hear cautionary tales of endangered projects and embarrassed teams, and also learn the keys to avoiding such a fate yourself.
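To make the early up-sampling error concrete, here is a minimal sketch in Python. It is not taken from the case studies: the records, the 10% minority rate, the replication factor, and the helper names (`upsample`, `split`, `shared_ids`) are all hypothetical. It assumes up-sampling by simple replication and shows how copies of the same record end up on both sides of a split when replication comes first.

```python
import random

def upsample(rows, minority=1, factor=5):
    """Replicate each minority-class row `factor` times (simple replication)."""
    out = []
    for row in rows:
        copies = factor if row["label"] == minority else 1
        out.extend([row] * copies)
    return out

def split(rows, test_size, rng):
    """Shuffle a copy of `rows`; return (train, test) with `test_size` test rows."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]

def shared_ids(train, test):
    """IDs of records that appear in both the training and the test set."""
    return {r["id"] for r in train} & {r["id"] for r in test}

rng = random.Random(0)

# Hypothetical data: 200 records, the first 20 belong to the rare class.
data = [{"id": i, "label": 1 if i < 20 else 0} for i in range(200)]

# Error: up-sample first, then split. Copies of one rare record can land
# in both sets, so the test set partly re-scores memorized training rows.
early_train, early_test = split(upsample(data), 84, rng)
leaked = shared_ids(early_train, early_test)

# Safer order: split first, then up-sample the training set only.
train, test = split(data, 60, rng)
train = upsample(train)
clean = shared_ids(train, test)

print("records in both sets (up-sample, then split):", len(leaked))
print("records in both sets (split, then up-sample):", len(clean))
```

Under these assumptions the second count is zero by construction, while the first is almost certainly positive and involves only rare-class records, which is exactly where an inflated result would be hardest to notice.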