Introduction

Data mining is concerned with finding interesting patterns in data. Many techniques have emerged for analyzing and visualizing large volumes of data. What one finds in the technical literature are mostly success stories of these techniques. Researchers rarely report on steps leading to success, failed attempts, or critical representation choices made; and rarely do papers include expert evaluations of achieved results. An interesting point of investigation is also why some particular solutions, despite good performance, were never used in practice or required additional treatment before they could be used. Insightful analyses of successful and unsuccessful applications are crucial for increasing our understanding of machine learning techniques and their limitations.

The UCI Repository of Machine Learning Databases (Blake & Merz, 1998) has served the machine learning community for many years as a valuable resource. It has benefited the community by allowing researchers to compare algorithm performance on a common set of benchmark datasets, most taken from real-world domains. However, its existence has indirectly promoted a very narrow view of real-world data mining. Performance comparisons, which typically focus on classification accuracy, neglect important data mining issues such as data understanding, data preparation, selection of appropriate performance metrics, and expert evaluation of results. Furthermore, because the UCI repository has been used so extensively, some researchers have claimed that our algorithms may be “overfitting the UCI repository”.

Challenge problems such as the KDD Cup, CoIL and PTE challenges have also become popular in recent years and have attracted numerous participants. Contrary to the “UCI challenge” of achieving
[1] Rajesh Parekh, et al. Lessons and Challenges from Mining Retail E-Commerce Data. Machine Learning, 2004.
[2] Peter A. Flach, et al. Decision Support Through Subgroup Discovery: Three Case Studies and the Lessons Learned. Machine Learning, 2004.
[3] P. van der Putten, et al. A Bias-Variance Analysis of a Real World Learning Problem: The CoIL Challenge 2000. 2004.
[4] Maarten van Someren, et al. A Bias-Variance Analysis of a Real World Learning Problem: The CoIL Challenge 2000. Machine Learning, 2004.
[5] Catherine Blake, et al. UCI Repository of machine learning databases. 1998.
[6] Charles Elkan, et al. Magical thinking in data mining: lessons from CoIL challenge 2000. KDD '01, 2001.
[7] Nada Lavrac, et al. Introduction: Lessons Learned from Data Mining Applications and Collaborative Problem Solving. Machine Learning, 2004.
[8] Tobias Scheffer, et al. Multi-Relational Learning, Text Mining, and Semi-Supervised Learning for Functional Genomics. Machine Learning, 2004.
[9] Tom M. Mitchell, et al. Learning to Decode Cognitive States from Brain Images. Machine Learning, 2004.
[10] Chun-Nan Hsu, et al. Mining Skewed and Sparse Transaction Data for Personalized Shopping Recommendation. Machine Learning, 2004.