Editorial: Data Mining Lessons Learned

Introduction

Data mining is concerned with finding interesting patterns in data. Many techniques have emerged for analyzing and visualizing large volumes of data. What one finds in the technical literature are mostly success stories of these techniques. Researchers rarely report on steps leading to success, failed attempts, or critical representation choices made; and rarely do papers include expert evaluations of achieved results. An interesting point of investigation is also why some particular solutions, despite good performance, were never used in practice or required additional treatment before they could be used. Insightful analyses of successful and unsuccessful applications are crucial for increasing our understanding of machine learning techniques and their limitations.

The UCI Repository of Machine Learning Databases (Blake & Merz, 1998) has served the machine learning community for many years as a valuable resource. It has benefited the community by allowing researchers to compare algorithm performance on a common set of benchmark datasets, most taken from real-world domains. However, its existence has indirectly promoted a very narrow view of real-world data mining. Performance comparisons, which typically focus on classification accuracy, neglect important data mining issues such as data understanding, data preparation, selection of appropriate performance metrics, and expert evaluation of results. Furthermore, because the UCI repository has been used so extensively, some researchers have claimed that our algorithms may be “overfitting the UCI repository”.

Challenge problems such as the KDD Cup, CoIL and PTE challenges have also become popular in recent years and have attracted numerous participants. Contrary to the “UCI challenge” of achieving