Error-Tolerant Data Mining

Data mining seeks to discover novel and actionable knowledge hidden in data. Because dealing with large, noisy data is a defining characteristic of data mining, several questions become important and challenging: where the noise in a data source comes from; whether the noisy items are randomly generated (random noise) or follow some type of generative model (systematic noise); and how we can use these data errors to improve the subsequent mining process and produce better results. Existing data mining algorithms cannot yet address these issues directly. Consequently, systematic research is needed to bridge the gap between data errors and the available mining algorithms, both to provide an accurate understanding of the underlying data and to produce enhanced mining results for imperfect, real-world information sources. This talk presents our recent investigations into bridging the data and knowledge gap in mining noisy information sources.