In this paper we present a method for incremental discretization that can adapt to gradual changes in the target concept. The proposed method is based on Partition Incremental Discretization (PiD). The algorithm divides the discretization task into two layers. The first layer receives the sequence of input data and maintains statistics of the data, using more intervals than required. The second layer computes the final discretization based on the statistics stored by the first layer. The method processes streaming examples in a single scan, in constant time and space, even for infinite sequences of examples. In dynamic environments the target concept can change gradually over time, so past examples may no longer reflect the current state of the problem. To accommodate concept drift we use an exponential decay that smoothly reduces the importance of older examples. Experimental evaluation on a benchmark problem for drifting environments clearly illustrates the benefits of the example-weighting technique.
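As a concrete illustration of the two-layer scheme and the decay-based weighting, the following Python sketch is a minimal, simplified rendering: the fixed equal-width first-layer grid, the equal-frequency second layer, and the names (TwoLayerDiscretizer, n_layer1, n_final, decay) are illustrative assumptions, not the exact PiD algorithm.

```python
# Minimal sketch of a two-layer incremental discretizer with exponential
# decay. The fixed layer-1 grid and all parameters are illustrative
# assumptions, not the exact PiD procedure.
import bisect

class TwoLayerDiscretizer:
    def __init__(self, lo, hi, n_layer1=100, n_final=10, decay=0.999):
        # Layer 1: many equal-width intervals over an assumed value range.
        width = (hi - lo) / n_layer1
        self.breaks = [lo + i * width for i in range(1, n_layer1)]
        self.counts = [0.0] * n_layer1
        self.n_final = n_final
        self.decay = decay  # factor < 1: older examples fade smoothly

    def update(self, x):
        # Single-pass update: decay all past counts, then count x.
        self.counts = [c * self.decay for c in self.counts]
        self.counts[bisect.bisect_right(self.breaks, x)] += 1.0

    def final_cutpoints(self):
        # Layer 2: equal-frequency cut points computed from the decayed
        # layer-1 counts, so regions the stream has drifted away from
        # lose influence on the final discretization.
        total = sum(self.counts)
        if total == 0.0:
            return []
        target = total / self.n_final
        cuts, acc = [], 0.0
        for i, c in enumerate(self.counts[:-1]):
            acc += c
            if acc >= target:
                cuts.append(self.breaks[i])
                acc -= target
        return cuts
```

Feeding the stream by calling update(x) once per example and computing final_cutpoints() only on demand matches the single-scan, constant-space setting described above; a production version would apply the decay lazily (e.g., via a global scale factor) rather than rescaling every bin on each update, which the eager loop does here only to keep the sketch short.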