EVIDIST: A Similarity Measure for Uncertain Data Streams

The large amount of data generated by sensors and the increased use of privacy-preserving techniques have led to growing interest in mining uncertain data streams. Traditional distance measures such as the Euclidean distance do not always work well for uncertain data streams. In this paper, we present EVIDIST (Evidential Distance), a new distance measure for uncertain data streams, where uncertainty is modeled as sample observations at each time slot. We conduct an extensive experimental evaluation of EVIDIST on the 1-NN classification task with 15 real datasets. The results show that, compared with the Euclidean distance, EVIDIST increases classification accuracy by about 13% and is also far more resilient to error.