Wavelet Transform in Similarity Paradigm

[INS-R9802] Searching for similarity in time series is finding ever broader applications in data mining. However, because the spectrum of data involved is so broad, it is not possible to define a single notion of similarity that suits all applications. We present a powerful framework based on wavelet decomposition that allows a variety of criteria for evaluating the similarity between time series to be designed and implemented. As an example, we consider two main classes of similarity measures. The first is a global, statistical similarity, which uses the Hurst exponent derived from the wavelet transform to classify time series according to their global scaling properties. The second estimates similarity locally, using the scale-position bifurcation representation derived from the wavelet transform modulus maxima representation of the time series. A variety of generic or custom-designed matching criteria can be incorporated into this local (detail) similarity measure. We demonstrate the ability of the technique to cope with scaling, translation and polynomial bias, and we also test its sensitivity to the addition of random noise. Other criteria can be designed, and this flexibility can be built into the data mining system to accommodate specific user requirements.

[INS-R9815] For the majority of data mining applications, there are no models of the data that would facilitate comparing time series records, which leaves 'noise' as the only description. We propose a generic approach to comparing such noise-like time series using the largest deviations from consistent statistical behaviour. For this purpose we use a powerful framework based on wavelet decomposition, which allows polynomial bias to be filtered out while the essential singular behaviour is captured. In particular, we are able to reveal a scale-wise ranking of singular events, including their scale-free characteristic: the Hölder exponent.
We use such characteristics to design a compact representation of the time series suitable for direct comparison, e.g. by evaluating the correlation product. We demonstrate that the distance between such representations closely corresponds to the subjective feeling of similarity between the time series. To test the validity of these subjective criteria, we examine currency exchange rate records and find convincing levels of (local) correlation.
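The global, statistical similarity of INS-R9802 classifies series by a Hurst exponent derived from the wavelet transform. The sketch below is a simplified stand-in, not the authors' method: it assumes a path-like (fractional Brownian motion style) input, uses the orthonormal Haar wavelet instead of the wavelet transform modulus maxima, and reads H off the slope of log2 mean-square detail energy against level j, using the scaling E[d_j^2] ∝ 2^{j(2H+1)}. The function names, the default level range `j_min=3`, `j_max=9`, and the distance |H_x - H_y| are illustrative assumptions.

```python
import math
import random
from itertools import accumulate


def haar_details(x, levels):
    """Orthonormal Haar pyramid: detail coefficients at scales 2**1 .. 2**levels."""
    approx = [float(v) for v in x]
    s = math.sqrt(2.0)
    details = []
    for _ in range(levels):
        n = len(approx) // 2 * 2  # drop a trailing odd sample, if any
        pairs = [(approx[i], approx[i + 1]) for i in range(0, n, 2)]
        details.append([(a - b) / s for a, b in pairs])  # local differences
        approx = [(a + b) / s for a, b in pairs]  # local averages feed the next level
    return details


def ls_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


def estimate_hurst(x, j_min=3, j_max=9):
    """Hypothetical helper: Hurst exponent of a path-like series.

    Assumes E[d_j**2] ~ 2**(j * (2H + 1)), so the slope of log2(energy)
    against level j equals 2H + 1. Levels below j_min are skipped because
    small-scale finite-size effects bias the scaling law.
    """
    if len(x) < 2 ** j_max:
        raise ValueError("series too short for the requested number of levels")
    details = haar_details(x, j_max)
    levels = list(range(j_min, j_max + 1))
    energy = [sum(d * d for d in details[j - 1]) / len(details[j - 1]) for j in levels]
    if min(energy) == 0.0:
        raise ValueError("zero wavelet energy at some level")
    return (ls_slope(levels, [math.log2(e) for e in energy]) - 1.0) / 2.0


def global_distance(x, y, **kwargs):
    """Hypothetical global dissimilarity: gap between estimated Hurst exponents."""
    return abs(estimate_hurst(x, **kwargs) - estimate_hurst(y, **kwargs))
```

For a random walk (the cumulative sum of Gaussian increments) the estimate should come out near 0.5, and for white noise treated as a path near -0.5. Multiplying a series by a constant shifts every log-energy by the same amount and so leaves the slope unchanged. Haar has a single vanishing moment, so it removes a constant offset but not linear or higher-order trends; filtering polynomial bias, as described above, needs wavelets with more vanishing moments.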