Learning in the Presence of Large Fluctuations: A Study of Aggregation and Correlation

Consider a scenario where one aims to learn models from data characterized by very large fluctuations that are attributable neither to noise nor to outliers. This may be the case, for instance, when predicting the potential future damages of earthquakes or oil spills, or when conducting financial data analysis. It follows that, in such a situation, the standard central limit theorem does not apply, since the associated Gaussian distribution exponentially suppresses large fluctuations. In this paper, we present an analysis of data aggregation and correlation in such scenarios. To this end, we introduce the Lévy, or stable, distribution, which is a generalization of the Gaussian distribution. Our theoretical conclusions are illustrated with various simulations and evaluated against a benchmark financial database. We show which specific strategies should be adopted for aggregation, depending on the stability exponent of the Lévy distribution. First, our results indicate that the correlation between two attributes may be underestimated if a Gaussian distribution is erroneously assumed. Second, we show that, in the scenario where we aim to learn a set of rules to estimate the level of stability of a stock market, the Lévy distribution produces superior results. Third, we illustrate that, in a multi-relational database mining setting, aggregation using average values may be highly unsuitable.
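The claim that averaging may be unsuitable can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes symmetric stable variates with unit scale, drawn with the Chambers-Mallows-Stuck method, and the function names `symmetric_stable` and `spread_of_means` are hypothetical. It measures how widely the sample mean scatters across repeated trials. For a stability exponent of 2 (the Gaussian case) the scatter should shrink as the sample grows, for an exponent of 1 (Cauchy) it should stay roughly constant, and for exponents below 1 it should grow.

```python
import numpy as np


def symmetric_stable(alpha, size, rng):
    """Draw symmetric alpha-stable variates (unit scale, 0 < alpha <= 2).

    Uses the Chambers-Mallows-Stuck construction; this sampler is an
    assumption of this sketch, not a method described in the paper.
    """
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle
    w = rng.exponential(1.0, size)                # unit exponential
    return (
        np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
        * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha)
    )


def spread_of_means(alpha, n, trials, rng):
    """Interquartile range of the sample mean over repeated trials.

    The interquartile range is used instead of the standard deviation
    because the variance is infinite for alpha < 2.
    """
    samples = symmetric_stable(alpha, (trials, n), rng)
    means = samples.mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    return q75 - q25


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for alpha in (2.0, 1.5, 1.0, 0.5):
        small = spread_of_means(alpha, 10, 2000, rng)
        large = spread_of_means(alpha, 1000, 2000, rng)
        print(f"alpha={alpha}: spread at n=10 is {small:.3g}, at n=1000 is {large:.3g}")
```

Under these assumptions the scale of the mean of n samples behaves like n^(1/alpha - 1), which is why collecting more data helps only when the stability exponent exceeds 1, and helps more slowly than in the Gaussian case unless the exponent equals 2.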
