Mining Adaptive Ratio Rules from Distributed Data Sources

Different from traditional association-rule mining, a new paradigm called Ratio Rule (RR) was proposed recently. Ratio rules are aimed at capturing the quantitative association knowledge, We extend this framework to mining ratio rules from distributed and dynamic data sources. This is a novel and challenging problem. The traditional techniques used for ratio rule mining is an eigen-system analysis which can often fall victim to noise. This has limited the application of ratio rule mining greatly. The distributed data sources impose additional constraints for the mining procedure to be robust in the presence of noise, because it is difficult to clean all the data sources in real time in real-world tasks. In addition, the traditional batch methods for ratio rule mining cannot cope with dynamic data. In this paper, we propose an integrated method to mining ratio rules from distributed and changing data sources, by first mining the ratio rules from each data source separately through a novel robust and adaptive one-pass algorithm (which is called Robust and Adaptive Ratio Rule (RARR)), and then integrating the rules of each data source in a simple probabilistic model. In this way, we can acquire the global rules from all the local information sources adaptively. We show that the RARR technique can converge to a fixed point and is robust as well. Moreover, the integration of rules is efficient and effective. Both theoretical analysis and experiments illustrate that the performance of RARR and the proposed information integration procedure is satisfactory for the purpose of discovering latent associations in distributed dynamic data source.

[1]  David G. Stork,et al.  Pattern Classification , 1973 .

[2]  Lennart Ljung,et al.  Analysis of recursive stochastic algorithms , 1977 .

[3]  Harold J. Kushner,et al.  wchastic. approximation methods for constrained and unconstrained systems , 1978 .

[4]  V. Fabian Stochastic Approximation Methods for Constrained and Unconstrained Systems (Harold L. Kushner and Dean S. Clark) , 1980 .

[5]  W. Greub Linear Algebra , 1981 .

[6]  Gene H. Golub,et al.  Matrix computations , 1983 .

[7]  E. Oja,et al.  On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix , 1985 .

[8]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[9]  Lei Xu,et al.  Least mean square error reconstruction principle for self-organizing neural-nets , 1993, Neural Networks.

[10]  Alan L. Yuille,et al.  Robust principal component analysis by self-organizing rules based on statistical physics approach , 1995, IEEE Trans. Neural Networks.

[11]  Jiawei Han,et al.  Discovery of Multiple-Level Association Rules from Large Databases , 1995, VLDB.

[12]  Veda C. Storey,et al.  A Framework for Analysis of Data Quality Research , 1995, IEEE Trans. Knowl. Data Eng..

[13]  Kadir Liano,et al.  Robust error measure for supervised neural network learning with outliers , 1996, IEEE Trans. Neural Networks.

[14]  Ramakrishnan Srikant,et al.  Mining quantitative association rules in large relational tables , 1996, SIGMOD '96.

[15]  O. Kallenberg Foundations of Modern Probability , 2021, Probability Theory and Stochastic Modelling.

[16]  Christos Faloutsos,et al.  Quantifiable data mining using principal component analysis , 1997 .

[17]  Christos Faloutsos,et al.  Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining , 1998, VLDB.

[18]  Yehuda Lindell,et al.  A Statistical Theory for Quantitative Association Rules , 1999, KDD '99.

[19]  Christos Faloutsos,et al.  Quantifiable data mining using ratio rules , 2000, The VLDB Journal.

[20]  Chun Zhang,et al.  Storing and querying ordered XML using a relational database system , 2002, SIGMOD '02.

[21]  Rajeev Motwani,et al.  Robust and efficient fuzzy match for online data cleaning , 2003, SIGMOD '03.

[22]  Juyang Weng,et al.  Candid Covariance-Free Incremental Principal Component Analysis , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[23]  B. Ripley,et al.  Robust Statistics , 2018, Encyclopedia of Mathematical Geosciences.

[24]  Hua Li,et al.  IMMC: incremental maximum margin criterion , 2004, KDD.

[25]  Yongmin Li,et al.  On incremental and robust subspace learning , 2004, Pattern Recognit..

[26]  Xindong Wu,et al.  Database classification for multi-database mining , 2005, Inf. Syst..

[27]  H. Robbins A Stochastic Approximation Method , 1951 .

[28]  C. Buckley,et al.  Evaluating Evaluation Measure Stability , 2000, SIGIR Forum.