Paper Information - Correlation Range Query

Correlation Range Query

Efficient correlation computation has been an active research area in data mining. Given a large dataset and a specified query item, we are interested in finding the items in the dataset whose correlation with the query item falls within a certain range. This problem, known as the correlation range query (CRQ), is a common task in many application domains. In this paper, we identify piecewise monotone properties of the upper and lower bounds of the φ coefficient, and propose an efficient correlation range query algorithm, called CORAQ. The CORAQ algorithm prunes many items without computing their actual correlation coefficients with the query item, while still returning complete and correct query results. Experiments with large benchmark datasets show that this algorithm is much faster than its brute-force alternative and scales well with large datasets.
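To make the problem statement concrete, the following is a minimal sketch of a correlation range query over binary items. It is not the CORAQ algorithm: it is the brute-force scan, plus one simple pruning check based on the upper bound of φ that follows from supp(AB) ≤ min(supp(A), supp(B)). The function names (`phi`, `phi_upper_bound`, `range_query`), the dictionary-of-vectors data layout, and the convention of skipping items with degenerate support are all assumptions made for illustration.

```python
from math import sqrt

def support(v):
    """Fraction of transactions in which the binary item occurs."""
    return sum(v) / len(v)

def phi(a, b):
    """phi coefficient of two equal-length 0/1 vectors.

    Returns None when either item has support 0 or 1, because the
    coefficient is undefined there (an illustrative convention).
    """
    n = len(a)
    sa, sb = support(a), support(b)
    denom = sqrt(sa * sb * (1 - sa) * (1 - sb))
    if denom == 0:
        return None
    sab = sum(x & y for x, y in zip(a, b)) / n
    return (sab - sa * sb) / denom

def phi_upper_bound(sa, sb):
    """Upper bound of phi that depends only on the two supports.

    Obtained by substituting supp(AB) <= min(sa, sb) into phi.
    Assumes both supports lie strictly between 0 and 1.
    """
    big, small = max(sa, sb), min(sa, sb)
    return sqrt(small / big) * sqrt((1 - big) / (1 - small))

def range_query(query, items, lo, hi, eps=1e-12):
    """Return {name: phi} for items whose phi with `query` is in [lo, hi]."""
    sq = support(query)
    if sq in (0.0, 1.0):
        return {}
    result = {}
    for name, vec in items.items():
        sv = support(vec)
        if sv in (0.0, 1.0):
            continue  # phi undefined for this item
        # Prune: if even the best possible phi is below lo, skip the item
        # without touching the co-occurrence counts.
        if phi_upper_bound(sq, sv) < lo - eps:
            continue
        value = phi(query, vec)
        if lo - eps <= value <= hi + eps:
            result[name] = value
    return result

query = [1, 1, 0, 0]
items = {
    "same": [1, 1, 0, 0],
    "opposite": [0, 0, 1, 1],
    "independent": [1, 0, 1, 0],
    "rare": [1, 0, 0, 0],
}
matches = range_query(query, items, 0.5, 1.0)
```

In this sketch the pruning check only helps with the lower end of the range; the abstract indicates that CORAQ additionally exploits a lower bound and the piecewise monotonicity of both bounds, which this example does not attempt to reproduce.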

Wenjun Zhou | Hao Zhang

[1] Michel Wedel, et al. Cross-Selling Through Database Marketing: A Mixed Data Factor Analyzer for Data Augmentation and Prediction, 2003.

[2] Hui Xiong, et al. Volatile correlation computation: a checkpoint view, 2008, KDD.

[3] Hui Xiong, et al. TAPER: a two-step approach for all-strong-pairs correlation query in large databases, 2006.

[4] Hui Xiong, et al. Checkpoint evolution for volatile correlation computing, 2011, Machine Learning.

[5] Paul Gray, et al. A Survey of Database Marketing, 1999.

[6] Vipin Kumar, et al. Introduction to Data Mining, (First Edition), 2005.

[7] Vipin Kumar, et al. Introduction to Data Mining, 2022, Data Mining and Machine Learning Applications.

[8] Hui Xiong, et al. Exploiting a support-based upper bound of Pearson's correlation coefficient for efficiently identifying strongly correlated pairs, 2004, KDD.

[9] Hui Xiong, et al. Top-k Correlation Computation, 2008, INFORMS J. Comput.

[10] Hui Xiong, et al. TOP-COP: Mining TOP-K Strongly Correlated Pairs in Large Databases, 2006, Sixth International Conference on Data Mining (ICDM'06).

[11] Hui Xiong, et al. Identification of Functional Modules in Protein Complexes via Hyperclique Pattern Discovery, 2004, Pacific Symposium on Biocomputing.

[12] Paul Brown, et al. CORDS: automatic discovery of correlations and soft functional dependencies, 2004, SIGMOD '04.