An Efficient Constraint-Based Closed Set Mining Algorithm

We present a search algorithm for mining closed sets in high-dimensional binary datasets. The algorithm targets dense datasets, in which typically more than 10% of the entries are 1s and the total number of closed sets is much larger than the number of objects. It is memory efficient because, unlike many other closed set mining algorithms, it does not need to keep all previously mined patterns in memory. We also introduce optimization techniques and present a parallel version of the algorithm.
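To make the terms concrete, the following is a minimal Python sketch of closed set enumeration on a small binary dataset. It is not the algorithm of this paper, whose details are not given in the abstract; it uses the well-known Close-by-One canonicity test as a stand-in, chosen because it also avoids storing previously mined patterns. The names `density`, `closure`, and `closed_sets`, and the representation of each object as a set of column indices, are assumptions made for illustration.

```python
def density(rows, n_items):
    """Fraction of 1s in the binary dataset (rows are sets of column indices)."""
    return sum(len(r) for r in rows) / (len(rows) * n_items)


def closure(rows, items, n_items):
    """Largest item set shared by every object that contains `items`."""
    extent = [r for r in rows if items <= r]
    if not extent:
        # No object supports `items`: by convention the closure is all items.
        return frozenset(range(n_items))
    return frozenset.intersection(*extent)


def closed_sets(rows, n_items):
    """Yield every closed item set exactly once (Close-by-One style).

    Patterns are yielded as they are found; no table of earlier
    patterns is kept, only the current recursion path.
    """
    rows = [frozenset(r) for r in rows]

    def expand(intent, start):
        yield intent
        for j in range(start, n_items):
            if j in intent:
                continue
            new = closure(rows, intent | {j}, n_items)
            # Canonicity test: reject `new` if closing added an item
            # smaller than j, since another branch generates that set.
            if all(i in intent for i in new if i < j):
                yield from expand(new, j + 1)

    yield from expand(closure(rows, frozenset(), n_items), 0)


if __name__ == "__main__":
    data = [{0, 1}, {0, 2}, {0, 1, 2}]  # 3 objects, 3 binary attributes
    print(density(data, 3))
    for s in closed_sets(data, 3):
        print(sorted(s))
```

Because duplicates are ruled out by the canonicity test rather than by a lookup in earlier results, memory use in this sketch is bounded by the recursion depth and not by the number of closed sets.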
