Paper Information - Efficient Sampling: Application to Image Data

Efficient Sampling: Application to Image Data

Sampling is an important preprocessing technique for mining large data efficiently. Although a simple random sample often works well for a reasonable sample size, accuracy falls sharply as the sample size is reduced. In KDD'03 we proposed EASE, which outputs a sample based on its 'closeness' to the original sample. Reported results show that EASE outperforms simple random sampling (SRS). In this paper we propose EASIER, which extends EASE in two ways. 1) EASE is a halving algorithm, i.e., to achieve the required sample ratio it starts from a suitably large initial sample and halves it iteratively. EASIER, on the other hand, does away with the repeated halving by obtaining the required sample ratio directly in one iteration. 2) EASE was shown to work on the IBM QUEST dataset, which is a categorical count dataset. EASIER, in addition, is shown to work on continuous data such as the Color Structure Descriptor of images. Two mining tasks, classification and association rule mining, are used to validate the efficacy of EASIER samples vis-a-vis EASE and SRS samples.
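As a rough illustration of the SRS baseline and of what 'closeness' between a sample and the data might mean, the following minimal sketch draws a simple random sample of transactions and compares per-item frequencies. The function names, the toy transactions, and the mean-absolute-difference measure are assumptions made for illustration only; this is not the EASE or EASIER selection procedure, whose details are not given in the abstract.

```python
import random
from collections import Counter


def item_frequencies(transactions):
    """Return, for each item, the fraction of transactions that contain it."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c / n for item, c in counts.items()}


def frequency_distance(sample, data):
    """Mean absolute difference between item frequencies in sample and data.

    Hypothetical closeness measure chosen for this sketch: 0.0 means the
    sample reproduces every item frequency exactly.
    """
    f_data = item_frequencies(data)
    f_sample = item_frequencies(sample)
    items = f_data.keys() | f_sample.keys()
    return sum(abs(f_data.get(i, 0.0) - f_sample.get(i, 0.0)) for i in items) / len(items)


def simple_random_sample(data, ratio, rng):
    """Draw a simple random sample (without replacement) at the given ratio."""
    k = max(1, round(len(data) * ratio))
    return rng.sample(data, k)


# Toy categorical transactions (illustrative data, not the IBM QUEST dataset).
data = [["a", "b"], ["a"], ["b", "c"], ["a", "c"]] * 25
rng = random.Random(0)

srs = simple_random_sample(data, ratio=0.1, rng=rng)
print(len(srs), frequency_distance(srs, data))
```

Repeating the draw with smaller ratios typically gives larger distances, which is the SRS behaviour that motivates choosing samples by closeness rather than purely at random.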

[1] Bin Chen, et al. A new two-phase sampling based algorithm for discovering association rules, 2002, KDD.

[2] Empirical evaluation of MPEG-7 XM color descriptors in content-based retrieval of semantic image categories, 2002, Object recognition supported by user interaction for service robots.

[3] Ramakrishnan Srikant, et al. Fast Algorithms for Mining Association Rules in Large Databases, 1994, VLDB.

[4] Bin Chen, et al. Efficient data reduction with EASE, 2003, KDD '03.

[5] Jeffrey Scott Vitter, et al. Random sampling with a reservoir, 1985, TOMS.

[6] Jian Pei, et al. Mining frequent patterns without candidate generation, 2000, SIGMOD '00.

[7] Patrick Haffner, et al. Support vector machines for histogram-based image classification, 1999, IEEE Trans. Neural Networks.

[8] Rakesh Agrawal, et al. Fast Algorithms for Mining Association Rules, 1994, VLDB 1994.

[9] Rong Yan, et al. Image Classification Using a Bigram Model, 2003.