Top-Down Mining of Frequent Patterns from Very High Dimensional Data

Many real-world applications deal with transactional data, characterized by a huge number of transactions (tuples) and a small number of dimensions (attributes). However, other applications involve high-dimensional data with only a small number of tuples; examples include bioinformatics, survey-based statistical analysis, and text processing. Such high-dimensional data pose great challenges to most existing data mining algorithms: although there are numerous algorithms for transactional data sets, few are oriented to very high-dimensional data sets with a relatively small number of tuples. Taking frequent pattern mining [1, 2, 3, 4] as an example, most existing algorithms are column (i.e., item) enumeration-based: they take combinations of columns (items) as the search space. Because a data set with m columns has 2^m possible column combinations, this search space grows exponentially with the number of dimensions, which makes the approach unsuitable for very high-dimensional data.
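To make the search-space argument concrete, here is a minimal Python sketch of naive column (item) enumeration, assuming a toy table in which each tuple is stored as a set of column indices. The names `ROWS`, `NUM_COLUMNS`, `candidate_count`, and `column_enumeration` are hypothetical. This is a brute-force illustration, not any of the cited algorithms: practical column-enumeration miners prune candidates (for example, using the property that every subset of a frequent itemset is also frequent), but their search space is still built from combinations of columns.

```python
from itertools import combinations

# Hypothetical toy data set: few tuples (rows), several columns (items).
# Each row is stored as the set of column indices present in that tuple.
ROWS = [
    {0, 1, 2, 4},
    {0, 1, 3, 4},
    {0, 2, 4},
]
NUM_COLUMNS = 5


def candidate_count(num_columns):
    """Number of non-empty column combinations: 2**m - 1 for m columns."""
    return 2 ** num_columns - 1


def column_enumeration(rows, num_columns, min_support):
    """Brute-force column (item) enumeration over every column combination.

    Returns (frequent_itemsets, examined): frequent_itemsets is a list of
    (itemset, support) pairs and examined is the number of candidates tested.
    No pruning is done, so examined always equals candidate_count(num_columns).
    """
    frequent = []
    examined = 0
    for size in range(1, num_columns + 1):
        for itemset in combinations(range(num_columns), size):
            examined += 1
            # Support: how many rows contain every column of the itemset.
            support = sum(1 for row in rows if set(itemset) <= row)
            if support >= min_support:
                frequent.append((itemset, support))
    return frequent, examined


if __name__ == "__main__":
    frequent, examined = column_enumeration(ROWS, NUM_COLUMNS, min_support=2)
    print(f"examined {examined} candidates, {len(frequent)} frequent")
    for itemset, support in frequent:
        print(itemset, support)
    # The candidate space doubles with every additional column.
    for m in (10, 100, 10_000):
        digits = len(str(candidate_count(m)))
        print(f"{m} columns -> a {digits}-digit number of candidates")
```

With only 5 columns the sketch already tests 31 candidates; with m columns it tests 2^m - 1, so a table with thousands of columns is out of reach for this style of search even when it has only a few dozen tuples.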

[1] Gösta Grahne, et al. Efficiently Using Prefix-trees in Mining Frequent Itemsets, 2003, FIMI.

[2] Anthony K. H. Tung, et al. Carpenter: Finding Closed Patterns in Long Biological Datasets, 2003, KDD '03.

[3] Jian Pei, et al. CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets, 2000, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[4] Mohammed J. Zaki, et al. CHARM: An Efficient Algorithm for Closed Association Rule Mining, 2007.

[5] Rakesh Agrawal, et al. Fast Algorithms for Mining Association Rules, 1994, VLDB 1994.