Top-Down Mining of Frequent Patterns from Very High Dimensional Data

Many real-world applications deal with transactional data, characterized by a huge number of transactions (tuples) and a small number of dimensions (attributes). Other applications, however, involve very high dimensional data with a small number of tuples; examples include bioinformatics, survey-based statistical analysis, and text processing. High dimensional data pose great challenges to most existing data mining algorithms. Although numerous algorithms handle transactional data sets, few are oriented to very high dimensional data sets with a relatively small number of tuples. Taking frequent pattern mining [1, 2, 3, 4] as an example, most existing algorithms are column (i.e., item) enumeration-based: they take the combinations of columns (items) as the search space. Because the number of column combinations grows exponentially with the number of columns, this approach is not suitable for very high dimensional data.
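
As a rough illustration of the exponential growth noted above, the following Python sketch counts the non-empty column (item) combinations over n columns, which is 2^n - 1, and enumerates them explicitly for a tiny example. The function names are hypothetical and are not taken from any of the cited algorithms; the sketch shows only the size of the search space, not a mining procedure.

```python
from itertools import combinations


def count_candidate_itemsets(num_items: int) -> int:
    """Number of non-empty item combinations over num_items columns (2^n - 1)."""
    return 2 ** num_items - 1


def enumerate_itemsets(items):
    """Yield every non-empty combination of items.

    Illustrative only: feasible for a handful of items, hopeless for thousands.
    """
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield combo


# Explicit enumeration agrees with the closed-form count on a tiny example.
small = list(enumerate_itemsets(["a", "b", "c"]))
assert len(small) == count_candidate_itemsets(3)

# For larger column counts, report only how many decimal digits the count has.
for n in (10, 100, 1000):
    digits = len(str(count_candidate_itemsets(n)))
    print(f"{n} columns -> candidate count has {digits} decimal digits")
```

Even before any support counting is done, the number of candidates at a few hundred columns is far beyond what can be enumerated, which is the difficulty the paragraph above describes.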