Classification of Datasets with Frequent Itemsets is Wild

The problem of dataset classification with frequent itemsets is the problem of determining whether two different datasets have the same frequent itemsets without computing these itemsets explicitly. The motivation for this approach is the high computational cost of computing frequent itemsets. Finding well-defined and understandable normal forms for this classification task would be a breakthrough in the field of dataset classification. The paper proves that classification of datasets with frequent itemsets is a hopeless task, since canonical forms do not exist for this problem.
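For orientation, the explicit comparison that the classification problem tries to avoid can be sketched as follows. This is a minimal brute-force illustration, not the paper's method: the function names `frequent_itemsets` and `same_frequent_itemsets` are hypothetical, and the use of an absolute support count as the frequency threshold is an assumption, since the abstract does not fix a definition.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return all itemsets contained in at least `min_support` transactions.

    Brute-force enumeration by itemset size. `min_support` is an absolute
    count (an assumption made for this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    frequent = set()
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent.add(itemset)
                found_any = True
        if not found_any:
            # Every subset of a frequent itemset is frequent, so if no
            # itemset of this size is frequent, no larger one can be.
            break
    return frequent


def same_frequent_itemsets(dataset_a, dataset_b, min_support):
    """Compare two datasets by explicitly computing their frequent itemsets."""
    return (frequent_itemsets(dataset_a, min_support)
            == frequent_itemsets(dataset_b, min_support))


if __name__ == "__main__":
    a = [{"x", "y"}, {"x", "y"}, {"z"}]
    b = [{"x", "y"}, {"x", "y", "z"}, {"w"}]
    print(same_frequent_itemsets(a, b, min_support=2))
```

The enumeration is exponential in the number of items in the worst case, which is the cost that motivates looking for a comparison that does not compute the itemsets.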
