Transactions on Large-Scale Data- and Knowledge-Centered Systems XXVI

A zero-one high-dimensional data set is said to be banded if all the dimensions can be reorganised such that the “non-zero” entries are arranged along the leading diagonal across the dimensions. Our goal is to develop effective algorithms that identify banded patterns in multidimensional zero-one data by automatically rearranging the ordering of all the dimensions. Rearranging zero-one data so as to feature “bandedness” allows for the identification of hidden information and enhances the operation of many data mining algorithms (and other algorithms) that work with zero-one data. In this paper two N-Dimensional Banded Pattern Mining (NDBPM) algorithms are presented. The first is an approximate algorithm (NDBPMAPPROX) and the second an exact algorithm (NDBPMEXACT). Two variations of NDBPMEXACT are presented (Euclidean and Manhattan). Both algorithms are fully described, together with evaluations of their operation.
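To make the notion of bandedness concrete, the following is a minimal two-dimensional sketch. It is not the NDBPMAPPROX or NDBPMEXACT algorithm. The names `band_score`, `reorder` and `best_banding` are hypothetical, the score (the sum of |i - j| over the non-zero cells of a square matrix) is an assumed stand-in for a bandedness measure, and the search simply tries every ordering of both dimensions, which is feasible only for tiny inputs.

```python
from itertools import permutations


def band_score(matrix):
    """Sum of |i - j| over all non-zero cells.

    A score of 0 means every non-zero entry lies on the leading diagonal.
    Assumes a square zero-one matrix given as a list of lists.
    """
    return sum(
        abs(i - j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value
    )


def reorder(matrix, row_order, col_order):
    """Return a copy of matrix with rows and columns rearranged."""
    return [[matrix[r][c] for c in col_order] for r in row_order]


def best_banding(matrix):
    """Try every row and column ordering and keep the lowest score.

    Brute force: the cost grows factorially with the matrix size.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    best = None
    for rows in permutations(range(n_rows)):
        for cols in permutations(range(n_cols)):
            score = band_score(reorder(matrix, rows, cols))
            if best is None or score < best[0]:
                best = (score, rows, cols)
    return best


# A small zero-one matrix whose non-zero entries are off the diagonal.
data = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]

score, rows, cols = best_banding(data)
banded = reorder(data, rows, cols)
for row in banded:
    print(row)
```

In this toy case the original ordering scores 4, and a rearrangement of the columns alone brings every non-zero entry onto the leading diagonal, giving a score of 0. The same idea extends to more dimensions by permuting each dimension's index set, although an exhaustive search of this kind quickly becomes impractical.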
