A novel Boolean algebraic framework for association and pattern mining

Data mining has been defined as the non-trivial extraction of implicit, previously unknown, and potentially useful information from data. Association mining and sequential mining are considered crucial components of strategic control across a broad variety of disciplines in business, science, and engineering. Association mining is an important subfield of data mining that discovers rules implying association relationships among a set of items in a transaction database. In sequence mining, data are represented as sequences of events, and the order of those events is important. Finding patterns in sequences is valuable for predicting future events: in many applications, such as Web applications, the stock market, and genetic analysis, patterns in a sequence of elements or events help predict the next event or element. At the conceptual level, association mining and sequence mining are similar processes that use different representations of data. In association mining, items are distinct and their order within a transaction is not important, whereas in sequential pattern mining the order of elements (events) in transactions (sequences) is important and the same event may occur more than once. In this paper, we propose a new mapping function that maps event sequences into itemsets. Based on this unified representation of association mining and sequential pattern mining, we present a new approach that uses the Boolean representation of the input database D to build a Boolean matrix M. Boolean algebra operations are applied to M to generate all frequent itemsets. Finally, frequent itemsets or frequent sequential patterns are represented by logical expressions that can be minimized using a suitable logic function minimization technique.
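
As an illustration of the Boolean-matrix idea, the following minimal Python sketch builds a matrix M with one Boolean column per item and one row per transaction, then obtains the support of an itemset by AND-ing the corresponding columns. It is not the paper's algorithm: the function names (`build_boolean_matrix`, `support`, `frequent_itemsets`), the toy database `D`, and the brute-force level-by-level enumeration are all assumptions made for illustration, and the sequence-to-itemset mapping function is not shown because the abstract does not define it.

```python
from itertools import combinations


def build_boolean_matrix(transactions):
    """Return (items, matrix); matrix[item][r] is True if row r contains item."""
    items = sorted({item for t in transactions for item in t})
    matrix = {item: [item in t for t in transactions] for item in items}
    return items, matrix


def support(matrix, itemset, n_rows):
    """Count rows containing every item of itemset by AND-ing its columns."""
    column = [True] * n_rows
    for item in itemset:
        column = [a and b for a, b in zip(column, matrix[item])]
    return sum(column)


def frequent_itemsets(transactions, min_support):
    """Enumerate itemsets level by level (hypothetical brute-force strategy)."""
    items, matrix = build_boolean_matrix(transactions)
    n_rows = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            count = support(matrix, candidate, n_rows)
            if count >= min_support:
                result[candidate] = count
                found = True
        if not found:
            # No frequent itemset of this size, so no larger one can exist.
            break
    return result


# Toy transaction database (assumed data, not from the paper).
D = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
frequent = frequent_itemsets(D, min_support=2)
```

With a minimum support of 2, the sketch reports the itemsets {a}, {b}, {c}, {a, c}, and {b, c}. Each frequent itemset corresponds to a conjunction of item variables, which is the kind of logical expression the abstract proposes to minimize afterwards.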
