Multi-relational pattern mining over data streams

The data storage paradigm has changed in the last decade, from operational databases to data repositories that make it easier to analyze data and mine information. Among these, the primary multidimensional model represents data through star schemas, where each relation denotes an event involving a set of dimensions or business perspectives. Mining data modeled as a star schema presents two major challenges: mining extremely large amounts of data and dealing with several data tables at the same time. In this paper, we describe in detail an algorithm, Star FP Stream. This algorithm aims to find the set of frequent patterns in a large star schema by mining the data directly, in its original structure, and by exploiting the most efficient techniques for mining data streams. Experiments were conducted over two star schemas, in the healthcare and sales domains.
