Building the Data Warehouse of Frequent Itemsets in the DWFIST Approach

Some data mining tasks produce such large amounts of data that they create a new knowledge management problem. Frequent itemset mining falls into this category. Several approaches have been proposed to handle or avoid this problem, but all of them have limitations. In particular, most of them need the original data during the analysis phase, which is not feasible for data streams. The DWFIST (Data Warehouse of Frequent ItemSets Tactics) approach aims to provide a powerful environment for the analysis of itemsets and derived patterns, such as association rules, without accessing the original data during the analysis phase. The approach is based on a Data Warehouse of Frequent Itemsets, which provides frequent itemsets in a flexible and efficient way, as well as a standardized logical view upon which analytical tools can be developed. This paper presents how such a data warehouse can be built.
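To illustrate why stored frequent itemsets can suffice for deriving association rules, the following is a minimal sketch and not the DWFIST implementation. It assumes a hypothetical in-memory mapping `stored_support` from itemsets to support counts as a stand-in for the warehouse; the item names, the counts, and the function `rule_confidence` are invented for illustration. It relies only on the standard definition of rule confidence, support(A ∪ B) / support(A).

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Hypothetical stand-in for the warehouse contents: itemset -> support count.
# The items and counts are made up for this example.
stored_support: Dict[FrozenSet[str], int] = {
    frozenset({"bread"}): 60,
    frozenset({"butter"}): 50,
    frozenset({"bread", "butter"}): 40,
}


def rule_confidence(
    antecedent: Iterable[str],
    consequent: Iterable[str],
    supports: Dict[FrozenSet[str], int],
) -> Optional[float]:
    """Confidence of antecedent -> consequent using only stored supports.

    Returns None when a required itemset was not stored (for example,
    because it was not frequent), since the original data is not consulted.
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    if a not in supports or both not in supports:
        return None
    return supports[both] / supports[a]


if __name__ == "__main__":
    print(rule_confidence({"bread"}, {"butter"}, stored_support))
    print(rule_confidence({"bread"}, {"milk"}, stored_support))
```

In this sketch a rule over an itemset that was never stored yields `None` rather than an estimate, which reflects the constraint that the original data is unavailable during analysis.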
