Two-Phase Data Warehouse Optimized for Data Mining

We propose a new, heterogeneous data warehouse architecture in which a first-phase traditional relational OLAP warehouse coexists with a second phase that stores data in a compressed form optimized for data mining. Aggregations and metadata for the entire time frame are stored in the first-phase relational database. The main advantage of the second phase is its reduced I/O requirement, which enables very high-throughput processing by sequential, read-only data stream algorithms. It thus becomes feasible to run speed-optimized queries and data mining operations over the entire time frame of the most granular data. The second phase also enables long-term storage and analysis of data, including historical data, in a very efficient compressed format at low storage cost. The proposed architecture fits existing data warehouse solutions. We show the effectiveness of the two-phase data warehouse through a case study of a large web portal.
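
As a rough illustration of the second-phase access pattern, the sketch below makes one sequential, read-only pass over compressed granular records and computes an aggregate of the kind the first phase might store. It is a minimal sketch under stated assumptions, not the format or algorithm used in this work: gzip stands in for the compressed format, and the tab-separated (timestamp, user, page) record layout and the function names `write_compressed_log` and `stream_page_counts` are hypothetical.

```python
import gzip
import io
from collections import Counter


def write_compressed_log(records):
    """Compress (timestamp, user, page) records into gzip bytes.

    Assumption: gzip and a tab-separated layout stand in for the
    second-phase compressed format, which is not specified here.
    """
    buf = io.BytesIO()
    with gzip.open(buf, "wt", encoding="utf-8") as out:
        for ts, user, page in records:
            out.write(f"{ts}\t{user}\t{page}\n")
    return buf.getvalue()


def stream_page_counts(compressed):
    """Count hits per page in a single sequential, read-only pass."""
    counts = Counter()
    with gzip.open(io.BytesIO(compressed), "rt", encoding="utf-8") as stream:
        for line in stream:
            _ts, _user, page = line.rstrip("\n").split("\t")
            counts[page] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical click records for illustration only.
    sample = [
        ("2005-01-01T10:00:00", "u1", "/home"),
        ("2005-01-01T10:00:05", "u2", "/news"),
        ("2005-01-01T10:00:09", "u1", "/home"),
    ]
    blob = write_compressed_log(sample)
    print(stream_page_counts(blob).most_common())
```

The records are decompressed on the fly and never materialized in full, so memory use stays bounded by the size of the aggregate rather than the size of the log.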
