PROMISE: modeling and predicting user behavior for online analytical processing applications

The workload generated by users of explorative, navigational information systems typically contains characteristic patterns. If these task-specific or user-specific patterns are known to the system, they can be used to predict user interactions dynamically at runtime. Such predictions enable predictive caching and speculative execution strategies. Online analytical processing (OLAP) systems are database systems designed for the interactive exploration of data structured according to the multidimensional data model. This thesis presents an approach, called PROMISE, for representing and acquiring information about navigational patterns in OLAP systems and for using them to improve dynamic materialization (caching) strategies. To this end, the thesis contains a formal model of OLAP user behavior that takes into account navigational and iterative query formulation via a graphical front-end tool. A corresponding pattern model, based on a combination of different generalized Markov models, represents navigational patterns. The PROMISE prediction algorithm uses these patterns to dynamically predict a set of queries at any point during a session. The prediction framework is completed by an algorithm that induces and updates the pattern information from the user's actual behavior online, while the system is operating. To demonstrate the potential of prediction results for dynamic system optimization, we present two approaches to improving the performance of semantic query-level caches in OLAP systems: improved benefit estimation functions that allow more efficient admission and eviction algorithms, and speculative execution techniques. An empirical analysis of user behavior in a real-world data warehouse environment, together with performance measurements using simulated traces with data from a real-world application, demonstrates the usefulness of the approach.
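The idea of predicting the next queries from observed navigation can be illustrated with a minimal sketch. It assumes a plain first-order Markov chain over opaque query labels, which is much simpler than the generalized Markov models the thesis combines. The class name `MarkovQueryPredictor`, its methods, and the query labels are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


class MarkovQueryPredictor:
    """Hypothetical first-order Markov predictor over query labels."""

    def __init__(self):
        # transitions[q][r] counts how often query r directly followed query q.
        self.transitions = defaultdict(Counter)

    def observe_session(self, queries):
        """Update the transition counts online from one observed session."""
        for current, following in zip(queries, queries[1:]):
            self.transitions[current][following] += 1

    def predict(self, current, k=2):
        """Return up to k (query, probability) pairs, most likely first."""
        counts = self.transitions.get(current)
        if not counts:
            return []  # no pattern known for this query yet
        total = sum(counts.values())
        return [(query, n / total) for query, n in counts.most_common(k)]


# Illustrative sessions; the labels stand in for real OLAP queries.
predictor = MarkovQueryPredictor()
predictor.observe_session(["sales_by_year", "sales_by_quarter", "sales_by_month"])
predictor.observe_session(["sales_by_year", "sales_by_quarter", "sales_by_region"])
predictor.observe_session(["sales_by_year", "sales_by_region"])

print(predictor.predict("sales_by_year"))
```

In a setup like this, a cache manager could use the predicted probabilities either to weight its benefit estimates for admission and eviction or to decide which queries to execute speculatively.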
