Time-partitioned Index Design for Adaptive Multi-Route Data Stream Systems utilizing Heavy Hitter Algorithms

Adaptive multi-route query processing (AMR) is a recently emerging paradigm for processing stream queries in highly fluctuating environments. AMR dynamically routes batches of tuples to operators in the query network based on routing criteria and up-to-date system statistics. Indexing, a core technology for efficient stream processing, has received little attention in the context of AMR systems. Indexing in AMR systems is demanding because indices must adapt to serve continuously evolving query paths while maintaining index content under high data volumes. Our proposed Adaptive Multi-Route Index (AMRI) employs a bitmap time-partitioned design that is versatile enough to serve a diverse, ever-changing workload of multiple query access patterns, yet remains lightweight in terms of maintenance and storage requirements. In addition, our AMRI design and migration strategies seek to meet the indexing needs of both older, partially serviced search requests and newer incoming ones. We show that the effect of using AMRI's compressed statistics on the quality of the selected index configuration can be bounded by a preset constant. Our experimental study using both synthetic and real data streams demonstrates that AMRI strikes a balance between supporting effective query processing in dynamic stream environments and keeping index maintenance and tuning costs to a minimum. On a data set collected by environmental sensors placed in the Intel Berkeley Research Lab, AMRI outperforms the state-of-the-art approach by 68% on average in cumulative throughput.
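To make the idea of compressed workload statistics more concrete, the following is a minimal sketch of a classic heavy-hitter summary (the Misra-Gries frequent-items algorithm) applied to a stream of query access patterns. The abstract does not say which heavy-hitter algorithm AMRI uses or how access patterns are encoded, so the function name `misra_gries`, the parameter `k`, and the representation of an access pattern as a tuple of attribute names are all assumptions made for illustration only.

```python
def misra_gries(stream, k):
    """Summarize a stream with at most k - 1 counters (Misra-Gries).

    Each retained count underestimates the true frequency by at most
    n / k, where n is the stream length, so every item that occurs more
    than n / k times is guaranteed to remain in the summary.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: count this occurrence.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement every tracked item and
            # drop the ones whose count reaches zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical workload: each search request is described by the tuple
# of attributes it probes (an assumed encoding, not taken from the paper).
requests = [
    ("sensor_id",),
    ("sensor_id", "time"),
    ("sensor_id",),
    ("temp",),
    ("sensor_id",),
    ("sensor_id", "time"),
    ("sensor_id",),
]
frequent_patterns = misra_gries(requests, k=3)
print(frequent_patterns)
```

The appeal of such a summary in this setting is that its memory footprint is fixed by `k` regardless of how many distinct access patterns appear, while the counting error stays within a known bound. Whether AMRI's bound on index configuration quality is derived this way is not stated in the abstract.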
