Efficient Execution of Multiple Query Workloads in Data Analysis Applications

Applications that analyze, mine, and visualize large datasets are an important class of applications in many areas of science, engineering, and business. Queries in data analysis applications often involve user-defined processing of data and application-specific data structures. When data analysis is carried out in a collaborative environment, the data server should execute multiple such queries simultaneously to minimize response time for clients. In this paper, we present the design of a runtime system for executing multiple query workloads on a shared-memory machine. We also describe experimental results obtained with an application for browsing digitized microscopy images.
