Servicing Mixed Data Intensive Query Workloads

When data analysis applications are deployed in a multi-client environment, a data server must service multiple simultaneous queries, each of which may employ complex user-defined data structures and operations on the data. It is then necessary to exploit inter- and intra-query commonalities and the available system resources to improve the performance of the data server. We have developed a framework and customizable middleware that enable reuse of intermediate and final results among queries through an in-memory semantic cache and user-defined transformation functions. Because resources such as processing power and memory space are limited on the machine hosting the server, effective scheduling of incoming queries and efficient cache replacement policies are challenging issues that must be addressed. We addressed the scheduling problem in earlier work; in this paper we describe and evaluate several cache replacement policies. We present an experimental evaluation of the policies on a shared-memory parallel system using two applications from different domains.
