Panta Rhei: Flexible Execution Engine for Search Computing Queries

The efficient execution of data-intensive computations over services is a challenging task: data are retrieved from remote sources and are therefore not available in the query engine until the corresponding service calls have completed, yet from that point on the system must be efficient, guaranteeing that data are cached immediately and processed according to the best query plan. In this chapter, we present Panta Rhei, a flexible execution model for search computing queries. The proposed execution engine adopts the producer/consumer paradigm and supports both data-driven and event-driven synchronization, as well as their interplay. Query plans are modeled as directed graphs whose nodes are processing units and whose edges are either control flows or data flows. Control flows synchronize service calls and unit execution, while data flows transfer data between the units that process them to produce query results. We specify Panta Rhei by formally defining the units for data production, consumption, manipulation, and caching, as well as the control and data flows. Finally, we discuss how a query plan is expressed in terms of a query execution plan.
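To make the graph model concrete, the following is a minimal sketch of a plan whose nodes are processing units and whose edges are either data flows or control flows. It is not the Panta Rhei specification: the names `Unit`, `QueryPlan`, `data_flow`, `control_flow`, and `demo` are hypothetical, the service call is simulated with in-memory data, and a single-threaded round-robin loop stands in for a real asynchronous producer/consumer engine.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Set, Tuple


@dataclass
class Unit:
    """A processing unit (hypothetical): consumes a batch, produces a batch."""
    name: str
    fn: Callable[[List[dict]], List[dict]]
    inbox: Deque[dict] = field(default_factory=deque)  # buffered data-flow input
    signals: Set[str] = field(default_factory=set)     # control signals received


class QueryPlan:
    """Directed graph of units connected by data edges and control edges."""

    def __init__(self) -> None:
        self.units: Dict[str, Unit] = {}
        self.data_edges: List[Tuple[str, str]] = []
        self.control_edges: List[Tuple[str, str]] = []

    def add_unit(self, name: str, fn: Callable[[List[dict]], List[dict]]) -> None:
        self.units[name] = Unit(name, fn)

    def data_flow(self, src: str, dst: str) -> None:
        self.data_edges.append((src, dst))

    def control_flow(self, src: str, dst: str) -> None:
        self.control_edges.append((src, dst))

    def _ready(self, unit: Unit, fired: Set[str]) -> bool:
        # Event-driven condition: every control predecessor has signalled.
        needed = {s for s, d in self.control_edges if d == unit.name}
        if not needed <= unit.signals:
            return False
        # Data-driven condition: pending input, or a source that has not run yet.
        is_source = not any(d == unit.name for _, d in self.data_edges)
        return bool(unit.inbox) or (is_source and unit.name not in fired)

    def run(self) -> List[dict]:
        results: List[dict] = []
        fired: Set[str] = set()
        progress = True
        while progress:  # stop once a full pass fires no unit
            progress = False
            for unit in self.units.values():
                if not self._ready(unit, fired):
                    continue
                batch = list(unit.inbox)
                unit.inbox.clear()
                out = unit.fn(batch)
                fired.add(unit.name)
                progress = True
                # Propagate control signals to control successors.
                for s, d in self.control_edges:
                    if s == unit.name:
                        self.units[d].signals.add(s)
                # Push output along data edges; units without one feed the result.
                targets = [d for s, d in self.data_edges if s == unit.name]
                for d in targets:
                    self.units[d].inbox.extend(out)
                if not targets:
                    results.extend(out)
        return results


def demo() -> List[dict]:
    # Simulated service response (made-up data, for illustration only).
    hotels = [
        {"name": "A", "score": 0.9},
        {"name": "B", "score": 0.3},
        {"name": "C", "score": 0.7},
    ]
    plan = QueryPlan()
    plan.add_unit("init", lambda _: [])                # emits only a control signal
    plan.add_unit("fetch", lambda _: list(hotels))     # producer: simulated call
    plan.add_unit("select", lambda ts: [t for t in ts if t["score"] >= 0.5])
    plan.control_flow("init", "fetch")                 # fetch waits for init
    plan.data_flow("fetch", "select")                  # tuples flow to the selection
    return plan.run()
```

In this sketch the `fetch` unit fires only after `init` has signalled it (event-driven), whereas `select` fires as soon as tuples arrive in its inbox (data-driven); calling `demo()` returns the tuples that survive the selection.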
