Adaptive Parallelization of Queries over Dependent Web Service Calls

We have developed a system to process database queries over composed data-providing web services. The queries are transformed into execution plans containing an operator that can invoke any web service for given arguments. A common pattern in these query execution plans is that the output of one web service call is the input to another, and so on. The challenge addressed in this paper is to speed up such dependent calls in queries through parallelization. Since web service calls incur high latency and message set-up costs, a naïve approach that makes the calls sequentially is time-consuming, and invoking the web service calls in parallel should reduce execution time. Our approach parallelizes the web service calls automatically by starting separate query processes, each managing a parameterized sub-query, called a plan function, for different parameter tuples. For a given query, the query processes are automatically arranged in a multi-level process tree in which plan functions are called in parallel. The parallel plan is defined in terms of an algebra operator, FF_APPLYP, that ships the same plan function in parallel to other query processes, each receiving different parameters. Using FF_APPLYP, we first investigated different ways of setting up process trees manually. We concluded from our experiments that the best-performing query execution plan is an almost balanced bushy tree. To achieve the optimal process tree automatically, we modified FF_APPLYP into an operator, AFF_APPLYP, that adapts the parallel plan locally in each query process until performance is optimized. AFF_APPLYP starts with a binary process tree. During execution, each query process in the tree makes local decisions to expand or shrink its process sub-tree by comparing the average time to process each incoming tuple. The query execution time obtained with AFF_APPLYP is shown to be close to the best time achieved by manually built query process trees.
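To make the idea of shipping one plan function to several parallel workers concrete, here is a minimal sketch in Python. It is not the system's implementation: threads stand in for the separate query processes, and `fake_web_service`, `ff_applyp`, `level1`, `level2` and `sequential` are hypothetical names. The service is simulated with a fixed sleep. The plan function `level1` calls a first service and feeds each of its outputs into a dependent second call. That second call is itself parallelized, giving a two-level process tree.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

P = TypeVar("P")
R = TypeVar("R")


def fake_web_service(x: int, latency: float = 0.05) -> List[int]:
    """Hypothetical high-latency web service: sleeps, then returns two values."""
    time.sleep(latency)  # stands in for network round trip and message set-up
    return [x * 10, x * 10 + 1]


def ff_applyp(fn: Callable[[P], Iterable[R]], params: Iterable[P],
              fanout: int) -> Iterator[R]:
    """Apply the same plan function to each parameter on `fanout` parallel workers.

    Threads stand in for child query processes; results are streamed back in
    parameter order, because Executor.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=fanout) as pool:
        for rows in pool.map(fn, params):
            yield from rows


def level2(a: int) -> List[int]:
    """Leaf plan function: one call to the (hypothetical) second service."""
    return fake_web_service(a)


def level1(param: int) -> List[Tuple[int, int]]:
    """Plan function: call the first service, then feed each of its outputs
    into the dependent second call, parallelized one level further down."""
    a_values = fake_web_service(param)
    return [(param, b) for b in ff_applyp(level2, a_values, fanout=2)]


def sequential(params: Iterable[int]) -> List[Tuple[int, int]]:
    """Naive plan: every dependent call is made one after another."""
    return [(p, b) for p in params
            for a in fake_web_service(p)
            for b in fake_web_service(a)]


if __name__ == "__main__":
    params = list(range(8))
    t0 = time.perf_counter()
    seq = sequential(params)
    t_seq = time.perf_counter() - t0
    t0 = time.perf_counter()
    # Two-level tree: the root has 4 children, each of which has 2 children.
    par = list(ff_applyp(level1, params, fanout=4))
    t_par = time.perf_counter() - t0
    print(f"{len(par)} rows; sequential {t_seq:.2f}s, parallel {t_par:.2f}s")
```

With these made-up latencies, the sequential plan makes 24 calls back to back (8 parameters, each with one first call and two dependent calls), which is roughly 1.2 s of waiting. The tree overlaps those waits. Threads are a reasonable stand-in here only because the simulated calls are I/O-bound.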
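The abstract says that each query process decides locally whether to expand or shrink its sub-tree by comparing the average time per incoming tuple, but it does not give the exact rule. The following sketch shows one possible rule under that assumption: simple hill climbing on a single node's fanout. `adapt_fanout` and `cost_model` are hypothetical names. The cost model is a toy formula, not measured data. In it, the call latency is shared among f children, and each extra child adds a fixed overhead.

```python
from typing import Callable


def adapt_fanout(measure: Callable[[int], float], start: int = 2,
                 min_fanout: int = 1, max_fanout: int = 64) -> int:
    """Hypothetical local adaptation rule for one query process.

    measure(f) returns the average time to process one incoming tuple when the
    process has f children. Starting from a binary node, the process first tries
    to expand one child at a time while that average keeps improving; if the
    first expansion does not help, it tries shrinking instead.
    """
    fanout, best = start, measure(start)
    for step in (+1, -1):  # +1 = expand the sub-tree, -1 = shrink it
        moved = False
        while min_fanout <= fanout + step <= max_fanout:
            t = measure(fanout + step)
            if t >= best:  # no improvement: keep the current fanout
                break
            fanout, best, moved = fanout + step, t, True
        if moved:
            break  # expanding helped, so there is no reason to try shrinking
    return fanout


def cost_model(latency: float, overhead: float) -> Callable[[int], float]:
    """Toy cost model (an assumption): latency divided among f children,
    plus a per-child process overhead."""
    return lambda f: latency / f + overhead * f


if __name__ == "__main__":
    print(adapt_fanout(cost_model(1.0, 0.02)))  # cheap children: expands, prints 7
    print(adapt_fanout(cost_model(1.0, 0.6)))   # costly children: shrinks, prints 1
```

A real process would compute `measure` from the tuples it has processed rather than from a formula. It would also have to tolerate noisy timings, which this sketch ignores.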
