XCraft: A Dynamic Optimizer for the Materialization of Active XML Documents

An active XML (AXML) document contains special tags that represent calls to Web services. Retrieving its contents consists in materializing its data elements by invoking all its embedded service calls in a P2P network. In this process, the results of some service calls are often used as inputs to other calls. Also, several peers usually provide each requested Web service, and peers can collaborate to invoke these services. This implies many equivalent materialization alternatives with different performance. Optimizing the AXML materialization process is a hard problem, which often involves searching a huge space of solutions. Current techniques for workflow scheduling and distributed query processing are insufficient for this problem, since in AXML materialization: (i) the set of participating peers is not known in advance; (ii) service calls in the result of other calls forbid a simple "optimize-then-execute" strategy; and (iii) due to peer volatility in the network, a plan computed by the optimizer may become invalid by the time it is executed. Moreover, most current optimizers are based on centralized coordination. We propose a dynamic, cost-based optimization strategy to efficiently materialize AXML documents considering the volatility of a P2P scenario. We formalize the problem from a performance-oriented perspective, and present an optimization strategy that incrementally generates and executes materialization plans. This enables the optimizer to reduce the size of the search space, get more up-to-date information on the status of the peers, and deliver partial results earlier. Our strategy can handle arbitrarily complex AXML documents, and exploits decentralization at many levels. We also present a service-oriented optimization architecture called XCraft. We evaluated our approach in an XCraft prototype for the ActiveXML system, an open-source P2P platform. Our results show promising performance gains compared to centralized, static materialization strategies.
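To make the idea of incrementally generating and executing materialization plans more concrete, the following is a minimal sketch in Python. It is not the XCraft algorithm, whose details the abstract does not give. It assumes a greedy policy that picks the cheapest live peer for each ready call, and every name in it (ServiceCall, materialize, the services getNews and translate, the peers p1 to p3, and the cost values) is hypothetical. It only illustrates three points from the abstract: calls whose inputs are available are planned just before they run, peer liveness is checked at execution time rather than up front, and calls that appear in the result of another call are added to the pending set as they are discovered.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceCall:
    name: str                                     # identifier of the call in the document
    service: str                                  # requested Web service (hypothetical names)
    inputs: list = field(default_factory=list)    # names of calls whose results feed this one


def materialize(calls, providers, cost, is_alive, invoke):
    """Greedy, incremental planning and execution (illustrative assumption only).

    calls:     dict name -> ServiceCall known at the start
    providers: dict service -> list of peer ids offering it
    cost:      function (peer, call) -> estimated cost
    is_alive:  function peer -> bool, evaluated right before execution
    invoke:    function (peer, call, input_values) -> (result, newly_found_calls)
    """
    results = {}
    pending = dict(calls)  # copy, so the caller's dict is left untouched
    while pending:
        # Only calls whose inputs are already materialized are planned in this round.
        ready = [c for c in pending.values() if all(i in results for i in c.inputs)]
        if not ready:
            raise ValueError("remaining calls have unsatisfiable dependencies")
        for call in ready:
            # Liveness is checked now, so a peer that left the network is skipped.
            candidates = [p for p in providers.get(call.service, []) if is_alive(p)]
            if not candidates:
                raise RuntimeError(f"no live peer provides {call.service}")
            peer = min(candidates, key=lambda p: cost(p, call))
            result, new_calls = invoke(peer, call, [results[i] for i in call.inputs])
            results[call.name] = result
            del pending[call.name]
            # Calls embedded in the result are scheduled in a later round.
            for nc in new_calls:
                pending[nc.name] = nc
    return results


# Hypothetical scenario: peer p1 is cheapest but has left the network,
# and the result of call "a" contains a further embedded call "c".
calls = {
    "a": ServiceCall("a", "getNews"),
    "b": ServiceCall("b", "translate", inputs=["a"]),
}
providers = {"getNews": ["p1", "p2"], "translate": ["p2", "p3"]}
costs = {"p1": 1.0, "p2": 2.0, "p3": 5.0}
down = {"p1"}


def invoke(peer, call, inputs):
    new_calls = [ServiceCall("c", "translate", inputs=["a"])] if call.name == "a" else []
    return f"{call.name}@{peer}", new_calls


results = materialize(calls, providers, lambda p, c: costs[p], lambda p: p not in down, invoke)
print(results)
```

In this scenario the first round can only run "a", which goes to p2 because p1 is down. The second round then runs both "b" and the newly discovered "c". A static optimize-then-execute plan computed up front could not have included "c" and might have assigned "a" to the departed peer p1.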
