Distributed XML processing: Theory and applications

Basic message processing tasks, such as well-formedness checking and grammar validation, are common in Web service messaging and can be off-loaded from the service providers' own infrastructures. The traditional way to alleviate the overhead caused by these tasks is to use firewalls and gateways. However, these single-processing-point solutions do not scale well. To enable effective off-loading of common processing tasks, we introduce the Prefix Automata SyStem (PASS), a middleware architecture that processes the XML payloads of Web service SOAP messages in a distributed manner while they are routed towards Web servers. PASS is based on a network of automata, in which PASS-nodes independently but cooperatively process parts of the SOAP message XML payload. PASS allows autonomous and pipelined in-network processing of XML documents, where parts of a large message payload are processed by various PASS-nodes in tandem or simultaneously. The non-blocking, non-wasteful, and autonomous operation of the PASS middleware is achieved by relying on the prefix nature of basic XML processing tasks, such as well-formedness checking and DTD validation. These properties ensure minimal overhead for managing the distributed processing. We present necessary and sufficient conditions for outsourcing XML document processing tasks to PASS, and provide guidelines for making suitable applications PASS-processable. Using event-driven simulations, we demonstrate the advantages of migrating XML document processing tasks, such as well-formedness checking, DTD parsing, and filtering, to the network.
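The prefix nature of well-formedness checking mentioned above can be illustrated with a minimal sketch. It is not the PASS implementation: the names `PrefixState`, `process_chunk`, and `is_well_formed` are hypothetical, and the check is a simplified tag-balance test (no comments, CDATA, processing instructions, single-root check, or `>` inside attribute values). It only shows that the result for a payload prefix can be summarized in a small state, so a later node can resume without re-reading earlier chunks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PrefixState:
    """State handed from one (hypothetical) processing node to the next."""
    open_tags: List[str] = field(default_factory=list)  # stack of unclosed elements
    pending: str = ""     # partial tag cut off at a chunk boundary
    failed: bool = False  # an invalid prefix cannot be repaired by later chunks


def process_chunk(state: PrefixState, chunk: str) -> PrefixState:
    """Consume one chunk, resuming from the summary of the preceding prefix."""
    if state.failed:
        return state
    stack = list(state.open_tags)
    data = state.pending + chunk
    pos = 0
    while True:
        start = data.find("<", pos)
        if start == -1:
            # Only character data remains in this chunk.
            return PrefixState(stack, "", False)
        end = data.find(">", start)
        if end == -1:
            # The tag continues in the next chunk; carry the fragment forward.
            return PrefixState(stack, data[start:], False)
        tag = data[start + 1:end]
        pos = end + 1
        if tag.startswith("/"):
            # End tag: must match the most recently opened element.
            if not stack or stack.pop() != tag[1:].strip():
                return PrefixState([], "", True)
        elif not tag.endswith("/"):
            # Start tag: push the element name (attributes are ignored).
            parts = tag.split()
            if not parts:
                return PrefixState([], "", True)
            stack.append(parts[0])
        # Self-closing tags such as <c/> leave the stack unchanged.


def is_well_formed(state: PrefixState) -> bool:
    """After the last chunk: no failure, no open elements, no dangling tag."""
    return not state.failed and not state.open_tags and not state.pending


# Three chunks, as if each were handled by a different node along the route.
state = PrefixState()
for chunk in ["<a><b>te", "xt</b><c", "/></a>"]:
    state = process_chunk(state, chunk)
print(is_well_formed(state))
```

Each call depends only on the incoming state and its own chunk, which is the property that lets nodes work in a pipeline; the state here is a stack plus a short fragment, whereas the paper's automata-based formulation may summarize prefixes differently.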
