High-level parallelisation in a database cluster: a feasibility study using document services

Our concern is the design of a scalable infrastructure for complex application services. We want to find out whether a cluster of commodity database systems is well suited as such an infrastructure. To this end, we have carried out a feasibility study based on document services, e.g., document insertion and retrieval. We decompose a service request into short parallel database transactions. Our system, implemented as an extension of a transaction processing monitor, routes the short transactions to the appropriate database systems in the cluster. Routing depends on the data distribution that we have chosen. To avoid bottlenecks, we distribute document functionality, such as term extraction, over the cluster. Extensive experiments show the following. (1) A relatively small number of components - for example, eight - already suffices to cope with high workloads of more than 100 concurrently active clients. (2) Speedup and throughput increase linearly for insertion operations as the cluster size increases. These observations also hold when service invocations are bundled into transactions at the semantic layer. A specialized coordinator component then implements semantic serializability and atomicity. Our experiments show that such a coordinator has minimal impact on CPU resource consumption and on response times.
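
The decomposition and routing described above can be illustrated with a minimal sketch. It is not the system from the study: the in-memory "nodes", the hash partitioning of terms, and the names `NUM_NODES`, `node_for`, `short_transaction`, `insert_document`, and `retrieve` are all assumptions made for illustration, and the abstract does not say which data distribution was actually chosen.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import re
import zlib

NUM_NODES = 4  # hypothetical cluster size, not a figure from the study

# Each "node" stands in for one database system: term -> set of document ids.
nodes = [defaultdict(set) for _ in range(NUM_NODES)]


def extract_terms(text):
    """Term extraction: lowercase alphanumeric tokens, deduplicated."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def node_for(term):
    """Assumed data distribution: hash-partition terms over the nodes."""
    return zlib.crc32(term.encode("utf-8")) % NUM_NODES


def short_transaction(node_id, doc_id, terms):
    """One short transaction: update the postings held by a single node."""
    for term in terms:
        nodes[node_id][term].add(doc_id)
    return node_id, len(terms)


def insert_document(doc_id, text):
    """Decompose one insertion request into per-node short transactions."""
    batches = defaultdict(list)
    for term in extract_terms(text):
        batches[node_for(term)].append(term)
    # Each batch touches exactly one node, so the batches can run in parallel.
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        futures = [
            pool.submit(short_transaction, node_id, doc_id, terms)
            for node_id, terms in batches.items()
        ]
        return dict(future.result() for future in futures)


def retrieve(term):
    """Route a single-term lookup to the one node that owns the term."""
    key = term.lower()
    return set(nodes[node_for(key)].get(key, set()))
```

Calling `insert_document(1, "Parallel database cluster")` and then `retrieve("cluster")` returns `{1}`. The sketch leaves out what the coordinator component provides in the study, namely atomicity and semantic serializability across the short transactions of one request.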
