Implementation and Analysis of a Parallel Collection Query Language

We study implementation techniques for a parallel query language for nested collections. The language handles collections of three kinds (sets, bags, and sequences), and its expressive power is essentially that of OQL (ODMG93). From the perspective of parallel evaluation, the novelty of such a query language is that it can express nested parallelism, which is naturally associated with nested collections. The first implementation step is a translation into a specially designed algebra for flat sequences, which has only flat parallelism: the translation “flattens” the nested parallelism, and we prove that it preserves the asymptotic parallel complexity. The second step is an implementation of the sequence algebra on a shared-nothing architecture. Here we show that all data communications needed by the sequence algebra operators (with one exception) follow a particular communication pattern, called monotone communication. We give a provably optimal algorithm for monotone communications on a shared-nothing architecture. Here “optimal” means that, for any particular initial and final data layout, its communication cost is the absolute minimum (not merely within a constant factor). To account for communication costs, we chose the recently proposed LogP model as our shared-nothing model. Finally, we report some preliminary experiments with our implementation techniques on a LogP simulator.

This work was done while the author was at the University of Pennsylvania, and was partially supported by NSF Grant CCR-90-57570, ONR Contract N00014-93-11284, and by a fellowship from the Institute for Research in Cognitive Science.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 22nd VLDB Conference, Mumbai (Bombay), India, 1996.
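
The flattening step named in the abstract can be illustrated with a small sketch. It is not the paper's sequence algebra. It assumes the segmented-vector encoding familiar from the data-parallel literature, in which a nested sequence is stored as one flat sequence of values plus one flat sequence of segment lengths. All function names here (`flatten`, `unflatten`, `nested_map`, `segmented_sum`) are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def flatten(nested):
    """Encode a nested sequence as (flat values, segment lengths)."""
    values = [x for segment in nested for x in segment]
    lengths = [len(segment) for segment in nested]
    return values, lengths


def unflatten(values, lengths):
    """Rebuild the nested sequence from flat values and segment lengths."""
    ends = list(accumulate(lengths))
    starts = [end - n for end, n in zip(ends, lengths)]
    return [values[s:e] for s, e in zip(starts, ends)]


def nested_map(f, nested):
    """Map f over every inner element using a single flat map.

    The nested loop "for each inner sequence, for each element" becomes
    one pass over the flat values; the segment lengths are untouched.
    """
    values, lengths = flatten(nested)
    mapped = [f(x) for x in values]  # the only data-parallel step
    return unflatten(mapped, lengths)


def segmented_sum(values, lengths):
    """Sum each segment of the flat encoding, one result per inner sequence.

    Uses a prefix sum over the flat values, so no per-segment loop is needed.
    """
    ends = list(accumulate(lengths))
    prefix = [0] + list(accumulate(values))
    return [prefix[end] - prefix[end - n] for end, n in zip(ends, lengths)]
```

For example, `nested_map(lambda x: x * 10, [[1, 2], [], [3, 4, 5]])` performs a single flat map over five values and then restores the three segments. The point of the encoding is that the work per flat operation no longer depends on how the elements are distributed among the inner sequences.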

[1] David J. DeWitt, et al. Parallel database systems: the future of high performance database systems, 1992, CACM.

[2] Limsoon Wong, et al. Naturally Embedded Query Languages, 1992, ICDT.

[3] Patrick Valduriez, et al. FAD, a Powerful and Simple Database Language, 1987, VLDB.

[4] Ramesh Subramonian, et al. LogP: towards a realistic model of parallel computation, 1993, PPOPP '93.

[5] Chris J. Scheiman, et al. LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation, 1995, SPAA '95.

[6] Guy E. Blelloch, et al. Vector Models for Data-Parallel Computing, 1990.

[7] Dan Suciu, et al. Efficient compilation of high-level data parallel algorithms, 1994, SPAA '94.

[8] Shahram Ghandeharizadeh, et al. Object placement in parallel object-oriented database systems, 1994, Proceedings of 1994 IEEE 10th International Conference on Data Engineering.

[9] David J. DeWitt, et al. The OO7 Benchmark, 1993, SIGMOD Conference.

[10] Richard M. Karp, et al. Optimal broadcast and summation in the LogP model, 1993, SPAA '93.

[11] Joseph JáJá, et al. An Introduction to Parallel Algorithms, 1992.

[12] David J. DeWitt, et al. Shoring up persistent applications, 1994, SIGMOD '94.

[13] David J. DeWitt, et al. A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment, 1989, SIGMOD '89.

[14] David J. DeWitt, et al. The Object-Oriented Database System Manifesto, 1994, Building an Object-Oriented Database System, The Story of O2.

[15] S. Sitharama Iyengar, et al. Introduction to parallel algorithms, 1998, Wiley series on parallel and distributed computing.

[16] David J. DeWitt, et al. Algebraic support for complex objects with arrays, identity, and inheritance, 1991, SIGMOD '91.

[17] Guy E. Blelloch, et al. Compiling Collection-Oriented Languages onto Massively Parallel Computers, 1990, J. Parallel Distributed Comput.

[18] Catriel Beeri, et al. On the power of languages for manipulation of complex objects, 1987, VLDB 1987.

[19] David Maier, et al. Towards an effective calculus for object query languages, 1995, SIGMOD '95.

[20] Dan Suciu. Parallel programming languages for collections, 1996.

[21] Lewis W. Tucker, et al. CMMD: Active Messages on the CM-5, 1994, Parallel Comput.

[22] David Jordan, et al. The Object Database Standard: ODMG 2.0, 1997.

[23] Patrick Valduriez, et al. SVP - a Model Capturing Sets, Streams, and Parallelism, 1998.

[24] Patrick Valduriez, et al. SVP: A Model Capturing Sets, Lists, Streams, and Parallelism, 1992, Very Large Data Bases Conference.

[25] O. Deux, et al. The story of O2, 1992.

[26] Leslie G. Valiant, et al. Parallelism in Comparison Problems, 1975, SIAM J. Comput.

[27] David J. DeWitt, et al. The OO7 Benchmark, 1993, SIGMOD '93.

[28] O. Deux, et al. The Story of O2, 1990, IEEE Trans. Knowl. Data Eng.

[29] Shahram Ghandeharizadeh, et al. Design and Implementation of the Omega Object-Based System, 1993, Australian Database Conference.

[30] R. G. G. Cattell, et al. The Object Database Standard: ODMG-93, 1993.