Early Experiences in Implementing the Buffer Tree

Computer processing speeds are increasing rapidly due to the evolution of faster chips, parallel processing of data, and more efficient software. Users today have access to an unprecedented amount of high-quality, high-resolution data through various technologies. This is resulting in a growing demand for higher-performance input and output mechanisms in order to pass huge data sets from the external memory (EM), or disk system, through the relatively small main memory of the computer and back again. In recent years, research into external memory algorithms has been growing to keep pace with the demand for innovation in this area. EM algorithms for individual problems have been developed, but few general-purpose EM tools have been designed. A fundamental tool is the buffer tree, an external version of the (a,b)-tree. It can be used to satisfy a number of EM requirements, such as sorting, priority queues, range searching, etc., in a straightforward and I/O-optimal manner. In this paper we describe an implementation of a buffer tree. We describe benchmarking tests which lead to an experimental determination of certain parameter values different from those originally suggested in the design of the data structure. We describe implementations of two algorithms based on the buffer tree: an external memory treesort and an external memory priority queue. Our initial experiments with buffer tree sort for large problem sizes indicate that this algorithm easily outperforms similar algorithms based on internal memory techniques. With some tuning of the buffer tree parameters, we are able to obtain performance consistent with theoretical predictions for the range of problem sizes tested. We include comparisons with TPIE Merge Sort. We conclude that (a) the buffer tree as a generic data structure appears to perform well in theory and practice, and (b) measuring I/O efficiency experimentally is an important topic that merits further attention.
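The lazy batching idea behind buffer tree sort can be illustrated with a small in-memory sketch. This is not the implementation described in the paper: it performs no disk I/O, uses an unbalanced binary tree rather than an (a,b)-tree, and omits rebalancing. The names `BufferNode`, `buffer_tree_sort`, `buffer_cap`, and `leaf_cap` are hypothetical and chosen only for illustration. The sketch shows only that inserts are collected in a node's buffer and pushed one level down in batches, and that emptying all buffers and reading the leaves left to right yields sorted output.

```python
class BufferNode:
    """Toy node: inserts wait in a buffer and move down in batches.

    Assumption: an unbalanced, in-memory stand-in for a buffer tree node.
    """

    def __init__(self, buffer_cap=4, leaf_cap=8):
        self.buffer_cap = buffer_cap  # hypothetical: pending inserts before a flush
        self.leaf_cap = leaf_cap      # hypothetical: items a leaf holds before splitting
        self.buffer = []              # pending inserts not yet pushed down
        self.items = []               # sorted items (used only while a leaf)
        self.pivot = None             # routing key (used only once internal)
        self.children = []            # empty for a leaf, [left, right] otherwise

    def insert(self, x):
        # Inserting is cheap: append to the buffer, flush only when it is full.
        self.buffer.append(x)
        if len(self.buffer) >= self.buffer_cap:
            self.flush()

    def flush(self):
        pending, self.buffer = self.buffer, []
        if self.children:
            # Internal node: route the whole batch one level down.
            for x in pending:
                child = self.children[0] if x < self.pivot else self.children[1]
                child.insert(x)
            return
        # Leaf: merge the batch into the sorted items, split on overflow.
        self.items = sorted(self.items + pending)
        if len(self.items) > self.leaf_cap:
            mid = len(self.items) // 2
            self.pivot = self.items[mid]
            left = BufferNode(self.buffer_cap, self.leaf_cap)
            right = BufferNode(self.buffer_cap, self.leaf_cap)
            left.items = self.items[:mid]
            right.items = self.items[mid:]
            self.children = [left, right]
            self.items = []

    def drain(self):
        # Empty this node's buffer, then emit everything below in key order.
        self.flush()
        if not self.children:
            yield from self.items
        else:
            for child in self.children:
                yield from child.drain()


def buffer_tree_sort(values, buffer_cap=4, leaf_cap=8):
    """Sort by inserting every value, then draining all buffers."""
    root = BufferNode(buffer_cap, leaf_cap)
    for v in values:
        root.insert(v)
    return list(root.drain())
```

In an external memory setting the buffer and leaf capacities would be tied to the block and main memory sizes, which are the kind of parameters the abstract reports tuning experimentally; the small constants above are arbitrary.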
