I/O-Optimal Distribution Sweeping on Private-Cache Chip Multiprocessors

The parallel external memory (PEM) model has been used as a basis for the design and analysis of a wide range of algorithms for private-cache multi-core architectures. As a tool for developing geometric algorithms in this model, a parallel version of the I/O-efficient distribution sweeping framework was introduced recently, and a number of algorithms for problems on axis-aligned objects were obtained using this framework. The resulting algorithms were efficient but not optimal. In this paper, we improve the framework to obtain algorithms with the optimal I/O complexity of $O(\mathrm{sort}_P(N) + K/PB)$ for a number of problems on axis-aligned objects. Here, $P$ denotes the number of cores/processors, $B$ denotes the number of elements that fit in a cache line, $N$ and $K$ denote the sizes of the input and output, respectively, and $\mathrm{sort}_P(N)$ denotes the I/O complexity of sorting $N$ items using $P$ processors in the PEM model. To obtain this improvement, we present a new one-dimensional batched range counting algorithm on a sorted list of ranges and points that achieves an I/O complexity of $O((N + K)/PB)$, where $K$ is the sum of the counts of all the ranges. The key to achieving efficient load balancing among the processors in this algorithm is a new method to count the output without enumerating it, which might be of independent interest.
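To make the one-dimensional batched range counting problem concrete, the following is a minimal sequential sketch in Python. It is not the parallel PEM algorithm of the paper and makes no claim about I/O complexity. It only illustrates how a single sweep over sorted points and range endpoints can produce each range's count, and hence the total $K$, without listing the points inside any range. The function name `batched_range_counts`, the event encoding, and the assumption of closed ranges `[lo, hi]` are illustrative choices, not taken from the paper.

```python
def batched_range_counts(points, ranges):
    """Count, for each closed range (lo, hi), the points p with lo <= p <= hi.

    Hypothetical helper for illustration: a sequential sweep, not the
    parallel PEM algorithm described in the abstract.
    """
    # Event kinds are ordered so that, at equal coordinates, a range start
    # comes before a point and a point comes before a range end. This makes
    # both endpoints of a range inclusive.
    START, POINT, END = 0, 1, 2
    events = [(x, POINT, -1) for x in points]
    for i, (lo, hi) in enumerate(ranges):
        events.append((lo, START, i))
        events.append((hi, END, i))
    events.sort()

    counts = [0] * len(ranges)
    seen = 0  # number of points swept so far
    for _, kind, i in events:
        if kind == POINT:
            seen += 1
        elif kind == START:
            counts[i] = -seen   # subtract points strictly before the range
        else:
            counts[i] += seen   # add points up to and including hi
    return counts


points = [1, 3, 5, 7]
ranges = [(0, 4), (3, 7), (8, 9)]
counts = batched_range_counts(points, ranges)
total_output_size = sum(counts)  # plays the role of K in the abstract
```

Each count is obtained as a difference of two running totals, so the work in the sweep depends on the number of points and ranges, not on $K$. Knowing the per-range counts in advance is the kind of information that would let output work be divided evenly among processors before any output is produced.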
