Paper information - I/O-Optimal Distribution Sweeping on Private-Cache Chip Multiprocessors

The parallel external memory (PEM) model has been used as a basis for the design and analysis of a wide range of algorithms for private-cache multi-core architectures. As a tool for developing geometric algorithms in this model, a parallel version of the I/O-efficient distribution sweeping framework was introduced recently, and a number of algorithms for problems on axis-aligned objects were obtained using this framework. These algorithms were efficient but not optimal. In this paper, we improve the framework to obtain algorithms with the optimal I/O complexity of $O(\mathrm{sort}_P(N) + K/PB)$ for a number of problems on axis-aligned objects. Here, $P$ denotes the number of cores/processors, $B$ denotes the number of elements that fit in a cache line, $N$ and $K$ denote the sizes of the input and output, respectively, and $\mathrm{sort}_P(N)$ denotes the I/O complexity of sorting $N$ items using $P$ processors in the PEM model. To obtain this improvement, we present a new one-dimensional batched range counting algorithm on a sorted list of ranges and points that achieves an I/O complexity of $O((N + K)/PB)$, where $K$ is the sum of the counts of all the ranges. The key to achieving efficient load balancing among the processors in this algorithm is a new method to count the output without enumerating it, which might be of independent interest.
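To make the one-dimensional batched range counting problem concrete, the following is a minimal sequential sketch in Python. It is not the paper's PEM algorithm: it runs on a single processor, sorts the input itself, and ignores caches and load balancing entirely. The function name `batched_range_count` and the event encoding are assumptions made for illustration. It does share one idea with the abstract, namely that each range's count is obtained from running totals rather than by enumerating the points inside the range.

```python
from typing import List, Tuple


def batched_range_count(
    points: List[float], ranges: List[Tuple[float, float]]
) -> List[int]:
    """For each closed range [lo, hi] (lo <= hi assumed), count points p with lo <= p <= hi.

    Hypothetical single-processor sketch; not the PEM algorithm from the paper.
    """
    # Event kinds break ties at equal coordinates: a range start comes before
    # a point, and a point before a range end, so both endpoints are inclusive.
    START, POINT, END = 0, 1, 2

    events = [(p, POINT, -1) for p in points]
    for i, (lo, hi) in enumerate(ranges):
        events.append((lo, START, i))
        events.append((hi, END, i))
    events.sort()

    counts = [0] * len(ranges)
    seen = 0  # number of points swept so far
    for _, kind, i in events:
        if kind == POINT:
            seen += 1
        elif kind == START:
            # Points swept before the range opens lie outside it.
            counts[i] = -seen
        else:
            # Points swept up to the range end, minus those before the start.
            counts[i] += seen
    return counts


if __name__ == "__main__":
    pts = [1, 3, 5, 7]
    rngs = [(2, 6), (0, 1), (8, 9), (3, 3)]
    counts = batched_range_count(pts, rngs)
    print(counts)       # per-range counts
    print(sum(counts))  # K, the sum of the counts of all ranges
```

In this sketch, the sum of the returned counts corresponds to $K$ in the $O((N + K)/PB)$ bound. Once every count is known, the total output size is known as well, which is the kind of information a parallel algorithm could use to divide output-reporting work evenly among processors.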

Norbert Zeh | Nodari Sitchinava | Deepak Ajwani
