MCSTL: the multi-core standard template library

Future gains in computing performance will not come from higher clock rates but from ever more cores per processor. Since automatic parallelization is still limited to easily parallelizable sections of code, most applications will soon have to support parallelism explicitly. The Multi-Core Standard Template Library (MCSTL) simplifies parallelization by providing efficient parallel implementations of the algorithms in the C++ Standard Template Library. Thus, simple recompilation provides partial parallelization for applications that make consistent use of the STL. We present performance measurements on several architectures; for example, our sorter achieves a speedup of 21 on an 8-core, 32-thread Sun T1.