Jumbler: A lock-contention aware thread scheduler for multi-core parallel machines

On a cache-coherent multi-core, multi-processor machine, the execution time of a multithreaded application with high lock contention is highly sensitive to how its threads are distributed across processors. A poor mapping of threads degrades performance because locks are transferred frequently between sockets. As lock objects are transferred more often between processors, the number of last-level cache misses grows, which slows program execution. Operating-system thread schedulers are unaware of lock contention, so default scheduling loses performance, especially for applications with high lock contention. To mitigate this problem, we propose a novel scheduling technique that extends an existing approach called shuffling. Our scheduler migrates and maps the threads of a multithreaded application across sockets so that threads contending for the same lock are placed on the same socket. Threads that share a lock and are mapped together in this way incur fewer last-level cache misses. We evaluate the proposed scheduler with multithreaded parallel benchmarks on a system with 2 sockets of 4 cores each. The experiments show that our algorithm reduces execution time by up to 986.7%. Moreover, our algorithm requires no changes to the application source code or the operating system kernel.
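The placement idea described above can be illustrated with a minimal Python sketch. It is not the Jumbler algorithm itself, whose details are not given here. It only assumes that each thread has already been labelled with the lock it contends for most (the hypothetical `dominant_lock` mapping), and it packs the threads of each lock group onto one socket of a 2-socket, 4-cores-per-socket machine. The function name `plan_placement` and its parameters are likewise assumptions made for illustration.

```python
from collections import defaultdict


def plan_placement(dominant_lock, sockets=2, cores_per_socket=4):
    """Return {thread_id: socket} so threads sharing a lock stay together.

    dominant_lock is a hypothetical input: thread id -> name of the lock
    that thread contends for most. How such a profile is collected is
    outside the scope of this sketch.
    """
    # Group threads by the lock they contend for.
    groups = defaultdict(list)
    for tid, lock in dominant_lock.items():
        groups[lock].append(tid)

    load = [0] * sockets  # number of threads assigned to each socket
    placement = {}

    # Place the largest groups first so they are least likely to be split.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for _lock, tids in ordered:
        remaining = sorted(tids)
        while remaining:
            # Pick the least-loaded socket for the next chunk of the group.
            s = min(range(sockets), key=lambda i: load[i])
            # Fill the free cores; if the socket is full, oversubscribe
            # by one thread at a time so the loop always makes progress.
            room = max(cores_per_socket - load[s], 1)
            chunk, remaining = remaining[:room], remaining[room:]
            for tid in chunk:
                placement[tid] = s
            load[s] += len(chunk)
    return placement


# Eight threads, alternating between two hypothetical locks "A" and "B".
profile = {tid: ("A" if tid % 2 == 0 else "B") for tid in range(8)}
placement = plan_placement(profile)
for tid in sorted(placement):
    print(f"thread {tid} (lock {profile[tid]}) -> socket {placement[tid]}")
```

On Linux, a plan like this could then be applied from user space by restricting each thread to the cores of its socket with a CPU-affinity call, which is consistent with the claim that neither the application source code nor the kernel needs to change. The core-to-socket numbering is machine-specific and would have to be read from the system.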
