Contribution à l'élaboration d'ordonnanceurs de processus légers performants et portables pour architectures multiprocesseurs. (Contribution to the design of portable and efficient thread schedulers for multiprocessor architectures)

Threads are now ubiquitous in computing. Multithreading lets applications not only fully exploit multiprocessor computers but also express their intrinsic parallelism. In high performance computing, threads are commonly used to overlap computation with communication. They also allow the various execution flows within an application to progress independently of one another, a capability that is essential for implementing complex middleware such as MPI or CORBA. My work aims to provide an efficient thread library that targets a wide range of architectures (mono- or multiprocessor computers, SMT technology, etc.) and meets the requirements of high performance computing programs. First, I extended the Scheduler Activations model and implemented it within the Linux kernel, so that user-level threads can react very quickly to hardware interrupts. Then, I extended this mechanism to unify the management of interrupts and polling in multithreaded environments. Finally, I designed new tracing mechanisms that make it possible to precisely reconstruct the execution of multithreaded programs, even under two-level scheduling. All of this work has been implemented within the PM2 software suite. The Marcel library provides efficient multithreading on a wide range of processors and systems. Marcel is flexible enough to let an application control the scheduling of its threads precisely when needed. Applications can be traced in order to observe their exact behavior, and the generated traces can be converted to the Pajé format so that this behavior can be examined graphically.
