NANOS: Effective Integration of Fine-grain Parallelism Exploitation and Multiprogramming

The objective of the NANOS project is to investigate how to achieve both high system throughput and high application performance for parallel applications in multiprogrammed environments on shared-memory multiprocessors. The project has developed a complete environment in which mechanisms and policies at the different levels (application, compiler, threads library, and kernel) are carefully coordinated toward these goals. The environment integrates techniques proposed in different research frameworks, enabling the exploitation of their combined potential and the development of new algorithms and ideas. The NANOS environment includes 1) an application development environment consisting of an application structure visualization and transformation tool; 2) an extended OpenMP parallelizing compiler; 3) a runtime user-level threads library; 4) an application performance visualization and analysis tool; 5) a processor manager; and 6) a system activity visualization tool. NANOS focuses on numerical applications written in Fortran 77 and targets a state-of-the-art machine of the time, the SGI Origin 2000.
