A trace-driven methodology to evaluate and optimize memory management services of distributed operating systems for lightweight manycores

Lightweight manycores belong to a new class of emerging low-power processors for the Exascale era. These processors pose several challenges for application development, such as a distributed memory architecture, a limited amount of on-chip memory, and no cache coherence. Recently, distributed Operating Systems (OSs) have been proposed to address these challenges in a transparent way. In these systems, different OS services are deployed across the processor cores, with the memory management service being one of the most important. However, the intrinsic characteristics and memory limitations of lightweight manycores complicate the design, implementation, and future optimization of memory management services. In this work, we propose a trace-driven methodology to evaluate and optimize features of a memory management service of distributed OSs for lightweight manycores. By using a compact representation of the page access pattern of the applications, our methodology is capable of mimicking the memory access pattern of the original applications on the target distributed OS running on a lightweight manycore. We integrated our methodology into a distributed OS (Nanvix) and validated it using three applications from a benchmark suite designed for lightweight manycores (CAP Bench). Then, we applied our methodology to carry out a case study using a software-managed cache implementation available in Nanvix. Our methodology enabled us to evaluate different page replacement policies on the Kalray MPPA-256, even though the architecture lacks the support required to implement them.
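To make the idea of trace-driven evaluation concrete, the following is a minimal sketch in Python, not the Nanvix implementation. It assumes the compact trace is a run-length-encoded list of `(page, repeats)` pairs and that the software-managed cache holds a fixed number of page frames; the names `expand` and `replay`, the trace format, and the two policies shown (FIFO and LRU) are all illustrative assumptions that do not come from the abstract.

```python
from collections import OrderedDict


def expand(trace):
    """Expand a run-length-encoded trace [(page, repeats), ...] into page numbers.

    The run-length encoding is an assumed format, chosen only for illustration.
    """
    for page, repeats in trace:
        for _ in range(repeats):
            yield page


def replay(trace, frames, policy):
    """Replay a page trace on a cache with `frames` slots and return the miss count.

    `policy` is "fifo" or "lru" (hypothetical policy names for this sketch).
    """
    if policy not in ("fifo", "lru"):
        raise ValueError("unknown policy: " + policy)
    cache = OrderedDict()  # keys are resident pages, ordered oldest first
    misses = 0
    for page in expand(trace):
        if page in cache:
            if policy == "lru":
                # A hit refreshes recency under LRU; FIFO keeps insertion order.
                cache.move_to_end(page)
            continue
        misses += 1
        if len(cache) >= frames:
            # Evict the front entry: oldest inserted (FIFO) or least recently used (LRU).
            cache.popitem(last=False)
        cache[page] = True
    return misses


if __name__ == "__main__":
    # Hypothetical trace: page 1 twice, then pages 2, 3, 1, 4, 2 once each.
    trace = [(1, 2), (2, 1), (3, 1), (1, 1), (4, 1), (2, 1)]
    for policy in ("fifo", "lru"):
        print(policy, replay(trace, frames=3, policy=policy))
```

Because the replay only needs the recorded page sequence, policies such as LRU can be compared offline even when the target hardware offers no support for tracking recency, which is the kind of evaluation the abstract describes.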
