Prospects for Functional Address Translation

Address translation fundamentally embodies a translation function that maps virtual addresses to physical addresses. In current systems, the kernel encodes this function in an in-memory radix tree (the page table hierarchy), which the hardware (the page walker, page-walk caches, and TLBs) then interprets. We ask whether it makes sense to implement the translation function itself as reconfigurable hardware. To study this question, we collected numerous in-situ Linux page tables from a wide range of workloads, including HPC workloads, to serve as example translation functions. We then prototyped several potential mechanisms for implementing the translation function: inverted page tables with function-specific perfect hashing, translation functions implemented directly as Espresso-minimized PLAs, translation functions genetically evolved in a language suitable for FPGA-like synthesis, and translation functions based on recovered or manufactured region (segment/mmap) lookup using multiplexer trees. We evaluated each mechanism on the Linux page tables, primarily for space and lookup speed. We report our findings and try to address the question.
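To make the notion of a "translation function" concrete, the following is a minimal Python sketch, not the prototype described above. It builds the same small mapping twice: once as a radix tree walked level by level, and once as a list of contiguous regions searched by start address. The 4 KiB page size, the four 9-bit levels (an x86-64-style layout), the example addresses, and all function names are assumptions made for illustration only.

```python
from bisect import bisect_right

PAGE_SHIFT = 12                      # assumed 4 KiB pages
LEVEL_BITS = 9                       # assumed 512 entries per level
LEVELS = 4                           # assumed four-level radix tree
OFFSET_MASK = (1 << PAGE_SHIFT) - 1
INDEX_MASK = (1 << LEVEL_BITS) - 1


def indices(vaddr):
    """Split a virtual address into per-level indices, top level first."""
    vpn = vaddr >> PAGE_SHIFT
    return [(vpn >> (LEVEL_BITS * i)) & INDEX_MASK
            for i in reversed(range(LEVELS))]


def map_page(root, vpn, pfn):
    """Insert one page mapping into a nested-dict radix tree."""
    idx = indices(vpn << PAGE_SHIFT)
    node = root
    for i in idx[:-1]:
        node = node.setdefault(i, {})
    node[idx[-1]] = pfn


def walk(root, vaddr):
    """Interpret the radix tree, as a page walker would; None if unmapped."""
    node = root
    for i in indices(vaddr):
        if i not in node:
            return None
        node = node[i]
    return (node << PAGE_SHIFT) | (vaddr & OFFSET_MASK)


def build_regions(mappings):
    """Coalesce (vpn, pfn) pairs into (start_vpn, length, start_pfn) runs."""
    regions = []
    for vpn, pfn in sorted(mappings):
        if regions:
            start, length, base = regions[-1]
            if vpn == start + length and pfn == base + length:
                regions[-1] = (start, length + 1, base)
                continue
        regions.append((vpn, 1, pfn))
    return regions


def region_lookup(regions, vaddr):
    """Translate via the region list: find the run containing the page."""
    vpn = vaddr >> PAGE_SHIFT
    starts = [r[0] for r in regions]
    k = bisect_right(starts, vpn) - 1
    if k < 0:
        return None
    start, length, base = regions[k]
    if vpn >= start + length:
        return None
    return ((base + vpn - start) << PAGE_SHIFT) | (vaddr & OFFSET_MASK)


# Hypothetical example mapping: one 4-page run plus one isolated page.
mappings = [(0x400 + i, 0x1000 + i) for i in range(4)] + [(0x7F000, 0x2000)]
root = {}
for vpn, pfn in mappings:
    map_page(root, vpn, pfn)
regions = build_regions(mappings)
```

Both representations compute the same function: `walk(root, a)` and `region_lookup(regions, a)` agree for every address `a`, but the region list here needs only two entries where the tree needs one leaf per page. The space and lookup-speed trade-offs between such encodings are the kind of question the evaluation addresses.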
