TabulaROSA: Tabular Operating System Architecture for Massively Parallel Heterogeneous Compute Engines

The rise in computing hardware choices is driving a reevaluation of operating systems. The traditional role of an operating system, controlling execution on its own hardware, is evolving toward a model in which the controlling processor is distinct from the compute engines that perform most of the computations. In this context, an operating system can be viewed as software that brokers and tracks the resources of the compute engines, akin to a database management system. To explore the idea of using a database in an operating system role, this work defines key operating system functions in terms of rigorous mathematical semantics (associative array algebra) that are directly translatable into database operations. These operations possess a number of mathematical properties that are well suited to parallel operating systems because they guarantee correctness over a wide range of parallel operations. The resulting operating system equations provide a mathematical specification for a Tabular Operating System Architecture (TabulaROSA) that can be implemented on any platform. Simulations of forking in TabulaROSA are performed using an associative array implementation and compared to Linux on a 32,000+ core supercomputer. Using over 262,000 forkers managing over 68,000,000,000 processes, the simulations show that TabulaROSA has the potential to perform operating system functions on a massively parallel scale. The TabulaROSA simulations show 20x higher performance than Linux while managing 2000x more processes in fully searchable tables.
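The idea of a process table held as an associative array can be illustrated with a minimal Python sketch. It is not the TabulaROSA formulation or the implementation used in the simulations; the dictionary layout, the column names (`cmd`, `state`, `ppid`), and the functions `merge` and `fork` are all assumptions made for illustration. The sketch shows one property the abstract relies on: when each fork contributes entries under its own keys, the updates can be merged in any order and yield the same table.

```python
from functools import reduce

# Hypothetical layout: a process table is an associative array mapping
# (process id, column name) -> value. All names here are illustrative.

def merge(a, b):
    """Element-wise union of two tables; shared keys must hold equal values."""
    out = dict(a)
    for key, val in b.items():
        if key in out and out[key] != val:
            raise ValueError(f"conflicting entries for {key}")
        out[key] = val
    return out

def fork(table, parent, child):
    """Return only the new entries produced by copying the parent's row."""
    new = {(child, col): val
           for (pid, col), val in table.items() if pid == parent}
    new[(child, "ppid")] = parent  # record the parent of the new process
    return new

table = {("p1", "cmd"): "init", ("p1", "state"): "run"}

# Three independent forks of p1, each computed against the same table.
updates = [fork(table, "p1", f"p{i}") for i in range(2, 5)]

# Merging the updates in either order gives the same process table.
forward = reduce(merge, updates, table)
backward = reduce(merge, reversed(updates), table)
print(forward == backward)
```

Because every entry is addressed by a (row, column) key, looking up a process or a field is an ordinary key lookup, which is the sense in which such a table is searchable like a database.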
