Lara: A Key-Value Algebra underlying Arrays and Relations

Data processing systems fall roughly into families such as relational, array, graph, and key-value. Many data processing tasks exceed the capabilities of any one family, require data stored across families, or run faster when partitioned across multiple families. Discovering ways to execute a computation among multiple available systems, let alone discovering an optimal execution plan, is challenging given the semantic differences between disparate families of systems. In this paper we introduce a new algebra, Lara, which underlies and unifies the algebras representing the families above in order to facilitate translation between systems. We describe Lara's objects, associative tables, and its operations on them (union, join, and ext), and show its properties and its equivalences to other algebras. We expect multi-system optimization to grow in importance, and we propose Lara for the role of universal connector.
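To make the three operations named above more concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the paper's definitions: an associative table is modeled as a plain dict from a key to a numeric value, both operands are assumed to share the same key attributes, and the function names `union`, `join`, and `ext` as well as the parameters `plus`, `times`, and `f` are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, Hashable, Tuple

# Hypothetical model: an associative table maps a key to a value.
# Keys that are absent are treated as holding a default (empty) value.
Table = Dict[Hashable, float]


def union(a: Table, b: Table, plus: Callable[[float, float], float]) -> Table:
    """Keep every key from either table; combine values on shared keys with `plus`."""
    out = dict(a)
    for k, v in b.items():
        out[k] = plus(out[k], v) if k in out else v
    return out


def join(a: Table, b: Table, times: Callable[[float, float], float]) -> Table:
    """Keep only keys present in both tables; combine their values with `times`."""
    return {k: times(a[k], b[k]) for k in a.keys() & b.keys()}


def ext(a: Table, f: Callable[[Hashable, float], Dict[Hashable, float]]
        ) -> Dict[Tuple[Hashable, Hashable], float]:
    """Apply `f` to each entry; `f` returns a small table whose keys extend the original key."""
    out: Dict[Tuple[Hashable, Hashable], float] = {}
    for k, v in a.items():
        for sub_key, sub_val in f(k, v).items():
            out[(k, sub_key)] = sub_val
    return out


a = {"x": 1.0, "y": 2.0}
b = {"y": 10.0, "z": 20.0}

summed = union(a, b, lambda p, q: p + q)
product = join(a, b, lambda p, q: p * q)
doubled = ext(a, lambda k, v: {"double": 2 * v})
```

In this sketch, union behaves like an element-wise sum over the combined key set, join like an element-wise product over the shared keys, and ext like a per-entry map that can introduce a new key component. The general case, in which the operands have different key attributes, is not covered here.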
