Bringing Linear Algebra Objects to Life in a Column-Oriented In-Memory Database

Large numeric matrices and multidimensional data arrays appear in many scientific domains, as well as in financial and business warehousing applications. A common application is eigenvalue determination for large matrices, which decomposes into a set of linear algebra operations. With the rise of in-memory databases, it is now feasible to execute these complex analytical queries directly in a relational database system, without transferring data out of the system and without being restricted by hard disk latencies for random accesses. In this paper, we present a way to integrate linear algebra operations and large matrices as first-class citizens into an in-memory database, following a two-layered architectural model. The architecture consists of a logical layer, which receives manipulation statements and linear algebra expressions, and a physical layer, which autonomously manages multiple matrix storage representations. A cost-based hybrid storage representation is presented, and an experimental implementation is evaluated for matrix-vector multiplications.
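To make the idea of a physical layer that chooses among storage representations more concrete, the following minimal Python sketch picks between a dense layout and a compressed sparse row (CSR) layout and then runs a matrix-vector multiplication on whichever was chosen. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the names `DenseMatrix`, `CsrMatrix`, and `choose_representation`, and the fixed `density_threshold`, are hypothetical, and a simple density cutoff stands in for the cost model, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class DenseMatrix:
    """Row-major dense storage (hypothetical name)."""
    rows: List[List[float]]

    def matvec(self, x: List[float]) -> List[float]:
        # One dot product per row.
        return [sum(a * b for a, b in zip(row, x)) for row in self.rows]


@dataclass
class CsrMatrix:
    """Compressed sparse row storage (hypothetical name)."""
    n_cols: int
    row_ptr: List[int]   # row r occupies values[row_ptr[r]:row_ptr[r + 1]]
    col_idx: List[int]   # column index of each stored value
    values: List[float]  # nonzero values in row-major order

    def matvec(self, x: List[float]) -> List[float]:
        y = []
        for r in range(len(self.row_ptr) - 1):
            start, end = self.row_ptr[r], self.row_ptr[r + 1]
            # Only the stored nonzeros of row r contribute.
            y.append(sum(self.values[k] * x[self.col_idx[k]]
                         for k in range(start, end)))
        return y


def choose_representation(
    rows: List[List[float]], density_threshold: float = 0.25
) -> Union[DenseMatrix, CsrMatrix]:
    """Pick a layout from the fraction of nonzero entries.

    Assumption: the threshold is an arbitrary stand-in for a real cost
    model, which would weigh memory footprint and expected access cost.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    total = n_rows * n_cols
    nnz = sum(1 for row in rows for v in row if v != 0)
    density = nnz / total if total else 0.0

    if density >= density_threshold:
        return DenseMatrix([list(row) for row in rows])

    row_ptr, col_idx, values = [0], [], []
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(values))
    return CsrMatrix(n_cols, row_ptr, col_idx, values)


if __name__ == "__main__":
    sparse_input = [[0, 0, 3], [0, 0, 0], [1, 0, 0], [0, 0, 0]]
    m = choose_representation(sparse_input)
    print(type(m).__name__, m.matvec([1, 2, 3]))
```

The caller only sees `matvec`, so the storage decision stays inside the physical layer, which is the separation the two-layered model describes. A hybrid scheme in the sense of the abstract would presumably make this choice per matrix region rather than once per matrix; that refinement is left out here.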
