Database Operations in D4M.jl

Every step of a data analytics pipeline matters, including database ingest and query. The D4M-Accumulo database connector lets analysts quickly and easily ingest data into, and query data from, Apache Accumulo using MATLAB®/GNU Octave syntax. D4M.jl, a Julia implementation of D4M, brings much of the functionality of the original D4M implementation to the Julia community. In this work, we extend D4M.jl to include many of the database capabilities that the MATLAB®/GNU Octave implementation provides. We describe the D4M.jl database connector, demonstrate how it can be used, and show that its performance is comparable to or better than that of the original MATLAB®/GNU Octave implementation.
