Scaling Machine Learning via Compressed Linear Algebra

Large-scale machine learning (ML) algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model. For performance, it is crucial to fit the data into single-node or distributed main memory and to enable very fast matrix-vector operations on in-memory data. General-purpose heavyweight and lightweight compression techniques struggle to achieve both good compression ratios and decompression fast enough to support block-wise uncompressed operations. Compressed linear algebra (CLA) avoids these problems by applying lightweight lossless database compression techniques to matrices and then executing linear algebra operations, such as matrix-vector multiplication, directly on the compressed representations. The key ingredients are effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. Experiments on an initial implementation in SystemML show in-memory operation performance close to the uncompressed case along with good compression ratios. We thereby obtain significant end-to-end performance improvements of up to 26x, or reduced memory requirements.
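
To make the core idea concrete, the following is a minimal sketch (in Python with NumPy) of a right matrix-vector multiplication executed directly on a dictionary-encoded column group, in the spirit of CLA's column compression schemes. The class name DDCColumnGroup, its fields, and the handling of the uncompressed column are illustrative assumptions, not SystemML's actual API; the point is the pre-aggregation of dictionary-vector products, which avoids decompressing the matrix.

```python
# Illustrative sketch of a dictionary-encoded column group and a
# matrix-vector multiply on compressed data; names are assumptions,
# not SystemML's actual implementation.
import numpy as np

class DDCColumnGroup:
    """Dense dictionary-coded column group: each row stores a small code
    that indexes into a dictionary of distinct value tuples."""
    def __init__(self, col_indices, dictionary, codes):
        self.col_indices = np.asarray(col_indices)  # columns covered by this group
        self.dictionary = np.asarray(dictionary)    # (num_distinct, num_group_cols)
        self.codes = np.asarray(codes)              # (num_rows,) per-row dictionary codes

    def right_mult_vector(self, v, out):
        # Pre-aggregate: multiply each distinct value tuple with the
        # relevant entries of v once, instead of once per row.
        preagg = self.dictionary @ v[self.col_indices]   # (num_distinct,)
        # Scatter the pre-aggregated values to the output via the codes.
        out += preagg[self.codes]

# Usage: a 6x3 matrix whose columns 0 and 2 form one compressed group.
rows = np.array([[1.0, 5.0, 2.0],
                 [1.0, 6.0, 2.0],
                 [3.0, 7.0, 4.0],
                 [1.0, 8.0, 2.0],
                 [3.0, 9.0, 4.0],
                 [1.0, 0.0, 2.0]])
group = DDCColumnGroup(col_indices=[0, 2],
                       dictionary=[[1.0, 2.0], [3.0, 4.0]],
                       codes=[0, 0, 1, 0, 1, 0])
v = np.array([2.0, 0.0, 3.0])
out = np.zeros(rows.shape[0])
group.right_mult_vector(v, out)
out += rows[:, 1] * v[1]      # column 1 kept uncompressed in this example
assert np.allclose(out, rows @ v)
```

Because each distinct value tuple is multiplied with the vector only once, the per-row work reduces to a single lookup and addition, which is how operations on compressed data can approach uncompressed performance.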
