Non-Volatile Memory Accelerated Geometric Multi-Scale Resolution Analysis

Dimensionality reduction algorithms are standard tools in a researcher’s toolbox. Dimensionality reduction algorithms are frequently used to augment downstream tasks such as machine learning, data science, and also are exploratory methods for understanding complex phenomena. For instance, dimensionality reduction is commonly used in Biology as well as Neuroscience to understand data collected from biological subjects. However, dimensionality reduction techniques are limited by the von-Neumann architectures that they execute on. Specifically, data intensive algorithms such as dimensionality reduction techniques often require fast, high capacity, persistent memory which historically hardware has been unable to provide at the same time. In this paper, we present a re-implementation of an existing dimensionality reduction technique called Geometric Multi-Scale Resolution Analysis (GMRA) which has been accelerated via novel persistent memory technology called Memory Centric Active Storage (MCAS). Our implementation uses a specialized version of MCAS called PyMM that provides native support for Python datatypes including NumPy arrays and PyTorch tensors. We compare our PyMM implementation against a DRAM implementation, and show that when data fits in DRAM, PyMM offers competitive runtimes. When data does not fit in DRAM, our PyMM implementation is still able to process the data.

[1]  Gerald Kowalski,et al.  Information Retrieval Systems: Theory and Implementation , 1997 .

[2]  Ioannis Mitliagkas,et al.  Memory Limited, Streaming PCA , 2013, NIPS.

[3]  Jianfeng Gao,et al.  Domain Adaptation via Pseudo In-Domain Data Selection , 2011, EMNLP.

[4]  Kenji Kita,et al.  Dimensionality reduction using non-negative matrix factorization for information retrieval , 2001, 2001 IEEE International Conference on Systems, Man and Cybernetics. e-Systems and e-Man for Cybernetics in Cyberspace (Cat.No.01CH37236).

[5]  John Langford,et al.  Cover trees for nearest neighbor , 2006, ICML.

[6]  Zaïd Harchaoui,et al.  Invariances and Data Augmentation for Supervised Music Transcription , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7]  Sandeep Subramanian,et al.  Deep Complex Networks , 2017, ICLR.

[8]  Takehisa Yairi,et al.  Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction , 2014, MLSDA'14.

[9]  David P. Woodruff,et al.  Frequent Directions: Simple and Deterministic Matrix Sketching , 2015, SIAM J. Comput..

[10]  Heng Tao Shen,et al.  Principal Component Analysis , 2009, Encyclopedia of Biometrics.

[11]  Yann LeCun,et al.  The mnist database of handwritten digits , 2005 .

[12]  Brian D. Hoskins,et al.  Memory-efficient training with streaming dimensionality reduction , 2020, ArXiv.

[13]  Xiao Liu,et al.  Basic Performance Measurements of the Intel Optane DC Persistent Memory Module , 2019, ArXiv.

[14]  Sangeetha Seshadri,et al.  An Architecture for Memory Centric Active Storage (MCAS) , 2021, ArXiv.

[15]  Gene H. Golub,et al.  Singular value decomposition and least squares solutions , 1970, Milestones in Matrix Computation.

[16]  Dung N. Tran,et al.  Geometric multi-resolution analysis based classification for high dimensional data , 2014, Defense + Security Symposium.

[17]  James J. DiCarlo,et al.  How Does the Brain Solve Visual Object Recognition? , 2012, Neuron.

[18]  Mukund Balasubramanian,et al.  The Isomap Algorithm and Topological Stability , 2002, Science.

[19]  Vince D. Calhoun,et al.  Memory Efficient PCA Methods for Large Group ICA , 2016, Front. Neurosci..

[20]  Jarkko Venna,et al.  Information Retrieval Perspective to Nonlinear Dimensionality Reduction for Data Visualization , 2010, J. Mach. Learn. Res..

[21]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[22]  M. Maggioni,et al.  Multi-scale geometric methods for data sets II: Geometric Multi-Resolution Analysis , 2012 .

[23]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[24]  Guangliang Chen,et al.  A fast multiscale framework for data in high-dimensions: Measure estimation, anomaly detection, and compressive measurements , 2012, 2012 Visual Communications and Image Processing.

[25]  Vipin Kumar,et al.  A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..

[26]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.