Distributed geometric nonnegative matrix factorization and hierarchical alternating least squares–based nonnegative tensor factorization with the MapReduce paradigm

Nonnegative matrix factorization (NMF) and its multilinear extension, nonnegative tensor factorization (NTF), are widely used in machine learning and data analysis for feature extraction and dimensionality reduction of nonnegative high‐dimensional data. Reducing the dimensionality of massive data sets usually requires distributed computation on multi‐node computer architectures. In this study, we propose several computational strategies for the parallel and distributed computation of the latent factors in both factorization models, all of which partition the computational tasks according to the MapReduce paradigm. We extend the previously reported distributed hierarchical alternating least squares (HALS) algorithm to the multi‐way array factorization model, assuming that the observed multi‐way data can be partitioned into chunks along one mode. Moreover, we propose a new geometry‐based distributed computational strategy for solving NMF problems. Numerical experiments on several large‐scale data sets demonstrate that these algorithms are efficient and robust to noisy data.
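To make the partitioning idea concrete, here is a minimal sketch of distributed HALS for the matrix case only. It assumes the model Y ≈ AX with Y split column-wise into chunks Y_c (and X into matching blocks X_c). It is an illustration under these assumptions, not the authors' algorithm. Each map task emits the partial statistics Y_c X_c^T and X_c X_c^T. The reduce step sums them so that A can be updated with the standard HALS column rule. A is then broadcast, and each chunk updates its own block of X without further communication. The function names (`map_chunk`, `reduce_stats`, `update_A`, `update_X_chunk`, `distributed_hals`), the floor `EPS`, and the synthetic data are hypothetical. The map and reduce phases are simulated sequentially in plain Python rather than run on a real MapReduce cluster.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

# Assumed positivity floor; keeps every factor entry strictly positive so the
# HALS denominators (diagonals of X X^T and A^T A) never vanish.
EPS = 1e-12


def matmul(P: Matrix, Q: Matrix) -> Matrix:
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def transpose(P: Matrix) -> Matrix:
    return [list(col) for col in zip(*P)]


def add(P: Matrix, Q: Matrix) -> Matrix:
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def map_chunk(Y_c: Matrix, X_c: Matrix) -> Tuple[Matrix, Matrix]:
    """Map task (hypothetical): emit partial statistics Y_c X_c^T and X_c X_c^T."""
    Xt = transpose(X_c)
    return matmul(Y_c, Xt), matmul(X_c, Xt)


def reduce_stats(partials: List[Tuple[Matrix, Matrix]]) -> Tuple[Matrix, Matrix]:
    """Reduce task (hypothetical): sum partial statistics into Y X^T and X X^T."""
    YXt, XXt = partials[0]
    for P, Q in partials[1:]:
        YXt, XXt = add(YXt, P), add(XXt, Q)
    return YXt, XXt


def update_A(A: Matrix, YXt: Matrix, XXt: Matrix) -> None:
    """HALS sweep over the columns of A, done in place on the driver."""
    I, J = len(A), len(A[0])
    for j in range(J):
        for i in range(I):
            # Uses columns 0..j-1 already updated in this sweep (Gauss-Seidel order).
            axx = sum(A[i][k] * XXt[k][j] for k in range(J))
            A[i][j] = max(EPS, A[i][j] + (YXt[i][j] - axx) / XXt[j][j])


def update_X_chunk(A: Matrix, Y_c: Matrix, X_c: Matrix) -> None:
    """Map-only task: HALS sweep over the rows of one chunk X_c, given broadcast A."""
    At = transpose(A)
    AtY, AtA = matmul(At, Y_c), matmul(At, A)
    J, T = len(X_c), len(X_c[0])
    for j in range(J):
        for t in range(T):
            atax = sum(AtA[j][k] * X_c[k][t] for k in range(J))
            X_c[j][t] = max(EPS, X_c[j][t] + (AtY[j][t] - atax) / AtA[j][j])


def rel_error(Y_chunks: List[Matrix], A: Matrix, X_chunks: List[Matrix]) -> float:
    """Relative Frobenius error ||Y - AX|| / ||Y||, accumulated chunk by chunk."""
    num = den = 0.0
    for Y_c, X_c in zip(Y_chunks, X_chunks):
        for ry, ra in zip(Y_c, matmul(A, X_c)):
            num += sum((y - a) ** 2 for y, a in zip(ry, ra))
            den += sum(y * y for y in ry)
    return (num / den) ** 0.5


def distributed_hals(Y_chunks: List[Matrix], J: int, iters: int = 100, seed: int = 0):
    rng = random.Random(seed)
    A = [[rng.random() + 0.1 for _ in range(J)] for _ in range(len(Y_chunks[0]))]
    X_chunks = [[[rng.random() + 0.1 for _ in Y_c[0]] for _ in range(J)] for Y_c in Y_chunks]
    errors = [rel_error(Y_chunks, A, X_chunks)]
    for _ in range(iters):
        # Map + reduce: only I x J and J x J statistics leave the workers.
        YXt, XXt = reduce_stats([map_chunk(Y_c, X_c) for Y_c, X_c in zip(Y_chunks, X_chunks)])
        update_A(A, YXt, XXt)
        # Broadcast A; each chunk of X is updated with no further communication.
        for Y_c, X_c in zip(Y_chunks, X_chunks):
            update_X_chunk(A, Y_c, X_c)
        errors.append(rel_error(Y_chunks, A, X_chunks))
    return A, X_chunks, errors


# Hypothetical usage: exact rank-2 data (6 x 9) split column-wise into 3 chunks.
rng = random.Random(1)
A0 = [[rng.random() for _ in range(2)] for _ in range(6)]
X0 = [[rng.random() for _ in range(9)] for _ in range(2)]
Y = matmul(A0, X0)
Y_chunks = [[row[c:c + 3] for row in Y] for c in range(0, 9, 3)]
A, X_chunks, errors = distributed_hals(Y_chunks, J=2, iters=50)
print(f"relative error: {errors[0]:.3f} -> {errors[-1]:.2e}")
```

In this sketch, splitting along the columns keeps the X update entirely local to each chunk. Only the small I×J and J×J statistics pass through the reduce step. Because each HALS block update exactly minimizes the squared error over its block, the recorded error should not increase from one sweep to the next, up to rounding.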
