A high performance hardware architecture for non-negative tensor factorization

Abstract Non-negative tensor factorization (NTF) is an emerging method for high-dimensional data analysis, with applications in many fields such as computer vision and bioinformatics. This paper presents an effective method to accelerate NTF computations and proposes a corresponding hardware architecture consisting of multiple processing units. The decomposed factors are computed using shared intermediate results, so NTF can be executed in parallel while hardware resources are saved through sharing. We evaluate the proposed architecture on a Xilinx Virtex-6 FPGA (XC6VLX760T) and apply it to two applications: video background estimation and facial image processing. The experimental results show that the proposed hardware architecture is more than 80 times faster than a CPU implementation of NTF and achieves nearly the same speedup as GPGPU implementations. When the sharing strategy proposed in this paper is applied to GPU platforms, the execution time is reduced by half owing to the shared computations.
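
The abstract does not give the paper's exact update rule or its mapping onto processing units, so the following is only a minimal NumPy sketch of the kind of computation an NTF accelerator has to perform: a rank-R non-negative CP (PARAFAC) factorization with multiplicative updates, in which the factor Gram matrices serve as an illustrative stand-in for the "shared intermediate results" mentioned above, being computed once per sweep and reused across the updates of different factors. The function names and parameters (khatri_rao, ntf_cp, rank, n_iter, eps) are assumptions introduced here for illustration, not taken from the paper.

    import numpy as np

    def khatri_rao(U, V):
        # Column-wise Khatri-Rao product of U (I x R) and V (J x R) -> (I*J x R).
        I, R = U.shape
        J, _ = V.shape
        return (U[:, None, :] * V[None, :, :]).reshape(I * J, R)

    def ntf_cp(X, rank, n_iter=200, eps=1e-9, seed=0):
        # Non-negative CP factorization of a 3-way tensor X ~ sum_r a_r o b_r o c_r,
        # using multiplicative updates (Lee-Seung style, generalized to tensors).
        rng = np.random.default_rng(seed)
        I, J, K = X.shape
        A = rng.random((I, rank))
        B = rng.random((J, rank))
        C = rng.random((K, rank))

        # Mode unfoldings; with C-order reshaping, X1 ~ A @ khatri_rao(B, C).T, etc.
        X1 = X.reshape(I, J * K)
        X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)
        X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)

        for _ in range(n_iter):
            # Gram matrices act as the shared intermediate results in this sketch:
            # GC is reused by the A and B updates, GA by the B and C updates.
            GB, GC = B.T @ B, C.T @ C
            A *= (X1 @ khatri_rao(B, C)) / (A @ (GB * GC) + eps)
            GA = A.T @ A
            B *= (X2 @ khatri_rao(A, C)) / (B @ (GA * GC) + eps)
            GB = B.T @ B
            C *= (X3 @ khatri_rao(A, B)) / (C @ (GA * GB) + eps)

        return A, B, C

In a hardware realization one could imagine each factor update assigned to its own processing unit, with the shared Gram and Khatri-Rao terms computed once and broadcast; the abstract suggests the proposed architecture exploits this kind of reuse, although the concrete design is described only in the full paper.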
