Biclustering and classification analysis in gene expression using Nonnegative Matrix Factorization on multi-GPU systems

Nonnegative Matrix Factorization (NMF) has attracted great interest owing to its ability to extract highly interpretable parts from data sets. Gene expression analysis is one of the most popular applications of NMF in bioinformatics. Nonetheless, its use is hindered by its computational cost when processing large data sets. In this paper, we present two parallel implementations of NMF. The first uses CUDA on a single Graphics Processing Unit (GPU); large input matrices are transferred and processed iteratively, block by block. The second distributes the data among multiple GPUs synchronized through MPI (Message Passing Interface). When analyzing large data sets with two and four GPUs, it runs 2.3 and 4.13 times faster, respectively, than the single-GPU version. This corresponds to a speedup of about 120 times over a conventional CPU. These superlinear speedups are achieved when the data portion assigned to each GPU is small enough to be transferred only once.
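The scaling claim can be checked with a few lines of arithmetic. The sketch below assumes the usual definition of parallel efficiency (speedup divided by the number of processing units), which the abstract does not state explicitly; the function name `parallel_efficiency` is hypothetical and chosen only for illustration. An efficiency above 1.0 means the speedup grows faster than the number of GPUs.

```python
def parallel_efficiency(speedup, n_gpus):
    """Speedup per GPU; values above 1.0 indicate superlinear scaling.

    Assumption: efficiency = speedup / number of GPUs, the common
    textbook definition (not spelled out in the abstract).
    """
    return speedup / n_gpus


# Speedups over the single-GPU version, as reported in the abstract.
reported = {2: 2.3, 4: 4.13}

for n_gpus, speedup in reported.items():
    eff = parallel_efficiency(speedup, n_gpus)
    label = "superlinear" if eff > 1.0 else "linear or sublinear"
    print(f"{n_gpus} GPUs: speedup {speedup}, efficiency {eff:.4f} ({label})")
```

Both reported configurations come out above 1.0 under this definition, which is consistent with the explanation given in the abstract: once each GPU's share of the data fits in device memory, the repeated blockwise transfers required by the single-GPU version are no longer needed.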
