GPU-based biclustering for microarray data analysis in neurocomputing

Biclustering is an important technique in neurocomputing and bioinformatics. The Geometric Biclustering (GBC) algorithm finds common patterns in microarray data for neural processing. A microarray can produce a massive amount of data, and analyzing it requires high computational power. With its intrinsically parallel architecture and an appropriate mapping technique, a Graphics Processing Unit (GPU) can process a larger number of threads and more data than a CPU. This paper analyzes the parallelism and data reuse of the GBC algorithm and presents three different efficient implementations, evaluated on five real-world benchmarks. The proposed GPU-based GBC program achieves a significant speedup over a highly optimized CPU program. By comparing the implementation results, the paper studies how to design a scalable architecture for mapping GBC and similar algorithms for microarray data analysis. The paper also explores how the input data size affects GPU-based GBC.
