Rough assessment of GPU capabilities for parallel PCC-based biclustering method applied to microarray data sets

Abstract Parallel computing architectures have been shown to significantly shorten computation time for a range of clustering algorithms. Nonetheless, some characteristics of the architecture limit the application of graphics processing units (GPUs) to the biclustering task, whose function is to find focal similarities within the data. This might be one of the reasons why few biclustering algorithms have been proposed for GPUs so far. In this article, we examine whether heterogeneous (CPU+GPU) architectures have any potential for complex biclustering calculations. We introduce minimax with Pearson correlation, a complex biclustering method. The algorithm uses Pearson's correlation to determine the similarity between rows of the input matrix. We present two implementations of the algorithm: a sequential one and a parallel one dedicated to heterogeneous environments. We measure weak scaling efficiency to assess whether a heterogeneous architecture can successfully shorten the time of heavy biclustering computations.
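As an illustration of the two quantities the abstract mentions, the following is a minimal sequential sketch in Python. It is not the authors' implementation and does not reproduce the minimax procedure itself. The function names (`pearson`, `row_correlations`, `weak_scaling_efficiency`) are hypothetical, the treatment of constant rows is an assumption, and the weak scaling formula is the commonly used definition (single-unit runtime divided by the runtime on N units with an N-times larger problem), which the abstract does not spell out.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0.0:
        # Assumption: a constant row has undefined correlation; report 0.0.
        return 0.0
    return cov / norm


def row_correlations(matrix):
    """Symmetric matrix of Pearson correlations between all pairs of rows.

    The diagonal is set to 1.0 by convention. Each pair (i, j) is
    independent of the others, which is what makes this step a natural
    candidate for parallel execution.
    """
    rows = len(matrix)
    corr = [[1.0] * rows for _ in range(rows)]
    for i in range(rows):
        for j in range(i + 1, rows):
            corr[i][j] = corr[j][i] = pearson(matrix[i], matrix[j])
    return corr


def weak_scaling_efficiency(t_one, t_n):
    """Weak scaling efficiency under the common definition (an assumption here).

    t_one: runtime for one unit of work on one processing unit.
    t_n:   runtime for N units of work on N processing units.
    A value of 1.0 means perfect weak scaling.
    """
    if t_n <= 0:
        raise ValueError("t_n must be positive")
    return t_one / t_n


if __name__ == "__main__":
    # Toy expression matrix: rows are genes, columns are conditions.
    data = [
        [1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.0],
        [4.0, 3.0, 2.0, 1.0],
    ]
    for row in row_correlations(data):
        print(row)
    print(weak_scaling_efficiency(10.0, 12.5))
```

In this toy matrix the first two rows are perfectly correlated and the third is perfectly anti-correlated with both, so a correlation-based similarity would group the first two rows together.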
