Biclustering Gene Expression Profiles by Alternately Sorting with Weighted Correlated Coefficient

This paper proposes a framework for biclustering gene expression profiles. The framework applies the dominant set approach to create sets of sorting vectors, which we use to iteratively sort and transpose the gene expression data. A weighted correlation coefficient measures similarity at the gene level and at the condition level, with the weights at each level assigned according to the similarity measures from the previous level. We refine and update these weights in every iteration, which lets us concentrate on the similarity of relevant features during the biclustering process. In this way, a highly correlated bicluster can be located easily. We applied this biclustering approach to three real gene expression data sets and found the results very encouraging. In addition, we propose the average correlation value (ACV) as a criterion for evaluating the quality of a bicluster. We compared ACV with the mean squared residue score and found ACV to be more appropriate.
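
The two measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the following are assumptions: `weighted_corr` is a standard weighted Pearson correlation, `acv` is read as the mean absolute pairwise correlation among rows or among columns (taking the larger of the two), and `mean_squared_residue` follows the usual definition of that score from the biclustering literature. The function names, the toy matrix, and the hand-picked weights are hypothetical; in the framework itself the weights would come from the similarity measured at the previous level.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y under non-negative weights w."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    w = w / w.sum()                      # normalise weights to sum to 1
    dx = x - np.sum(w * x)               # deviations from weighted means
    dy = y - np.sum(w * y)
    cov = np.sum(w * dx * dy)
    return cov / np.sqrt(np.sum(w * dx**2) * np.sum(w * dy**2))

def acv(block):
    """Assumed ACV: mean absolute off-diagonal correlation, rows vs. columns.

    Constant rows or columns have undefined correlation and would yield NaN.
    """
    block = np.asarray(block, dtype=float)

    def mean_abs_offdiag(m):
        r = np.abs(np.corrcoef(m))       # pairwise correlations between rows of m
        k = r.shape[0]
        return (r.sum() - np.trace(r)) / (k * k - k)

    return max(mean_abs_offdiag(block), mean_abs_offdiag(block.T))

def mean_squared_residue(block):
    """Mean of (a_ij - rowmean_i - colmean_j + overall mean)^2 over the block."""
    block = np.asarray(block, dtype=float)
    resid = (block
             - block.mean(axis=1, keepdims=True)
             - block.mean(axis=0, keepdims=True)
             + block.mean())
    return float(np.mean(resid**2))

# Toy bicluster with a scaling pattern: every row is a multiple of one profile.
block = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 4.0, 3.0])
print(acv(block), mean_squared_residue(block))

# Zero weight on the last condition removes its influence on the similarity.
x = [1.0, 2.0, 3.0, 10.0]
y = [2.0, 4.0, 6.0, -5.0]
print(weighted_corr(x, y, [1, 1, 1, 1]), weighted_corr(x, y, [1, 1, 1, 0]))
```

On the scaling-pattern block, the rows are perfectly correlated even though the residue score is nonzero, which is one way the two criteria can disagree. The weighted example shows how down-weighting an irrelevant condition changes the measured similarity between two profiles.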
