Improved biclustering on expression data through overlapping control

The purpose of this paper is to present a novel control mechanism for avoiding overlap among biclusters in expression data.

Biclustering is a technique used in the analysis of microarray data. One of the most popular biclustering algorithms was introduced by Cheng and Church (2000) (Ch&Ch). Although this heuristic is successful at finding interesting biclusters, it has several drawbacks. The main shortcoming is that it introduces random values into the expression matrix to control overlapping. The overlap control method presented in this paper is instead based on a matrix of weights, which is used to estimate how much a bicluster overlaps with those already found. In this way, the algorithm always works on real data, so the biclusters it discovers contain only original data.

The paper shows that, after some iterations, the original algorithm wrongly estimates the quality of the biclusters because of the random values it introduces. The empirical results show that the proposed approach is effective at improving the heuristic. It is also important to highlight that many interesting biclusters found using our approach would not have been obtained with the original algorithm.

The original algorithm proposed by Ch&Ch is one of the most successful algorithms for discovering biclusters in microarray data. However, it has some limitations, the most relevant being the substitution phase adopted to avoid overlapping among biclusters. The modified version of the algorithm proposed in this paper improves on the original, as shown by the experiments.
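The idea of a weight matrix can be illustrated with a minimal Python sketch. The mean squared residue below is the bicluster quality score defined by Cheng and Church (2000). The weight handling is an assumption made only for illustration: the abstract does not give the paper's actual weight update or penalty formula, so `overlap_penalty` and `record_bicluster` are hypothetical helpers. The point of the sketch is that the expression matrix itself is never modified.

```python
import numpy as np

def mean_squared_residue(data, rows, cols):
    """Cheng and Church's mean squared residue of the bicluster (rows, cols)."""
    sub = data[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)
    col_means = sub.mean(axis=0, keepdims=True)
    residue = sub - row_means - col_means + sub.mean()
    return float((residue ** 2).mean())

def overlap_penalty(weights, rows, cols):
    """Hypothetical overlap estimate: the average number of previously found
    biclusters covering each cell of the candidate (0.0 means no overlap)."""
    return float(weights[np.ix_(rows, cols)].mean())

def record_bicluster(weights, rows, cols):
    """Hypothetical update: count one more bicluster on every covered cell.
    Only the weight matrix changes; the expression data stays untouched."""
    weights[np.ix_(rows, cols)] += 1

# Toy expression matrix (genes x conditions), invented for this example.
data = np.array([[1.0, 2.0, 3.0, 9.0],
                 [2.0, 3.0, 4.0, 1.0],
                 [4.0, 5.0, 6.0, 7.0],
                 [0.0, 8.0, 2.0, 5.0]])
weights = np.zeros(data.shape, dtype=int)

first_rows, first_cols = [0, 1, 2], [0, 1, 2]
print(mean_squared_residue(data, first_rows, first_cols))
record_bicluster(weights, first_rows, first_cols)

# A later candidate can be scored against the cells already covered.
print(overlap_penalty(weights, [1, 2, 3], [1, 2, 3]))
```

Because overlap is tracked in `weights` rather than by overwriting entries of `data`, every residue is computed on original values. A search procedure could combine the two scores, for instance by rejecting candidates whose penalty exceeds a threshold, but that combination is likewise an assumption and not taken from the paper.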

[1] Suvrit Sra, et al. Minimum Sum-Squared Residue based clustering of Gene Expression Data, 2004.

[2] Philip S. Yu, et al. δ-clusters: capturing subspace correlation in a large data set, 2002, Proceedings 18th International Conference on Data Engineering.

[3] George M. Church, et al. Biclustering of Expression Data, 2000, ISMB.

[4] Inderjit S. Dhillon, et al. Minimum Sum-Squared Residue Co-Clustering of Gene Expression Data, 2004, SDM.

[5] J. Mesirov, et al. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation, 1999, Proceedings of the National Academy of Sciences of the United States of America.

[6] Padraig Cunningham, et al. BALBOA: Extending Bicluster Analysis to Classify ORFs using Expression Data, 2007, 2007 IEEE 7th International Symposium on BioInformatics and BioEngineering.

[7] Zohar Yakhini, et al. Clustering gene expression patterns, 1999, J. Comput. Biol.

[8] Ronald W. Davis, et al. A genome-wide transcriptional analysis of the mitotic cell cycle, 1998, Molecular cell.

[9] Jun Ni, et al. Clustering of gene expression data: performance and similarity analysis, 2006, First International Multi-Symposiums on Computer and Computational Sciences (IMSCCS'06).

[10] Robert M. Haralick, et al. Exploiting the Geometry of Gene Expression Patterns for Unsupervised Learning, 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[11] Philip S. Yu, et al. An Improved Biclustering Method for Analyzing Gene Expression Profiles, 2005, Int. J. Artif. Intell. Tools.

[12] Claire Tilstone. DNA microarrays: Vital statistics, 2003, Nature.

[13] J. Hartigan. Direct Clustering of a Data Matrix, 1972.

[14] Pierre Baldi, et al. DNA Microarrays and Gene Expression - From Experiments to Data Analysis and Modeling, 2002.

[15] Federico Divina, et al. Biclustering of expression data with evolutionary computation, 2006, IEEE Transactions on Knowledge and Data Engineering.

[16] Gregory Piatetsky-Shapiro, et al. Capturing best practice for microarray gene expression data analysis, 2003, KDD '03.

[17] Jesús S. Aguilar-Ruiz, et al. Biclustering of Gene Expression Data Based on Local Nearness, 2006, EGC.

[18] Ash A. Alizadeh, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, 2000, Nature.