Biclustering of Microarray Data based on Modular Singular Value Decomposition

(1) Dept of Computer and Information Sciences, University of Genova, Via Dodecaneso 35, 16146 Genova, Italy - email: {aradhya|masulli|ste}@disi.unige.it
(2) Dept of ISE, Dayananda Sagar College of Engg, Bangalore, India - 560078
(3) Sbarro Institute for Cancer Research and Molecular Medicine, Center for Biotechnology, Temple University, BioLife Science Bldg., 1900 N 12th Street, Philadelphia, PA 19122, USA

Keywords: Gene expression data, Microarray data, SVD, Biclustering.

Abstract. Unsupervised machine learning methods are widely used in the analysis of gene expression data obtained from microarray experiments. Clustering is one of the most popular approaches to analyzing gene expression data. Recently, biclustering, which performs simultaneous clustering on the row and column dimensions of the data matrix, has been shown to be remarkably effective in a variety of applications. In this paper, we present a new approach to biclustering for gene expression data, called Modular Singular Value Decomposition (M-SVD-BC). An experimental study on standard datasets demonstrates the effectiveness of the algorithm on gene expression data.

1 Introduction

DNA microarray technology is a recent high-throughput and parallel platform that can provide expression profiling of thousands of genes under different biological conditions [19]. These samples may correspond to different environmental conditions, time points, organs, and individuals. Examining and analyzing this kind of bioinformatics data is a considerable challenge, and meeting it can give us deeper knowledge of biological phenomena.

DNA microarray data are usually arranged in a matrix, where each row corresponds to a gene and each column to an experimental condition. Each entry in the matrix records the expression level of a gene as a real number, which is usually derived by taking the logarithm of the relative abundance of the mRNA of that gene under a specific condition [14].
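The matrix layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the intensity values are invented, the helper name `log_ratio_matrix` is hypothetical, and base-2 logarithms are assumed because the text does not fix the base.

```python
import numpy as np

# Hypothetical intensities (invented for illustration):
# rows are genes, columns are experimental conditions.
sample = np.array([[200.0, 400.0, 100.0],
                   [150.0, 150.0, 600.0]])
reference = np.array([[100.0, 100.0, 100.0],
                      [150.0, 150.0, 150.0]])


def log_ratio_matrix(sample, reference):
    """Return the gene-by-condition matrix of log relative abundances.

    Base 2 is an assumption of this sketch; the source only says that
    a logarithm of the relative mRNA abundance is taken.
    """
    return np.log2(sample / reference)


# expr[i, j] is the expression level of gene i under condition j.
expr = log_ratio_matrix(sample, reference)
print(expr)
```

With this convention, a positive entry means the gene is more abundant in the sample than in the reference, and a negative entry means it is less abundant.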
An important objective of analyzing this kind of data is the classification of genes and conditions and the identification of regulatory processes. To analyze such groups of genes and samples, clustering plays an important role in the exploratory analysis of microarray data. Clustering techniques can be applied to either genes or conditions to investigate the underlying structure. The clusters produced by these methods reflect the global pattern of the expression data, but in most cases an interesting cellular process may involve only a subset of genes that are co-expressed only under a subset of conditions. To capture this kind of structure, it is highly desirable to go further and develop approaches capable of discovering local patterns in microarray data [4].

The term biclustering was first introduced in gene expression analysis in [4], which was inspired by Hartigan's [8] so-called direct clustering. In the last few years, research on biclustering has been gaining popularity because of its many potential applications. A detailed survey of biclustering algorithms for biological data analysis can be found in [13]; the paper covers the models, methods, and applications of biclustering algorithms. Another interesting survey of biclustering algorithms is given in [17].
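The difference between a global pattern and a local pattern, as discussed in the introduction, can be made concrete with a small synthetic example. The sketch below is not the M-SVD-BC algorithm; it only plants a hypothetical bicluster in random data and compares how strongly the chosen genes correlate over the bicluster conditions versus over all conditions. The matrix size, the row and column indices, the shared profile, and the noise levels are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic expression matrix: 20 genes (rows) x 30 conditions (columns).
data = rng.normal(0.0, 1.0, size=(20, 30))

# Hypothetical bicluster: a subset of genes sharing one profile
# over a subset of conditions only.
rows = np.array([2, 5, 7, 11])
cols = np.array([1, 4, 8])
profile = np.array([1.5, -1.5, 1.5])
data[np.ix_(rows, cols)] = profile + rng.normal(0.0, 0.1, size=(4, 3))


def mean_pairwise_corr(block):
    """Average Pearson correlation over all distinct pairs of rows."""
    corr = np.corrcoef(block)
    upper = np.triu_indices_from(corr, k=1)
    return corr[upper].mean()


# Co-expression restricted to the bicluster conditions (local pattern).
local_corr = mean_pairwise_corr(data[np.ix_(rows, cols)])
# Co-expression of the same genes over all conditions (global view).
global_corr = mean_pairwise_corr(data[rows, :])

print(f"local: {local_corr:.3f}, global: {global_corr:.3f}")
```

The same genes look tightly co-expressed on the selected conditions and much less so over the full set of conditions, which is why a method that clusters whole rows or whole columns can miss such a group.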

[1] George M. Church et al. Biclustering of Expression Data, 2000, ISMB.

[2] Kathleen Marchal et al. SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms, 2006, BMC Bioinformatics.

[3] Arlindo L. Oliveira et al. A Linear Time Biclustering Algorithm for Time Series Gene Expression Data, 2005, WABI.

[4] Wojtek J. Krzanowski et al. Improved biclustering of microarray data demonstrated through systematic performance tests, 2005, Comput. Stat. Data Anal.

[5] Inderjit S. Dhillon et al. Co-clustering documents and words using bipartite spectral graph partitioning, 2001, KDD '01.

[6] Jinze Liu et al. Biclustering in gene expression data by tendency, 2004.

[7] Hong Yan et al. A new geometric biclustering algorithm based on the Hough transform for analysis of large-scale microarray data, 2008, Journal of Theoretical Biology.

[8] Eckart Zitzler et al. BicAT: a biclustering analysis toolbox, 2006, Bioinform.

[9] Sushmita Mitra et al. Possibilistic Approach to Biclustering: An Application to Oligonucleotide Microarray Data Analysis, 2006, CMSB.

[10] J. Hartigan. Direct Clustering of a Data Matrix, 1972.

[11] Hong Yan et al. Discovering biclusters in gene expression data based on high-dimensional linear geometries, 2008, BMC Bioinformatics.

[12] Zhuo Li et al. Process variation dimension reduction based on SVD, 2003.

[13] S. Shen-Orr et al. Network motifs in the transcriptional regulation network of Escherichia coli, 2002, Nature Genetics.

[14] Arlindo L. Oliveira et al. Biclustering algorithms for biological data analysis: a survey, 2004, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[15] Joseph T. Chang et al. Spectral biclustering of microarray data: coclustering genes and conditions, 2003, Genome Research.

[16] Roded Sharan et al. Biclustering Algorithms: A Survey, 2007.

[17] Audra E. Kosh et al. Linear Algebra and its Applications, 1992.

[18] Sushmita Mitra et al. Multi-objective evolutionary biclustering of gene expression data, 2006, Pattern Recognit.