Biclustering Three-Dimensional Data Arrays With Plaid Models

Three-dimensional data arrays (collections of individual data matrices) are increasingly prevalent in modern datasets and pose unique challenges for pattern extraction and visualization. This article introduces a biclustering technique for exploration and pattern detection in such complex structured data. The proposed framework couples the popular plaid model with tools from functional data analysis to guide the estimation of bicluster responses over the array. We present an efficient algorithm that first detects biclusters exhibiting strong deviations in some of the data matrices, and then estimates their responses over the entire data array. Altogether, the framework helps to home in on and display underlying structure and its evolution over conditions or time. The methods scale to large datasets and can accommodate a variety of dynamic patterns. The proposed techniques are illustrated on gene expression data and bilateral trade networks. Supplementary materials are available online.
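
The two-stage idea in the abstract (detect a bicluster in a matrix where it deviates strongly, then trace its response across the whole array) can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a single mean-only layer with binary memberships, picks the detection slice by largest variance, and uses a low-degree polynomial fit as a simple stand-in for the functional data analysis smoothers. All function names (`find_bicluster`, `bicluster_response`, `explore`, `simulate`) and the simulated data are hypothetical.

```python
import numpy as np


def find_bicluster(Y, n_iter=10):
    """Search for one mean-only layer with binary memberships in matrix Y."""
    Z = Y - np.median(Y)                      # remove a robust background level
    row_means = Z.mean(axis=1)
    rows = row_means > np.median(row_means)   # crude start: upper half of rows
    cols = np.ones(Z.shape[1], dtype=bool)
    for _ in range(n_iter):
        mu = Z[np.ix_(rows, cols)].mean()
        # A column joins when that lowers the squared error, i.e. when its
        # mean over member rows lies beyond mu / 2 on the same side as mu.
        cols = np.sign(mu) * Z[rows].mean(axis=0) > abs(mu) / 2
        if not cols.any():
            break
        mu = Z[np.ix_(rows, cols)].mean()
        # Same rule for rows, using the current member columns.
        rows = np.sign(mu) * Z[:, cols].mean(axis=1) > abs(mu) / 2
        if not rows.any():
            break
    return rows, cols


def bicluster_response(A, rows, cols, degree=3):
    """Per-slice bicluster effect over A (T x n x p): raw and smoothed."""
    mask = np.outer(rows, cols)
    # Effect in each slice: mean inside the bicluster minus mean outside it.
    raw = np.array([S[mask].mean() - S[~mask].mean() for S in A])
    t = np.arange(len(A))
    # Polynomial smoothing is an assumed stand-in for an FDA smoother.
    smooth = np.polyval(np.polyfit(t, raw, degree), t)
    return raw, smooth


def explore(A):
    """Detect on the most variable slice, then estimate over all slices."""
    t_star = int(np.argmax(A.var(axis=(1, 2))))
    rows, cols = find_bicluster(A[t_star])
    raw, smooth = bicluster_response(A, rows, cols)
    return t_star, rows, cols, raw, smooth


def simulate(T=9, n=30, p=20, seed=0):
    """Toy array: one 8 x 6 bicluster whose amplitude rises and falls."""
    rng = np.random.default_rng(seed)
    A = rng.normal(0.0, 0.1, size=(T, n, p))
    amp = 3.0 * np.sin(np.pi * np.arange(T) / (T - 1))
    A[:, :8, :6] += amp[:, None, None]
    return A, amp


A, amp = simulate()
t_star, rows, cols, raw, smooth = explore(A)
print("detection slice:", t_star)
print("bicluster size:", int(rows.sum()), "x", int(cols.sum()))
print("estimated response:", np.round(raw, 2))
```

Detecting on a single strongly deviating slice keeps the search two-dimensional, and the memberships are then held fixed while only the response varies across slices. A fuller treatment would fit several layers and allow row and column effects within each layer.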
