MultiFacTV: module detection from higher-order time series biological data

Background: Identifying modules from time series biological data helps us understand the biological functions of a group of interacting proteins/genes and how the responses of these proteins/genes change dynamically over time. The rapid acquisition of time series biological data from different laboratories and databases poses new challenges for this identification task, and powerful methods that can detect modules through integrative analysis are urgently needed. To accomplish such integrative analysis, we assemble multiple time series biological datasets into a higher-order form, e.g., a gene × condition × time tensor. It is interesting and useful to develop methods that identify modules from this tensor.

Results: In this paper, we present MultiFacTV, a new method for finding modules in higher-order time series biological data. The method employs a tensor factorization objective function that incorporates a time-related total variation regularization term. From the factorization results, MultiFacTV extracts modules, each composed of a subset of genes, conditions and time-points. We have run MultiFacTV on synthetic datasets, and the results show that it outperforms the existing methods EDISA and MetaFac. Moreover, we have applied MultiFacTV to the Arabidopsis thaliana root (shoot) tissue dataset, represented as a gene × condition × time tensor of size 2395 × 9 × 6 (3454 × 8 × 6), and to a Yeast dataset and a Homo sapiens dataset, represented as tensors of sizes 4425 × 6 × 6 and 2920 × 14 × 9, respectively. The results show that MultiFacTV identifies interesting modules in these datasets, which have been validated and explained by Gene Ontology analysis with DAVID or by other analyses.

Conclusion: Experimental results on both synthetic and real datasets show that the proposed MultiFacTV is effective in identifying modules in higher-order time series biological data. Compared to traditional non-integrative analysis methods, it provides a more comprehensive view of biological processes, since modules composed of more than two types of biological variables can be identified and analyzed.
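To make the idea of a factorization objective with a time-related total variation term concrete, the following is a minimal sketch, not the authors' formulation. It assumes a rank-R CP-style model with factor matrices for genes, conditions and time, a squared-error fit term, and an anisotropic total variation penalty on the time factor. The function names, the weight `lam`, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def cp_reconstruct(G, C, T):
    """Rebuild a gene x condition x time tensor from three factor matrices.

    G: genes x R, C: conditions x R, T: time-points x R (assumed CP-style model).
    """
    return np.einsum("ir,jr,kr->ijk", G, C, T)

def total_variation(T):
    """Sum of absolute differences between consecutive time-points, over all components."""
    return np.abs(np.diff(T, axis=0)).sum()

def tv_regularized_objective(X, G, C, T, lam=0.1):
    """Squared reconstruction error plus lam times the total variation of the time factor.

    lam is a hypothetical regularization weight, not a value from the paper.
    """
    residual = X - cp_reconstruct(G, C, T)
    return 0.5 * np.sum(residual ** 2) + lam * total_variation(T)

# Toy data: 20 genes, 4 conditions, 6 time-points, 2 components (illustrative only).
rng = np.random.default_rng(0)
G = rng.random((20, 2))
C = rng.random((4, 2))
# Piecewise-constant time profiles: each component has a single jump.
T_smooth = np.repeat(np.array([[0.0, 1.0], [1.0, 0.0]]), 3, axis=0)
X = cp_reconstruct(G, C, T_smooth)

print(tv_regularized_objective(X, G, C, T_smooth, lam=0.1))
```

In this sketch the penalty is small for time profiles that change at only a few time-points, so minimizing such an objective would favor components whose temporal behavior is piecewise constant, which is one plausible reading of why a total variation term suits time series modules.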
