Multi-task consensus clustering of genome-wide transcriptomes from related biological conditions

MOTIVATION: Identifying the shared and pathogen-specific components of host transcriptional regulatory programs is important for understanding the principles that govern regulation of the immune response. Recent systems biology studies of infectious diseases have produced a large collection of datasets measuring host transcriptional response to various pathogens. Computational methods that identify and compare gene expression modules across different infections offer a powerful way to identify strain-specific and shared components of the regulatory program. An important challenge is to identify statistically robust gene expression modules and to reliably detect genes that change their module membership between infections.

RESULTS: We present MULCCH (MULti-task spectral Consensus Clustering for Hierarchically related tasks), a consensus extension of a multi-task clustering algorithm that infers high-confidence strain-specific host response modules under infection by multiple virus strains. On simulated data, MULCCH identifies genes exhibiting pathogen-specific patterns more accurately than non-consensus and non-multi-task clustering approaches. Applied to the mammalian transcriptional response to a panel of influenza viruses, MULCCH identifies clusters with greater coherence than non-consensus methods. Further, MULCCH-derived clusters are enriched for several immune system-related processes and regulators. In summary, MULCCH provides a reliable module-based approach for identifying the molecular pathways and gene sets that characterize the commonality and specificity of host response to viruses of different pathogenicities.

AVAILABILITY AND IMPLEMENTATION: The source code is available at https://bitbucket.org/roygroup/mulcch

CONTACT: sroy@biostat.wisc.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
