Simultaneous Non-Negative Matrix Factorization for Multiple Large Scale Gene Expression Datasets in Toxicology

Non-negative matrix factorization is a useful tool for reducing the dimension of large datasets. This work considers simultaneous non-negative matrix factorization of multiple sources of data. In particular, we perform the first study that involves more than two datasets. We discuss the algorithmic issues required to convert the approach into a practical computational tool and apply the technique to new gene expression data quantifying the molecular changes in four tissue types due to different dosages of an experimental panPPAR agonist in mouse. This study is of interest in toxicology because, whilst PPARs form potential therapeutic targets for diabetes, it is known that they can induce serious side-effects. Our results show that the practical simultaneous non-negative matrix factorization developed here can add value to the data analysis. In particular, we find that factorizing the data as a single object allows us to distinguish between the four tissue types, but does not correctly reproduce the known dosage level groups. Applying our new approach, which treats the four tissue types as providing distinct, but related, datasets, we find that the dosage level groups are respected. The new algorithm then provides separate gene list orderings that can be studied for each tissue type, and compared with the ordering arising from the single factorization. We find that many of our conclusions can be corroborated with known biological behaviour, and others offer new insights into the toxicological effects. Overall, the algorithm shows promise for early detection of toxicity in the drug discovery process.
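As a rough illustration of the idea of factorizing several related datasets at once, the sketch below uses one common formulation, which is an assumption here and not necessarily the authors' algorithm. Each non-negative matrix A_k (genes by samples, one per tissue type) is approximated as W_k H, where the sample-level factor H is shared and the gene-level factors W_k are separate. The function names, the per-dataset weights, and the multiplicative update scheme are all choices made for this example.

```python
import numpy as np

def simultaneous_nmf(datasets, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate each non-negative A_k (genes_k x samples) as W_k @ H with a shared H.

    Assumption of this sketch: every dataset has the same samples (columns),
    and each is weighted by 1 / ||A_k||_F^2 so that no dataset dominates.
    """
    rng = np.random.default_rng(seed)
    n_samples = datasets[0].shape[1]
    weights = [1.0 / (np.linalg.norm(A) ** 2 + eps) for A in datasets]
    Ws = [rng.random((A.shape[0], rank)) for A in datasets]
    H = rng.random((rank, n_samples))
    for _ in range(n_iter):
        # Lee-Seung style multiplicative update for each dataset-specific W_k.
        for W, A in zip(Ws, datasets):
            W *= (A @ H.T) / (W @ (H @ H.T) + eps)
        # The shared H pools weighted contributions from every dataset.
        numer = sum(w * (W.T @ A) for w, W, A in zip(weights, Ws, datasets))
        denom = sum(w * (W.T @ W @ H) for w, W in zip(weights, Ws)) + eps
        H *= numer / denom
    return Ws, H

def weighted_error(datasets, Ws, H, eps=1e-9):
    """Sum over datasets of the relative squared Frobenius reconstruction error."""
    return sum(
        np.linalg.norm(A - W @ H) ** 2 / (np.linalg.norm(A) ** 2 + eps)
        for A, W in zip(datasets, Ws)
    )

if __name__ == "__main__":
    # Synthetic stand-in for four tissue types measured on the same 12 samples.
    rng = np.random.default_rng(1)
    H_true = rng.random((3, 12))
    data = [rng.random((m, 3)) @ H_true for m in (20, 30, 25, 15)]
    Ws, H = simultaneous_nmf(data, rank=3)
    print("weighted reconstruction error:", weighted_error(data, Ws, H))
```

Under this formulation, the columns of H could be used to group samples (for example by dosage level), while sorting the rows of each W_k by their weight in a given factor would give a gene ordering specific to that tissue type.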
