ShinyButchR: Interactive NMF-based decomposition workflow of genome-scale datasets

Abstract

Non-negative matrix factorization (NMF) is widely used in the analysis of genomic data for feature extraction and signature identification, because the decomposed signatures are readily interpretable. However, running even a basic NMF analysis requires installing multiple tools and dependencies, and it involves a steep learning curve and substantial computing time. To mitigate these obstacles, we developed ShinyButchR, a novel R/Shiny application that provides a complete NMF-based analysis workflow. It allows the user to perform matrix decomposition using NMF, feature extraction, interactive visualization, identification of relevant signatures, and association of those signatures with biological and clinical variables. ShinyButchR builds upon ButchR, a new R package that provides TensorFlow solvers for algorithms of the NMF family, functions for downstream analysis, a rational method to determine the optimal factorization rank, and a novel feature selection strategy.
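To make the decomposition step concrete, the following is a minimal sketch of NMF using the classic multiplicative update rules for the Frobenius objective. It is an illustration only and not ButchR's TensorFlow solver: the function names (`nmf`, `frobenius_error`), the toy matrix, and all parameter values are assumptions introduced here. A non-negative matrix V (features by samples) is approximated by the product of a signature matrix W and an exposure matrix H, both constrained to be non-negative.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - W H."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize V (features x samples) into non-negative W and H.

    Hypothetical helper for illustration; uses multiplicative updates,
    which keep W and H non-negative as long as they start positive.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [
            [h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
            for k in range(rank)
        ]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [
            [w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
            for i in range(n_rows)
        ]
    return w, h


# Toy non-negative matrix: 4 features (rows) x 3 samples (columns).
v = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [0.0, 1.0, 1.0],
    [1.0, 3.0, 4.0],
]

w, h = nmf(v, rank=2)
print("reconstruction error:", frobenius_error(v, w, h))
```

In this sketch the columns of W play the role of signatures and the rows of H give each sample's exposure to them. Choosing the factorization rank and selecting signature-specific features, which the abstract attributes to ButchR, are not covered here.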
