An information-theoretic approach for measuring the distance of organ tissue samples using their transcriptomic signatures

Motivation: Recapitulating aspects of human organ function using in-vitro (e.g., plates, transwells), in-vivo (e.g., mouse, rat), or ex-vivo (e.g., organ chips, 3D systems) organ models is of paramount importance for precision medicine and drug discovery. Such models allow us to identify potential side effects and test the effectiveness of therapeutic approaches early in their design phase, and they inform the development of accurate disease models. Developing mathematical methods that reliably quantify the distance (or similarity) between an organ model and the real human organ it represents is an understudied problem with important applications in biomedicine and tissue engineering.

Results: We introduce the Transcriptomic Signature Distance (TSD), an information-theoretic distance for assessing the transcriptomic similarity of two tissue samples or two groups of tissue samples. In developing TSD, we leverage next-generation sequencing data and information retrieved from well-curated databases that provide signature gene sets characteristic of human organs. We present the justification and mathematical development of the new distance and demonstrate its effectiveness in several scenarios of practical importance using publicly available RNA-seq datasets.

Contact: dimitris.manatakis@emulatebio.com

Supplementary information: Supplementary data are available at bioRxiv.
