CancerInSilico: An R/Bioconductor package for combining mathematical and statistical modeling to simulate time course bulk and single cell gene expression data in cancer

Bioinformatics techniques for analyzing time course bulk and single cell omics data are advancing, but real data lack a known ground truth for the dynamics of molecular changes, which makes it difficult to benchmark these techniques. Realistic simulated time course datasets are therefore essential for assessing the performance of time course bioinformatics algorithms. We develop an R/Bioconductor package, CancerInSilico, to simulate bulk and single cell transcriptional data from a known ground truth obtained from mathematical models of cellular systems. The package provides a general R infrastructure for running cell-based models and for simulating gene expression data based on the model states. We show how to use the package to simulate a gene expression data set and then benchmark analysis methods on that data set against the known ground truth. The package is freely available via Bioconductor: http://bioconductor.org/packages/CancerInSilico/
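The workflow described above (run a mathematical model, derive expression from the model state, then benchmark an analysis method against the known truth) can be illustrated with a minimal, language-neutral sketch. The Python code below does not use CancerInSilico or reproduce its API. Every function name, the logistic growth model, the Gaussian noise model, and all parameter values are assumptions chosen only for illustration.

```python
import random
import statistics


def simulate_cell_counts(initial_cells, growth_rate, capacity, hours):
    """Toy ground truth (assumed model): discrete logistic growth, one value per hour."""
    counts = [float(initial_cells)]
    for _ in range(hours):
        n = counts[-1]
        counts.append(n + growth_rate * n * (1.0 - n / capacity))
    return counts


def pathway_activity(counts):
    """Known pathway activity: relative growth per hour, scaled so the peak is 1."""
    rates = [(after - before) / before for before, after in zip(counts, counts[1:])]
    peak = max(rates)
    return [rate / peak for rate in rates]


def simulate_expression(activity, n_pathway, n_background, noise_sd, rng):
    """Genes x time matrix: pathway genes track activity, background genes stay flat."""
    matrix = []
    for gene in range(n_pathway + n_background):
        effect = 3.0 if gene < n_pathway else 0.0  # hypothetical effect size
        matrix.append([5.0 + effect * a + rng.gauss(0.0, noise_sd) for a in activity])
    return matrix


def top_variable_genes(matrix, k):
    """Stand-in analysis method: pick the k genes with the largest variance over time."""
    variances = [statistics.pvariance(row) for row in matrix]
    ranked = sorted(range(len(matrix)), key=lambda g: variances[g], reverse=True)
    return set(ranked[:k])


def benchmark_recall(n_pathway=10, n_background=90, seed=0):
    """Fraction of true pathway genes recovered by the stand-in analysis method."""
    rng = random.Random(seed)
    counts = simulate_cell_counts(10, 0.2, 1000.0, 48)
    activity = pathway_activity(counts)
    matrix = simulate_expression(activity, n_pathway, n_background, 0.3, rng)
    selected = top_variable_genes(matrix, n_pathway)
    truth = set(range(n_pathway))  # pathway genes occupy the first rows by construction
    return len(selected & truth) / n_pathway


if __name__ == "__main__":
    print("recall of pathway genes:", benchmark_recall())
```

Because the pathway membership and activity are fixed by the simulation, the recall returned by `benchmark_recall` is measured against a truth that real data cannot provide, which is the point of simulating from a mathematical model.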
