Dynamic changes of RNA-sequencing expression for precision medicine: N-of-1-pathways Mahalanobis distance within pathways of single subjects predicts breast cancer survival

Motivation: The conventional approach to personalized medicine relies on molecular data analytics across multiple patients. The path to precision medicine lies with molecular data analytics that can discover interpretable single-subject signals (N-of-1). We developed a global framework, N-of-1-pathways, for a mechanistic-anchored approach to single-subject gene expression data analysis. We previously employed a metric that could prioritize the statistical significance of a deregulated pathway in single subjects, however, it lacked in quantitative interpretability (e.g. the equivalent to a gene expression fold-change). Results: In this study, we extend our previous approach with the application of statistical Mahalanobis distance (MD) to quantify personal pathway-level deregulation. We demonstrate that this approach, N-of-1-pathways Paired Samples MD (N-OF-1-PATHWAYS-MD), detects deregulated pathways (empirical simulations), while not inflating false-positive rate using a study with biological replicates. Finally, we establish that N-OF-1-PATHWAYS-MD scores are, biologically significant, clinically relevant and are predictive of breast cancer survival (P < 0.05, n = 80 invasive carcinoma; TCGA RNA-sequences). Conclusion: N-of-1-pathways MD provides a practical approach towards precision medicine. The method generates the magnitude and the biological significance of personal deregulated pathways results derived solely from the patient’s transcriptome. These pathways offer the opportunities for deriving clinically actionable decisions that have the potential to complement the clinical interpretability of personal polymorphisms obtained from DNA acquired or inherited polymorphisms and mutations. In addition, it offers an opportunity for applicability to diseases in which DNA changes may not be relevant, and thus expand the ‘interpretable ‘omics’ of single subjects (e.g. personalome). Availability and implementation: http://www.lussierlab.net/publications/N-of-1-pathways. Contact: yves@email.arizona.edu or piegorsch@math.arizona.edu Supplementary information: Supplementary data are available at Bioinformatics online.

[1]  Giorgio Valle,et al.  The Gene Ontology in 2010: extensions and refinements , 2009, Nucleic Acids Res..

[2]  Peter Bühlmann,et al.  Analyzing gene expression data in terms of gene sets: methodological issues , 2007, Bioinform..

[3]  Carol Friedman,et al.  Information theory applied to the sparse gene ontology annotation network to predict novel gene function , 2007, ISMB/ECCB.

[4]  I. Jolliffe Principal Component Analysis , 2002 .

[5]  อนิรุธ สืบสิงห์,et al.  Data Mining Practical Machine Learning Tools and Techniques , 2014 .

[6]  Ning Leng,et al.  EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments , 2013, Bioinform..

[7]  Michael R Chernick,et al.  Bootstrap Methods: A Guide for Practitioners and Researchers , 2007 .

[8]  Yves A. Lussier,et al.  Translating Mendelian and complex inheritance of Alzheimer's disease genes for predicting unique personal genome variants , 2012, J. Am. Medical Informatics Assoc..

[9]  Ning Leng,et al.  EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments , 2013, Bioinform..

[10]  Hadley Wickham,et al.  ggplot2 - Elegant Graphics for Data Analysis (2nd Edition) , 2017 .

[11]  Jie Zhou,et al.  RNA-seq differential expression studies: more sequence or more replication? , 2014, Bioinform..

[12]  P. Mahalanobis On the generalized distance in statistics , 1936 .

[13]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[14]  Mathew W. Wright,et al.  The HUGO Gene Nomenclature Committee (HGNC) , 2001, Human Genetics.

[15]  Benjamin M. Bolstad,et al.  affy - analysis of Affymetrix GeneChip data at the probe level , 2004, Bioinform..

[16]  Yves A Lussier,et al.  Concordance of deregulated mechanisms unveiled in underpowered experiments: PTBP1 knockdown case study , 2014, BMC Medical Genomics.

[17]  Yves A. Lussier,et al.  Complex-disease networks of trait-associated single-nucleotide polymorphisms (SNPs) unveiled by information theory , 2012, J. Am. Medical Informatics Assoc..

[18]  Yves A. Lussier,et al.  Protein interaction network underpins concordant prognosis among heterogeneous breast cancer signatures , 2010, J. Biomed. Informatics.

[19]  Colin N. Dewey,et al.  RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome , 2011, BMC Bioinformatics.

[20]  James Allan,et al.  A comparison of statistical significance tests for information retrieval evaluation , 2007, CIKM '07.

[21]  Jean YH Yang,et al.  Bioconductor: open software development for computational biology and bioinformatics , 2004, Genome Biology.

[22]  Pablo Tamayo,et al.  Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[23]  Nicolas Servant,et al.  A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis , 2013, Briefings Bioinform..

[24]  Adebowale Adeyemo,et al.  Reconciling clinical importance and statistical significance , 2014, European Journal of Human Genetics.

[25]  L. Brown,et al.  Interval Estimation for a Binomial Proportion , 2001 .

[26]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[27]  Ian T. Foster,et al.  ‘N-of-1-pathways’ unveils personal deregulated mechanisms from a single pair of RNA-Seq samples: towards precision medicine , 2014, J. Am. Medical Informatics Assoc..

[28]  Maqc Consortium The MicroArray Quality Control ( MAQC )-II study of common practices for the development and validation of microarray-based predictive models , 2012 .

[29]  Richard Simon,et al.  Roadmap for developing and validating therapeutically relevant genomic classifiers. , 2005, Journal of clinical oncology : official journal of the American Society of Clinical Oncology.

[30]  P. Rousseeuw,et al.  Partitioning Around Medoids (Program PAM) , 2008 .

[31]  Matthew D. Young,et al.  Gene ontology analysis for RNA-seq: accounting for selection bias , 2010, Genome Biology.