kMEn: Analyzing noisy and bidirectional transcriptional pathway responses in single subjects

MOTIVATION Understanding dynamic, patient-level transcriptomic response to therapy is an important step forward for precision medicine. However, conventional transcriptome analysis aims to discover cohort-level change and lacks the capacity to unveil patient-specific response to therapy. To address this gap, we previously developed two N-of-1-pathways methods, Wilcoxon and Mahalanobis distance, to detect unidirectionally responsive transcripts within a pathway using a pair of samples from a single subject. Yet these methods cannot recognize bidirectionally (up and down) responsive pathways. Further, our previous approaches have not been assessed in the presence of background noise and are not designed to identify responsive transcripts (RTs), i.e., mRNAs differentially expressed between two samples taken from one patient in different contexts (e.g., cancer vs. non-cancer).

METHODS We propose a new N-of-1-pathways method, k-Means Enrichment (kMEn), that detects bidirectionally responsive pathways, despite background noise, using a pair of transcriptomes from a single patient. kMEn identifies transcripts responsive to the stimulus through k-means clustering and then tests for an over-representation of the responsive genes within each pathway. The pathways identified by kMEn are mechanistically interpretable and respond significantly to a stimulus.

RESULTS In ∼9000 simulations varying six parameters, kMEn outperformed previous single-subject methods, as shown by: (i) improved precision-recall at various levels of bidirectional response and (ii) lower rates of false positives (1-specificity) when more than 10% of genes in the genome are differentially expressed (background noise). In a clinical proof-of-concept, personal treatment-specific pathways identified by kMEn correlate with therapeutic response (p-value < 0.01).

CONCLUSION Through improved analysis of single-subject transcriptome dynamics for bidirectionally regulated signals, kMEn provides a novel approach to identify mechanism-level biomarkers.
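
The two steps named in METHODS (k-means clustering to flag responsive transcripts, then an over-representation test per pathway) can be illustrated with a minimal sketch. It is not the authors' implementation; several choices are assumptions made only for illustration, since the abstract does not specify them: clustering on the absolute log2 fold change between the two samples, k = 2, a pseudocount of 1, taking the cluster with the largest center as "responsive", and a one-sided hypergeometric test for enrichment. The names `kmeans_1d`, `hypergeom_sf`, `kmen_sketch`, and the toy data are all hypothetical.

```python
import math
import random


def kmeans_1d(values, k=2, iters=100):
    """Plain Lloyd's k-means on scalars; returns (labels, centers)."""
    lo, hi = min(values), max(values)
    # Deterministic init: centers evenly spaced between min and max.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                  for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members)
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def hypergeom_sf(overlap, pathway_size, n_responsive, n_genes):
    """P(X >= overlap) for a hypergeometric draw of pathway_size genes."""
    top = min(pathway_size, n_responsive)
    total = math.comb(n_genes, pathway_size)
    tail = sum(math.comb(n_responsive, x)
               * math.comb(n_genes - n_responsive, pathway_size - x)
               for x in range(overlap, top + 1))
    return tail / total


def kmen_sketch(baseline, case, pathways, pseudocount=1.0):
    """Assumed workflow: cluster |log2 FC|, then test each pathway."""
    genes = sorted(baseline)
    # Absolute value so that up- and down-regulated genes cluster together.
    abs_lfc = [abs(math.log2((case[g] + pseudocount)
                             / (baseline[g] + pseudocount)))
               for g in genes]
    labels, centers = kmeans_1d(abs_lfc, k=2)
    top_cluster = max(range(len(centers)), key=centers.__getitem__)
    responsive = {g for g, lab in zip(genes, labels) if lab == top_cluster}
    pvalues = {}
    for name, members in pathways.items():
        members = set(members) & set(genes)
        overlap = len(members & responsive)
        pvalues[name] = hypergeom_sf(overlap, len(members),
                                     len(responsive), len(genes))
    return responsive, pvalues


# Toy paired transcriptomes from one hypothetical subject (made-up values).
rng = random.Random(0)
gene_ids = [f"g{i}" for i in range(200)]
baseline = {g: 100.0 for g in gene_ids}
case = {g: 100.0 * rng.uniform(0.9, 1.1) for g in gene_ids}
for i in range(10):
    case[f"g{i}"] = 800.0        # up-regulated
    case[f"g{i + 10}"] = 12.5    # down-regulated

pathways = {
    "bidirectional": [f"g{i}" for i in range(5)] + [f"g{i}" for i in range(10, 15)],
    "unresponsive": [f"g{i}" for i in range(100, 110)],
}
responsive, pvalues = kmen_sketch(baseline, case, pathways)
```

In this toy example the "bidirectional" pathway mixes up- and down-regulated genes, and clustering on the absolute fold change lets both directions count toward the same enrichment test. In practice the resulting p-values would also need multiple-testing correction across pathways.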
