Adaptively capturing the heterogeneity of expression for cancer biomarker identification

Background
Identifying cancer biomarkers from transcriptomics data is important for cancer research. However, transcriptomics data are often complex and heterogeneous, which complicates biomarker identification in practice. Heterogeneity remains a challenge for detecting subtle but consistent changes in gene expression in cancer cells.

Results
In this paper, we propose to adaptively capture the heterogeneity of expression across samples in a gene regulation space rather than in a gene expression space. Specifically, we transform gene expression profiles into gene regulation profiles and mathematically formulate statistics based on gene regulation probabilities (GRPs) to characterize differential expression of genes between tumor and normal tissues. We then devise an unbiased estimator of GRPs, aGRP, that can interrogate and adaptively capture the heterogeneity of gene expression, and we derive an asymptotic significance analysis procedure for the new statistic. Because no parameters need to be preset, aGRP is easy to use for researchers without a computer programming background. We evaluated the proposed method on both simulated and real-world data and compared it with previous methods. The experimental results demonstrate the superior performance of the proposed method in exploring expression heterogeneity to capture subtle but consistent alterations of gene expression in cancer.

Conclusions
Expression heterogeneity strongly influences the performance of cancer biomarker identification from transcriptomics data, and models that deal efficiently with this heterogeneity are needed. Because it adaptively captures sample heterogeneity and is simple to use, the proposed method can serve as a standalone tool.

Software availability
The source code of aGRP can be downloaded from https://github.com/hqwang126/aGRP.
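The abstract describes transforming expression profiles into regulation profiles and estimating a gene regulation probability with an asymptotic significance test, but it does not state the formulas. As a rough illustration only, the sketch below assumes one simple pairwise construction: each tumor sample's "regulation profile" entry is the fraction of normal samples it exceeds (ties count as 1/2), and the mean of that profile estimates the probability that the gene is up-regulated in tumor. This is the familiar Mann-Whitney/AUC estimator with its standard normal approximation, swapped in here for illustration; it is not aGRP's adaptive estimator, and the function names are hypothetical. See the repository linked above for the actual method.

```python
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def regulation_profile(tumor: Sequence[float], normal: Sequence[float]) -> List[float]:
    """Hypothetical regulation profile: for each tumor sample, the fraction of
    normal samples it exceeds, counting ties as 1/2 (an assumption, not aGRP)."""
    if not tumor or not normal:
        raise ValueError("need at least one tumor and one normal sample")
    profile = []
    for t in tumor:
        score = sum(1.0 if t > n else 0.5 if t == n else 0.0 for n in normal)
        profile.append(score / len(normal))
    return profile


def up_regulation_probability(
    tumor: Sequence[float], normal: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (p_hat, z, two-sided p-value) for one gene.

    p_hat averages over all tumor-normal pairs, so it is an unbiased estimate
    of P(tumor > normal) + 0.5 * P(tumor == normal).
    """
    profile = regulation_profile(tumor, normal)
    p_hat = sum(profile) / len(profile)
    n1, n2 = len(tumor), len(normal)
    # Under the null of no differential expression (continuous data, no ties),
    # p_hat has mean 1/2 and variance (n1 + n2 + 1) / (12 * n1 * n2).
    se = sqrt((n1 + n2 + 1) / (12.0 * n1 * n2))
    z = (p_hat - 0.5) / se
    # Two-sided tail of the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2.0))
    return p_hat, z, p_value


if __name__ == "__main__":
    # Toy expression values for one gene; every tumor sample exceeds every normal.
    print(up_regulation_probability([5.1, 6.3, 7.0, 6.8], [4.0, 4.5, 5.0]))
```

In this toy case p_hat is 1.0, meaning the gene is higher in every tumor-normal pair. The per-sample profile is what lets a method look at heterogeneity: a gene altered in only a subset of tumors shows up as a mix of values near 1 and near 0.5 rather than as a shifted mean.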