Ordering of Omics Features Using Beta Distributions on Montecarlo p-Values

The current trend in genetic research is to study omics data as a whole, combining either several studies or several omics techniques. This raises the need for new, robust statistical methods that can integrate and order the relevant biological information. A good way to approach the problem is to order the studied features according to the different kinds of data, so a key point is to assign each feature a value that permits a reliable sorting. These values are usually the p-values of a hypothesis tested for each feature. The Montecarlo method is certainly one of the most robust methods for hypothesis testing. However, a large number of simulations is needed to obtain a reliable p-value, so the method becomes computationally infeasible in many situations. We propose a new way to order genes according to their differential features, using a score defined from a beta distribution fitted to the generated p-values. Our approach has been tested using simulated data and colorectal cancer datasets from the Infinium MethylationEPIC array, Affymetrix gene expression array and Illumina RNA-seq platforms. The results show that this approach orders genes properly with far fewer simulations than the Montecarlo method requires. Furthermore, the score can be interpreted as an estimated p-value and compared with the Montecarlo p-value and with other approaches, such as the p-value of the moderated t-test. We have also identified a new expression pattern of eighteen genes common to all colorectal cancer microarrays, i.e., 21 datasets. Thus, the proposed method is effective for obtaining biological results from different datasets. Our score shows a slightly smaller type I error than the Montecarlo p-value for small sample sizes. The type II error of the Montecarlo p-value is lower than that of the proposed score and of the moderated p-value, but these differences shrink considerably for larger sample sizes and higher false discovery rates. The similar type I and type II error performance, together with the score, enables a clear ordering of the features being evaluated.
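
The idea of fitting a beta distribution to a handful of cheap simulated p-values can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact definition of the score, so the sketch assumes (hypothetically) that the score is the mean of a beta distribution fitted by the method of moments, and it uses a permutation test on the difference in group means as a stand-in for the hypothesis test. The function names `permutation_p_value`, `fit_beta_moments` and `beta_score`, and the defaults `n_batches=20` and `n_sim=50`, are illustrative assumptions.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_sim, rng):
    """Simulated permutation p-value for a difference in group means.

    Returns (hits + 1) / (n_sim + 1), so the estimate is never exactly zero.
    """
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)  # random relabelling of the samples
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


def fit_beta_moments(p_values):
    """Method-of-moments estimates (alpha, beta) for values in (0, 1).

    Returns None when the estimates are undefined, for example when all
    p-values are identical (zero variance).
    """
    mean = statistics.fmean(p_values)
    var = statistics.variance(p_values)
    if var == 0 or var >= mean * (1 - mean):
        return None
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def beta_score(group_a, group_b, n_batches=20, n_sim=50, seed=0):
    """Score one feature from n_batches cheap p-values of n_sim simulations each.

    ASSUMPTION: the score is the mean of the fitted beta distribution; the
    source abstract does not specify how the score is defined.
    """
    rng = random.Random(seed)
    p_values = [
        permutation_p_value(group_a, group_b, n_sim, rng)
        for _ in range(n_batches)
    ]
    fit = fit_beta_moments(p_values)
    if fit is None:
        # Degenerate sample: fall back to the plain average of the p-values.
        return statistics.fmean(p_values), None, None
    alpha, beta = fit
    return alpha / (alpha + beta), alpha, beta


if __name__ == "__main__":
    # Two made-up features: one clearly differential, one not.
    features = {
        "shifted": ([1.0, 1.2, 0.9, 1.1, 1.0, 0.8], [5.0, 5.2, 4.9, 5.1, 5.3, 4.8]),
        "similar": ([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]),
    }
    scores = {name: beta_score(a, b)[0] for name, (a, b) in features.items()}
    for name in sorted(scores, key=scores.get):  # smallest score first
        print(name, round(scores[name], 4))
```

With moment matching, the fitted mean coincides with the average of the simulated p-values, so the fitted shape parameters mainly add information about their spread. A score based on another functional of the fitted distribution, or on a different fitting procedure, would slot into `beta_score` in the same way.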
