False Discovery Rate Control in Cancer Biomarker Selection Using Knockoffs

Discovering biomarkers that are informative for cancer risk assessment, diagnosis, prognosis, and treatment prediction is crucial. Recent advances in high-throughput genomics make it plausible to select biomarkers from the vast number of human genes in an unbiased manner. Yet controlling false discoveries is challenging, because the number of genes is large relative to the small number of patients in a typical cancer study. To ensure that most of the discoveries are true, we employ a knockoff procedure to control the false discovery rate. Our method is general and flexible, accommodating arbitrary covariate distributions, linear and nonlinear associations, and survival models. In simulations, our method compares favorably to the alternatives. Its utility for identifying important genes in real clinical applications is demonstrated by the identification of seven genes associated with Breslow thickness in skin cutaneous melanoma patients.
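
As a rough illustration of how a knockoff procedure turns per-gene importance statistics into a selection with a target false discovery rate, the sketch below applies the standard knockoff+ thresholding rule. It is not the implementation used in this work: the abstract does not say how the knockoff copies or the statistics are built, so the function names `knockoff_plus_threshold` and `knockoff_select`, the input vector `W`, and the toy values are all assumptions made for the example. Each entry of `W` is assumed to be large and positive when the real gene looks more important than its knockoff copy, and symmetric around zero for null genes.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Return the knockoff+ threshold for statistics W at target FDR q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    Returns np.inf when no candidate satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds: distinct nonzero magnitudes, in increasing order.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Negative statistics stand in for false positives among the positives.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf


def knockoff_select(W, q):
    """Return the indices of features selected at target FDR q."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)


# Hypothetical statistics for ten genes (illustrative values only).
W_example = [5.0, 4.0, 3.5, 3.0, 2.5, -1.0, 2.0, 0.5, -0.2, 1.5]
selected = knockoff_select(W_example, q=0.25)
```

The "+1" in the numerator is what distinguishes knockoff+ from the plain knockoff rule. It makes the estimate of the false discovery proportion slightly conservative, which matters most when only a handful of genes are selected.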
