Assessing the incremental value of new biomarkers based on OR rules.

In early detection of disease, a single biomarker often has inadequate classification performance, making it important to identify new biomarkers that can be combined with the existing marker to improve performance. A biologically natural way to combine biomarkers is through logic rules such as the OR and AND rules. In our motivating example of early detection of pancreatic cancer, the established biomarker CA19-9 is present in only a subclass of cancers; it is of interest to identify new biomarkers present in the other subclasses and to declare disease when either marker is positive. While there has been research on developing biomarker combinations using OR/AND rules, inference on the incremental value of the new marker within this framework is lacking, and it is challenging because of statistical non-regularity. In this article, we address the inferential question of whether combining the new biomarker with the existing one achieves better classification performance than the existing biomarker alone, based on a nonparametrically estimated OR rule that maximizes the weighted average of sensitivity and specificity. We propose various procedures for testing the incremental value of the new biomarker and for constructing a confidence interval for it, using the bootstrap, cross-validation, and a novel fuzzy p-value-based technique. We compare the performance of these methods in extensive simulation studies and apply them to the pancreatic cancer example.
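To make the estimand concrete, the following is a minimal sketch of an OR rule fitted by grid search over observed marker values, assuming the rule declares disease when either marker exceeds its own cutoff. The function names (`weighted_accuracy`, `fit_single`, `fit_or_rule`), the weight `w`, and the toy data are illustrative assumptions and are not taken from the article; the sketch covers only the naive in-sample point estimate, not the proposed tests or confidence intervals.

```python
def weighted_accuracy(pred, y, w=0.5):
    """Weighted average of sensitivity and specificity: w*sens + (1-w)*spec."""
    n_case = sum(1 for d in y if d == 1)
    n_ctrl = len(y) - n_case
    true_pos = sum(1 for p, d in zip(pred, y) if p and d == 1)
    true_neg = sum(1 for p, d in zip(pred, y) if not p and d == 0)
    return w * true_pos / n_case + (1 - w) * true_neg / n_ctrl


def cutoffs(x):
    # Candidate cutoffs for the rule "positive if x > c".
    # -inf makes everyone positive; the largest observed value makes no one positive.
    return [float("-inf")] + sorted(set(x))


def fit_single(x_old, y, w=0.5):
    """Best single-marker rule: positive if x_old > c."""
    best_value, best_cut = -1.0, None
    for c in cutoffs(x_old):
        value = weighted_accuracy([v > c for v in x_old], y, w)
        if value > best_value:
            best_value, best_cut = value, c
    return best_value, best_cut


def fit_or_rule(x_old, x_new, y, w=0.5):
    """Best OR rule: positive if x_old > c1 OR x_new > c2."""
    best_value, best_cuts = -1.0, None
    for c1 in cutoffs(x_old):
        for c2 in cutoffs(x_new):
            pred = [a > c1 or b > c2 for a, b in zip(x_old, x_new)]
            value = weighted_accuracy(pred, y, w)
            if value > best_value:
                best_value, best_cuts = value, (c1, c2)
    return best_value, best_cuts


# Toy data (made up): 4 cases followed by 4 controls.
# The existing marker is elevated in only two cases; the new marker in the other two.
y = [1, 1, 1, 1, 0, 0, 0, 0]
x_old = [5, 6, 1, 1, 2, 1, 2, 1]
x_new = [0, 1, 7, 8, 1, 2, 0, 1]

single_value, single_cut = fit_single(x_old, y)
or_value, or_cuts = fit_or_rule(x_old, x_new, y)
naive_increment = or_value - single_value
print(single_value, or_value, naive_increment)
```

Because the cutoff grid for the new marker includes a value at which it is never positive, the fitted OR rule can always fall back to the single-marker rule, so this in-sample increment is never negative. The naive estimate is therefore optimistic and cannot be used directly for testing, which is why formal inference procedures are needed.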
