Uplift modeling for clinical trial data

Traditional classification methods predict the class probability distribution conditional on a set of predictor variables. Uplift modeling, in contrast, predicts the difference between the class probabilities in the treatment group (on which some action has been taken) and the control group (not subjected to the action), so that the model estimates the net effect of the action. This approach appears well suited to the analysis of clinical trial data and to discovering groups of patients for whom the treatment is most beneficial. One purpose of this paper is to verify this claim experimentally. Additionally, we present an approach to uplift modeling that allows standard probabilistic classification models, such as logistic regression, to be applied in the uplift setting. We further extend the approach so that standard classification models built on the treatment and control datasets can be incorporated, in a manner similar to semi-supervised learning, to improve prediction accuracy. The usefulness of both approaches has been verified experimentally on publicly available clinical trial data.
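
To make the predicted quantity concrete, the following minimal sketch estimates uplift per patient subgroup as the response rate in the treatment arm minus the response rate in the control arm. It is an illustration only and not the method proposed in the paper: the function name `uplift_by_group`, the record layout, and the toy data are all assumptions introduced here, and simple frequency counts stand in for a fitted probabilistic classifier.

```python
from collections import defaultdict


def uplift_by_group(records):
    """Estimate uplift per subgroup from (group, treated, outcome) tuples.

    Hypothetical helper for illustration: outcome is 0 or 1, and uplift is
    P(outcome=1 | treated, group) - P(outcome=1 | control, group),
    estimated from raw frequencies.
    """
    # counts[group][arm] = [number of patients, number of positive outcomes]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for group, treated, outcome in records:
        cell = counts[group][bool(treated)]
        cell[0] += 1
        cell[1] += outcome

    uplift = {}
    for group, arms in counts.items():
        (n_t, pos_t), (n_c, pos_c) = arms[True], arms[False]
        if n_t == 0 or n_c == 0:
            continue  # both arms are needed to estimate a difference
        uplift[group] = pos_t / n_t - pos_c / n_c
    return uplift


# Toy data (invented): (subgroup, treated flag, binary outcome)
records = [
    ("young", 1, 1), ("young", 1, 1), ("young", 1, 0), ("young", 1, 1),
    ("young", 0, 1), ("young", 0, 0), ("young", 0, 0), ("young", 0, 0),
    ("old", 1, 1), ("old", 1, 0), ("old", 1, 0), ("old", 1, 0),
    ("old", 0, 1), ("old", 0, 1), ("old", 0, 0), ("old", 0, 0),
]
result = uplift_by_group(records)
print(result)
```

In this toy data the estimated uplift is positive for one subgroup and negative for the other, which is the kind of contrast an uplift model is meant to surface. A conventional classifier trained only on treated patients would report the treated response rate and could not show how much of it is attributable to the treatment.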
