Decision-Centric Active Learning of Binary-Outcome Models

Acquiring the data that businesses need for data-driven predictive modeling can be expensive, for example when modeling consumer preferences to optimize targeting. Prior research has introduced “active-learning” policies that identify data particularly useful for model induction, with the goal of decreasing statistical error for a given acquisition cost (error-centric approaches). However, predictive models are used as part of a decision-making process, and costly improvements in model accuracy do not always result in better decisions. This paper introduces a new approach to active data acquisition that specifically targets decision making. The decision-centric approach departs from traditional active learning by emphasizing acquisitions that are more likely to affect decisions. We describe two types of decision-centric techniques and then, using direct-marketing data, compare various data-acquisition techniques. We demonstrate that strategies for reducing statistical error can be wasteful in a decision-making context, and we show that one decision-centric technique in particular can improve targeting decisions significantly. We also show that this method is robust as the quality of its utility estimates decreases, eventually converging to uniform random sampling, and that it can be extended to settings in which different data acquisitions have different costs. The results suggest that businesses should consider modifying their strategies for acquiring information through normal business transactions. For example, a firm such as Amazon.com that models consumer preferences for customized marketing may accelerate learning by proactively offering recommendations, not merely to induce immediate sales but also to improve future recommendations.
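
The abstract describes decision-centric acquisition only at a high level. The Python sketch that follows illustrates one simple way to operationalize "acquisitions that are more likely to affect decision making" for a binary targeting decision; it is not the paper's specific technique. Everything in it is an assumption made for illustration: a customer is targeted when p × benefit exceeds the contact cost; each candidate has several probability estimates (for instance from models trained on bootstrap samples); a candidate's score is the fraction of estimates that disagree with the majority targeting decision; candidates are sampled in proportion to that score plus a small floor; and differing acquisition costs are handled by dividing each score by its cost. The function names (`target_decision`, `decision_uncertainty`, `estimate_variance`, `acquisition_weights`, `sample_next`) are hypothetical.

```python
import random
import statistics
from typing import List, Optional, Sequence


def target_decision(p: float, benefit: float, contact_cost: float) -> bool:
    """Hypothetical targeting rule: contact a customer when expected benefit exceeds cost."""
    return p * benefit > contact_cost


def decision_uncertainty(estimates: Sequence[float], benefit: float, contact_cost: float) -> float:
    """Fraction of ensemble estimates that disagree with the majority targeting decision.

    0.0 means every estimate yields the same decision; 0.5 means an even split.
    """
    if not estimates:
        raise ValueError("need at least one probability estimate")
    votes = sum(target_decision(p, benefit, contact_cost) for p in estimates)
    return min(votes, len(estimates) - votes) / len(estimates)


def estimate_variance(estimates: Sequence[float]) -> float:
    """Error-centric proxy for comparison: spread of the estimates, ignoring the decision."""
    return statistics.pvariance(estimates)


def acquisition_weights(
    ensembles: Sequence[Sequence[float]],
    benefit: float,
    contact_cost: float,
    acquisition_costs: Optional[Sequence[float]] = None,
    floor: float = 1e-3,
) -> List[float]:
    """Turn decision-uncertainty scores into sampling probabilities.

    The floor keeps every candidate sampleable; when all scores are equal the
    weights reduce to uniform random sampling. Optional per-example costs
    divide each score (an assumed cost adjustment, not taken from the paper).
    """
    scores = [decision_uncertainty(e, benefit, contact_cost) + floor for e in ensembles]
    if acquisition_costs is not None:
        if len(acquisition_costs) != len(scores):
            raise ValueError("one acquisition cost is required per candidate")
        scores = [s / c for s, c in zip(scores, acquisition_costs)]
    total = sum(scores)
    return [s / total for s in scores]


def sample_next(weights: Sequence[float], rng: random.Random) -> int:
    """Draw the index of the next example to acquire, proportionally to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


if __name__ == "__main__":
    # Three candidate customers, each with three hypothetical estimates of P(respond).
    # With benefit 10 and contact cost 3, the break-even probability is 0.3.
    ensembles = [
        [0.50, 0.95, 0.70],  # wide spread, but every estimate says "target"
        [0.28, 0.32, 0.31],  # narrow spread, but the decision flips across estimates
        [0.05, 0.02, 0.03],  # every estimate says "do not target"
    ]
    weights = acquisition_weights(ensembles, benefit=10.0, contact_cost=3.0)
    print([round(w, 3) for w in weights])
    print("next acquisition:", sample_next(weights, random.Random(0)))
```

In this example a variance-based (error-centric) score would favor the first customer, whose estimates vary the most, even though every estimate leads to the same targeting decision. The decision-centric score instead concentrates sampling on the second customer, whose decision changes depending on which estimate is used.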
