Design principles of massive, robust prediction systems

Most data mining research is concerned with building high-quality classification models in isolation. In massive production systems, however, the ability to monitor and maintain performance over time while growing in size and scope is equally important. Many external factors may degrade classification performance, including changes in data distribution, noise or bias in the source data, and the evolution of the system itself. A well-functioning system must handle all of these gracefully. This paper lays out a set of design principles for large-scale autonomous data mining systems and then demonstrates our application of these principles within the m6d automated ad targeting system. We present a comprehensive set of quality control processes that allow us to monitor and maintain thousands of distinct classification models automatically, and to add new models, take on new data, and correct poorly performing models without manual intervention or system disruption.
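The automated quality control described above can be illustrated with a minimal sketch: each deployed model's evaluation scores are tracked over time, and a model is flagged for rebuilding only when its performance stays below a baseline for several consecutive evaluations, so that transient noise does not trigger unnecessary retraining. The class names, the AUC metric, and the windowed-threshold rule here are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ModelMonitor:
    """Track one model's evaluation history and flag degradation.

    Hypothetical sketch: AUC is used as the performance metric, and a
    model is flagged when `window` consecutive scores fall below the
    baseline minus a tolerance.
    """
    model_id: str
    baseline_auc: float
    tolerance: float = 0.05              # allowed drop before flagging
    history: List[float] = field(default_factory=list)

    def record(self, auc: float) -> None:
        """Append the latest evaluation score."""
        self.history.append(auc)

    def needs_retraining(self, window: int = 3) -> bool:
        """Flag only on sustained degradation, not a single noisy dip."""
        if len(self.history) < window:
            return False
        recent = self.history[-window:]
        return all(a < self.baseline_auc - self.tolerance for a in recent)

def triage(monitors: List[ModelMonitor]) -> List[str]:
    """Return ids of models that should be rebuilt automatically."""
    return [m.model_id for m in monitors if m.needs_retraining()]
```

In a system with thousands of models, a periodic job would call `triage` over all monitors and queue the flagged models for automatic retraining, keeping the maintenance loop free of manual intervention.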
