Data Accuracy's Impact on Segmentation Performance: Benchmarking RFM Analysis, Logistic Regression, and Decision Trees

Companies benefit greatly from knowing how data quality problems affect the performance of segmentation techniques and which techniques are more robust to these problems than others. This study investigates the influence of problems with data accuracy, an important dimension of data quality, on three prominent segmentation techniques for direct marketing: RFM (recency, frequency, and monetary value) analysis, logistic regression, and decision trees. For the two real-life direct marketing data sets analyzed, the results demonstrate that (1) under optimal data accuracy, decision trees are preferred over RFM analysis and logistic regression; (2) the introduction of data accuracy problems degrades the performance of all three segmentation techniques; and (3) as data becomes less accurate, decision trees remain superior to logistic regression and RFM analysis. Overall, this study recommends the use of decision trees for customer segmentation in direct marketing, even when data accuracy problems are suspected.
