The effect of sample size and proportion of buyers in the sample on the performance of list segmentation equations generated by regression analysis

Abstract

List segmentation refers to the set of techniques that direct mail marketers use to predict which individuals on a list are more likely than others to respond to a specific direct mail solicitation. Regression analysis is a commonly used list segmentation technique that yields managerially useful results. However, the commonly accepted criteria for evaluating the “goodness” of regression equations (such as adjusted or unadjusted R²) are inappropriate in the list segmentation context. The authors describe a superior evaluation criterion, the Pareto Prediction Criterion, and report on a study in which a series of A/B (scoring equation generation sample/holdout evaluation sample) experiments is conducted, with both the sample size and the proportion of buyers in each sample (A and B) varying. The results of the study indicate that when the response rate is held constant, the sample size does have a significant impact on the segmentation's performance. Moreover, “salting the data” (artificially inflating the response rate in the sample, so that the data set analyzed has a higher proportion of buyers than the actual population) has a positive effect on segmentation performance when applied to “A,” the data set from which the scoring equation is generated, but has no effect on segmentation performance when applied to “B,” the holdout sample.
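To make the A/B design concrete, the following is a minimal Python sketch of one such experiment on synthetic data. It is an illustration under stated assumptions, not the authors' procedure: the function names (`salt_sample`, `fit_scoring_equation`, `top_fraction_capture`), the simulated list, the 50% salted response rate, and the 20% mailing depth are all hypothetical. In particular, the evaluation measure used here (the share of holdout buyers found among the top-scoring names) is only a simple stand-in, since the abstract does not define the Pareto Prediction Criterion.

```python
import numpy as np


def salt_sample(X, y, target_rate, rng):
    """Keep every buyer (y == 1) and subsample non-buyers so that buyers
    make up roughly `target_rate` of the returned sample (hypothetical helper)."""
    buyers = np.flatnonzero(y == 1)
    non_buyers = np.flatnonzero(y == 0)
    n_non = int(round(len(buyers) * (1 - target_rate) / target_rate))
    n_non = min(n_non, len(non_buyers))
    keep = np.concatenate(
        [buyers, rng.choice(non_buyers, size=n_non, replace=False)]
    )
    return X[keep], y[keep]


def fit_scoring_equation(X, y):
    """Ordinary least squares regression of the 0/1 response on X (with intercept)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def score(coef, X):
    """Apply the scoring equation to a set of names."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def top_fraction_capture(scores, y, fraction=0.2):
    """Share of all buyers found among the top-scoring `fraction` of names.
    Assumed stand-in measure, not the Pareto Prediction Criterion itself."""
    n_top = max(1, int(round(fraction * len(y))))
    top = np.argsort(-scores)[:n_top]
    return y[top].sum() / y.sum()


# Synthetic mailing list (assumption): three predictors, low response rate.
rng = np.random.default_rng(0)
n = 20_000
X = rng.normal(size=(n, 3))
logit = -4.0 + X @ np.array([1.0, 0.5, 0.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

# Sample A generates the scoring equation; sample B is the holdout.
X_a, y_a = X[: n // 2], y[: n // 2]
X_b, y_b = X[n // 2 :], y[n // 2 :]

# Salt sample A to a 50% response rate, then fit and evaluate on unsalted B.
X_salted, y_salted = salt_sample(X_a, y_a, target_rate=0.5, rng=rng)
coef_salted = fit_scoring_equation(X_salted, y_salted)
capture_salted = top_fraction_capture(score(coef_salted, X_b), y_b)

# Same evaluation for an equation fitted on the unsalted sample A.
coef_plain = fit_scoring_equation(X_a, y_a)
capture_plain = top_fraction_capture(score(coef_plain, X_b), y_b)

print("salted A response rate:", y_salted.mean())
print("buyers captured in top 20% of B (salted A):", capture_salted)
print("buyers captured in top 20% of B (unsalted A):", capture_plain)
```

Repeating this over several sizes of A and B and several salted response rates would mirror the kind of experimental grid the abstract describes. The synthetic data here is not intended to reproduce the study's findings.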