Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data

Contents

Introduction
  The Personal Computer and Statistics
  Statistics and Data Analysis
  EDA
  The EDA Paradigm
  EDA Weaknesses
  Small and Big Data
  Data Mining Paradigm
  Statistics and Machine Learning
  Statistical Learning
  References

Two Simple Data Mining Methods for Variable Assessment
  Correlation Coefficient
  Scatterplots
  Data Mining
  Smoothed Scatterplot
  General Association Test
  Summary
  References

Logistic Regression: The Workhorse of Database Response Modeling
  Logistic Regression Model
  Case Study
  Logits and Logit Plots
  The Importance of Straight Data
  Re-expressing for Straight Data
  Straight Data for Case Study
  Techniques When Bulging Rule Does Not Apply
  Re-expressing MOS_OPEN
  Assessing the Importance of Variables
  Important Variables for Case Study
  Relative Importance of the Variables
  Best Subset of Variables for Case Study
  Visual Indicators of Goodness of Model Predictions
  Evaluating the Data Mining Work
  Smoothing a Categorical Variable
  Additional Data Mining Work for Case Study
  Summary

Ordinary Regression: The Workhorse of Database Profit Modeling
  Ordinary Regression Model
  Illustration
  Mini Case Study
  Important Variables for Mini Case Study
  Best Subset of Variables for Case Study
  Summary

CHAID for Interpreting a Logistic Regression Model
  Logistic Regression Model
  Database Marketing Response Model Case Study
  CHAID
  Multivariable CHAID Trees
  CHAID Market Segmentation
  CHAID Tree Graphs
  Summary

The Importance of the Regression Coefficient
  The Ordinary Regression Model
  Four Questions
  Important Predictor Variables
  P-values and BIG Data
  Returning to Question #1
  Predictor Variable's Effect On Prediction
  The Caveat
  Returning to Question #2
  Ranking Predictor Variables By Effect On Prediction
  Returning to Question #3
  Returning to Question #4
  Summary
  Reference

The Predictive Contribution Coefficient: A Measure of Predictive Importance
  Background
  Illustration of Decision Rule
  Predictive Contribution Coefficient
  Calculation of Predictive Contribution Coefficient
  Extra Illustration of Predictive Contribution Coefficient
  Summary
  Reference

CHAID for Specifying a Model with Interaction Variables
  Interaction Variables
  Strategy for Modeling with Interaction Variables
  Strategy Based on the Notion of a Special Point
  Example of a Response Model with an Interaction Variable
  CHAID for Uncovering Relationships
  Illustration of CHAID Specifying a Model
  An Exploratory Look
  Database Implication
  Summary
  Reference

Market Segment Classification Modeling with Logistic Regression
  Binary Logistic Regression
  Polychotomous Logistic Regression Model
  Model Building With PLR
  Market Segmentation Classification Model
  Summary

CHAID as a Method for Filling in Missing Values
  Introduction to the Problem of Missing Data
  Missing-data Assumption
  CHAID Imputation
  Illustration
  CHAID Most-likely Category Imputation for a Categorical Variable
  Summary
  Reference

Identifying Your Best Customers: Descriptive, Predictive and Look-alike Profiling
  Some Definitions
  Illustration of a Flawed Targeting Effort
  Well-Defined Targeting Effort
  Predictive Profiles
  Continuous Trees
  Look-alike Profiling
  Look-alike Tree Characteristics
  Summary

Assessment of Database Marketing Models
  Accuracy for Response Model
  Accuracy for Profit Model
  Decile Analysis and Cum Lift for Response Model
  Decile Analysis and Cum Lift for Profit Model
  Precision for Response Model
  Construction of SWMAD
  Separability for Response and Profit Models
  Guidelines for Using Cum Lift, HL/SWMAD and CV
  Summary

Bootstrapping in Database Marketing: A New Approach for Validating Models
  Traditional Model Validation
  Illustration
  Three Questions
  The Bootstrap
  How to Bootstrap
  Bootstrap Decile Analysis Validation
  Another Question
  Bootstrap Assessment of Model Implementation Performance
  Bootstrap Assessment of Model Efficiency
  Summary
  Reference

Visualization of Database Models
  Brief History of the Graph
  Star Graph Basics
  Star Graphs for Single Variables
  Star Graphs for Many Variables Considered Jointly
  Profile Curves Method
  Illustration
  Summary
  SAS Code for Star Graphs for Each Demographic Variable About the Deciles
  SAS Code for Star Graphs for Each Decile About the Demographic Variables
  SAS Code for Profile Curves: All Deciles
  Reference

Genetic Modeling in Database Marketing: The GenIQ Model
  What Is Optimization?
  What Is Genetic Modeling?
  Genetic Modeling: An Illustration
  Parameters for Controlling a Genetic Model Run
  Genetic Modeling: Strengths and Limitations
  Goals of Modeling in Database Marketing
  The GenIQ Response Model
  The GenIQ Profit Model
  Case Study: Response Model
  Case Study: Profit Model
  Summary
  Reference

Finding the Best Variables for Database Marketing Models
  Background
  Weakness in the Variable Selection Methods
  Goals of Modeling in Database Marketing
  Variable Selection with GenIQ
  Nonlinear Alternative to Logistic Regression Model
  Summary
  Reference

Interpretation of Coefficient-free Models
  The Linear Regression Coefficient
  Illustration for the Simple Ordinary Regression Model
  The Quasi-Regression Coefficient for Simple Regression Models
  Partial Quasi-RC for the Everymodel
  Quasi-RC for a Coefficient-free Model
  Summary