A Property of the CHAID Partitioning Method for Dichotomous Randomized Response Data and Categorical Predictors

In this paper, we present empirical and theoretical results on classification trees for randomized response data. We consider a dichotomous sensitive response variable whose true status is intentionally misclassified by the respondents according to the rules prescribed by a randomized response method. We assume that the classification trees are grown using the Pearson chi-square test as the splitting criterion, and that the randomized response data are analyzed with classification trees as if they were not perturbed. We prove that classification trees fitted to the observed randomized response data and to the estimated true data have a one-to-one correspondence in their ranking of the splitting variables. The result is illustrated with two real data sets.
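The invariance can be checked numerically. The sketch below assumes Warner's randomized response design, in which each respondent answers the sensitive question with probability p and its complement with probability 1 − p. Under that design the observed probability of a "yes" is λ = pπ + (1 − p)(1 − π). It also assumes that the "estimated true data" come from the moment estimator π̂ = (λ̂ − (1 − p))/(2p − 1), applied within each predictor category without truncation to [0, 1]. Then π̂_k − π̄ = (λ̂_k − λ̄)/(2p − 1), and for a K × 2 table the Pearson statistic on the estimated true counts equals the observed statistic multiplied by λ̄(1 − λ̄)/[(2p − 1)² π̄(1 − π̄)]. Because λ̄ and π̄ are pooled over the whole sample, this factor is the same for every predictor, so ranking predictors by the chi-square statistic gives the same order on both tables. The function names, the design probability, and the two predictor tables are hypothetical. The sketch ranks by the raw statistic and ignores CHAID's category merging and adjusted p-values, so it illustrates the idea rather than reproducing the authors' proof.

```python
from typing import Dict, List, Tuple


def pearson_chi2(yes: List[float], totals: List[float]) -> float:
    """Pearson chi-square for a K x 2 (category x yes/no) table."""
    n = sum(totals)
    p_bar = sum(yes) / n  # pooled proportion of 'yes' answers
    stat = 0.0
    for y, t in zip(yes, totals):
        e_yes, e_no = t * p_bar, t * (1.0 - p_bar)  # expected counts
        stat += (y - e_yes) ** 2 / e_yes + ((t - y) - e_no) ** 2 / e_no
    return stat


def warner_estimated_yes(obs_yes: List[float], totals: List[float], p: float) -> List[float]:
    """Moment estimate of true 'yes' counts, ASSUMING Warner's design.

    Observed P(yes) is lam = p*pi + (1 - p)*(1 - pi), hence
    pi_hat = (lam - (1 - p)) / (2p - 1). No truncation to [0, 1] is applied.
    """
    if p == 0.5:
        raise ValueError("Warner's design is not identifiable at p = 0.5")
    return [t * ((y / t) - (1.0 - p)) / (2.0 * p - 1.0) for y, t in zip(obs_yes, totals)]


def chi2_pair(yes: List[float], totals: List[float], p: float) -> Tuple[float, float]:
    """Return (observed X2, estimated-true X2) for one predictor."""
    return pearson_chi2(yes, totals), pearson_chi2(warner_estimated_yes(yes, totals, p), totals)


# Hypothetical data: two predictors cross-classifying the SAME 200 respondents,
# so both tables share the observed total of 80 'yes' answers.
TABLES: Dict[str, Tuple[List[float], List[float]]] = {
    "A": ([50, 30], [100, 100]),
    "B": ([25, 25, 30], [50, 70, 80]),
}
P_DESIGN = 0.8  # hypothetical Warner design probability

if __name__ == "__main__":
    stats = {name: chi2_pair(yes, totals, P_DESIGN) for name, (yes, totals) in TABLES.items()}
    for name, (x2_obs, x2_est) in stats.items():
        print(f"{name}: observed X2 = {x2_obs:.4f}, estimated-true X2 = {x2_est:.4f}, "
              f"ratio = {x2_est / x2_obs:.4f}")
    # Rank predictors by the statistic on each version of the data.
    print(sorted(stats, key=lambda k: stats[k][0], reverse=True))
    print(sorted(stats, key=lambda k: stats[k][1], reverse=True))
```

With these numbers λ̄ = 0.4 and π̄ = 1/3, so the common factor is 0.24/(0.36 × 2/9) = 3. Predictor A has X² ≈ 8.33 on the observed table and 25 on the estimated true table, predictor B about 2.83 and 8.48, and A ranks above B in both.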
