Detecting Fraudulent Interviewers by Improved Clustering Methods – The Case of Falsifications of Answers to Parts of a Questionnaire

Abstract: Falsified interviews represent a serious threat to empirical research based on survey data. Identifying such cases is important for ensuring data quality. Previous research has shown that applying cluster analysis to a set of indicators helps to identify suspicious interviewers when a substantial share of their interviews are complete falsifications. This analysis is extended to the case in which only a share of the questions within each interview provided by an interviewer is fabricated. The assessment is based on synthetic datasets with properties set a priori. These are constructed from a unique experimental dataset containing both real and fabricated data for each respondent. Such a bootstrap approach makes it possible to evaluate the robustness of the method as the share of fabricated answers per interview decreases. The results indicate a substantial loss of discriminatory power in the standard cluster analysis when the share of fabricated answers within an interview becomes small. A novel clustering method that allows constraints to be imposed on cluster sizes improves performance, in particular when only a few falsifiers are present. This new approach will help to increase the robustness of survey data by detecting potential falsifiers more reliably.
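
The two ideas summarized above, building partially fabricated interviews from paired real and fabricated answers and clustering with a bound on cluster size, can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: the function names (`mix_answers`, `constrained_two_clusters`), the single one-dimensional indicator score, and the exhaustive search over split points are all assumptions made for illustration, and the size constraint here is simply an upper bound on the number of interviewers flagged as suspicious.

```python
import random


def mix_answers(real, fabricated, share, rng):
    """Replace a given share of the questions in a real interview
    with the fabricated answers of the same respondent (hypothetical helper)."""
    n_questions = len(real)
    n_fake = round(share * n_questions)
    # Draw the positions of the fabricated questions without replacement.
    fake_idx = set(rng.sample(range(n_questions), n_fake))
    return [fabricated[q] if q in fake_idx else real[q]
            for q in range(n_questions)]


def within_ss(values):
    """Within-cluster sum of squared deviations from the cluster mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def constrained_two_clusters(scores, max_suspicious):
    """Split one-dimensional interviewer scores into a 'suspicious' cluster
    (the highest scores) and a 'regular' cluster.

    The suspicious cluster must hold between 1 and max_suspicious members;
    among the admissible splits, the one with the smallest total
    within-cluster sum of squares is returned as a set of indices.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    largest = min(max_suspicious, len(scores) - 1)
    best_size, best_cost = None, float("inf")
    for size in range(1, largest + 1):
        high = [scores[i] for i in order[:size]]
        low = [scores[i] for i in order[size:]]
        cost = within_ss(high) + within_ss(low)
        if cost < best_cost:
            best_size, best_cost = size, cost
    return set(order[:best_size])


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy interview: 10 questions, 30% of them replaced by fabricated answers.
    mixed = mix_answers([0] * 10, [1] * 10, share=0.3, rng=rng)
    print(sum(mixed), "of", len(mixed), "answers are fabricated")

    # Toy indicator scores for seven interviewers (made-up numbers).
    scores = [0.10, 0.12, 0.09, 0.11, 0.80, 0.75, 0.10]
    print(sorted(constrained_two_clusters(scores, max_suspicious=3)))
```

Lowering `share` in `mix_answers` mimics the setting in which fewer answers per interview are fabricated, while `max_suspicious` plays the role of a cluster-size constraint. A realistic application would use several indicators per interviewer and a heuristic search rather than the exhaustive one-dimensional split shown here.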
