A statistical approach to detect interviewer falsification of survey data

Survey data may be affected by interviewer falsification, of which outright data fabrication is the most blatant form. Even a small number of fabricated interviews can seriously impair the results of subsequent empirical analysis. Besides reinterviews, several statistical approaches have been proposed for identifying this type of fraudulent behaviour. Using a small dataset, this paper demonstrates how cluster analysis, which is not commonly employed in this context, might be used to identify interviewers who falsify their work assignments. Several indicators are combined to classify ‘at risk’ interviewers based solely on the data collected. This multivariate classification seems superior to the application of a single indicator such as Benford’s law.
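To make the idea concrete, the following is a minimal sketch of how several per-interviewer indicators could be combined in a two-group cluster analysis. It is not the paper's procedure: the toy data, the second indicator (share of answered items), the deterministic starting centroids and all function names are assumptions introduced purely for illustration. The only fixed ingredient is Benford's law, under which the expected share of leading digit d is log10(1 + 1/d).

```python
import math
from statistics import mean, pstdev

# Benford's law: expected share of leading digit d is log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(value):
    """First digit of a nonzero integer (sign is ignored)."""
    return int(str(abs(value))[0])

def benford_chi2(values):
    """Chi-square distance between observed leading digits and Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    return sum((digits.count(d) - n * p) ** 2 / (n * p) for d, p in BENFORD.items())

def zscores(column):
    """Standardise one indicator so that all indicators share a common scale."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s if s else 0.0 for v in column]

def two_means(points, iterations=20):
    """Plain k-means with k=2; returns a 0/1 label per point."""
    # Assumed deterministic start: points with the lowest and highest coordinate sum.
    centroids = [min(points, key=sum), max(points, key=sum)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min((0, 1), key=lambda k: math.dist(p, centroids[k])) for p in points]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(mean(c) for c in zip(*members))
    return labels

# Hypothetical toy data: reported amounts per interviewer (not from the paper).
amounts = {
    "A": [round(100 * 1.17 ** i) for i in range(40)],  # spread over several magnitudes
    "B": [round(100 * 1.23 ** i) for i in range(40)],
    "C": [round(100 * 1.31 ** i) for i in range(40)],
    "D": [400 + 7 * i for i in range(40)],             # leading digits only 4, 5, 6
}
# Hypothetical second indicator: share of items answered by each interviewer's respondents.
answered_share = [0.90, 0.88, 0.92, 1.00]

chi2 = [benford_chi2(v) for v in amounts.values()]
points = list(zip(zscores(chi2), zscores(answered_share)))
labels = two_means(points)  # label 1 = cluster seeded by the most extreme interviewer

for name, c, lab in zip(amounts, chi2, labels):
    print(name, round(c, 1), "at risk" if lab == 1 else "not flagged")
```

In this sketch both indicators are oriented so that larger values look more suspicious, which is why the cluster seeded by the highest coordinate sum is read as the ‘at risk’ group. With real data the choice of indicators, their orientation and the clustering method would all need to be justified, and a flagged interviewer is a candidate for reinterview rather than a proven falsifier.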
