Understanding Data Quality through Reliability: A Comparison of Data Reliability Assessment in Three International Relations Datasets

Although recent data creation efforts in international relations address reliability and validity more explicitly than earlier efforts did, they still contain significant problems. This essay examines three recent data generation projects in international relations (the ICOW, ATOP, and River Treaty datasets) and shows where each succeeds and fails in assessing reliability when generating data from qualitative evidence. All three projects attempt to generate reliable data, document the procedures used, and present indications of data reliability. However, they run into problems in assessing the reliability of their case selection variables, in developing reliability indicators, and in presenting reliability statistics. In addition to evaluating these recent efforts to generate large-N databases, this essay clarifies the difference between generating data from qualitative and quantitative evidence, explains the importance of reliability when coding qualitative evidence, and suggests ways to improve the assessment of the quality of one’s data.
