Strategies for the Prevention, Detection, and Correction of Measurement Error in Data Collected from Textual Sources

This article considers the problem of measurement error in data collected from textual sources (for example, newspapers, police records, and medical records). It discusses strategies for detecting and correcting errors at several levels, both on-line and off-line, during and after data collection. It introduces social scientists to useful concepts and tools developed in the field of engineering, such as performance-shaping factors and acceptance sampling. It proposes the use of semantic text grammars in lieu of traditional content analysis schemes for the collection of textual data. Linguistics-based coding schemes yield much richer data, and the article shows that these data are also more reliable because they retain the semantic coherence of the original text.
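
As an illustration of how acceptance sampling might apply to coded textual data, the following is a minimal sketch of a single sampling plan. It is not taken from the article. It assumes that a batch of coded items is checked by drawing a random sample of n items, that each item is in error independently with probability p, and that the batch is accepted when no more than c errors are found. The function names and the parameters n, c, and p are hypothetical.

```python
from math import comb


def acceptance_probability(n: int, c: int, p: float) -> float:
    """Probability that a sample of n items contains at most c errors.

    Assumes each item is in error independently with probability p
    (a binomial model; this is an illustrative assumption).
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def accept_batch(errors_found: int, c: int) -> bool:
    """Accept the batch when the errors found do not exceed the limit c."""
    return errors_found <= c


if __name__ == "__main__":
    # Hypothetical plan: verify 50 coded items, tolerate at most 2 errors.
    for p in (0.01, 0.05, 0.10):
        print(f"error rate {p:.2f}: P(accept) = {acceptance_probability(50, 2, p):.3f}")
```

Evaluating the acceptance probability over a range of error rates shows how strict a given plan is: a plan that still accepts most batches at an error rate the researcher considers unacceptable would need a larger sample or a lower acceptance number.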