Quality assurance controls in research data base management: nonsense codes in hierarchical file structures

In complex studies that use multiple data bases built on hierarchical file structures, errors are likely to propagate into summary reports unless some form of quality assurance is integrated into the research data base management program. The problem is even more acute in studies that substitute numeric codes for variable values. This paper addresses error propagation in studies that use a coding scheme to represent longer alphanumeric values. Several approaches can minimize errors in coding variables. Widely used "smart" codes are numeric codes with information embedded in specific positions within the code value. Such smart codes require full knowledge of the universe the variables describe, as well as of the potential classification schemes for each variable. Nonsense codes, which carry no embedded information, avoid the problems associated with smart codes: each alphanumeric variable value is assigned the next sequential numeric code when it is first encountered in the data base. The nonsense-code approach is therefore open-ended and does not require knowing in advance how many classification levels a variable may have. Experience indicates that coding errors are less frequent with nonsense codes. In SAS version 79.2, the FORMAT procedure complements the nonsense-code approach by attaching descriptive labels to the numeric codes. Current restrictions on the use of the FORMAT statement, and on the sort order of the labels in BY statements or PRINT requests, can be circumvented by using the PUT function to assign the formatted values to a new character variable. 1 figure.
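The two mechanisms described above, sequential nonsense-code assignment and materializing a code's label as its own character variable, can be sketched as follows. This is a Python analogue, not the authors' SAS implementation: the class `NonsenseCodebook`, the helper `add_label_column`, and the sample values are hypothetical illustrations. In SAS the label lookup would correspond to a user-defined format created with PROC FORMAT and applied through the PUT function in a DATA step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NonsenseCodebook:
    """Hypothetical codebook: assigns sequential numeric codes to values as first seen.

    The codes carry no embedded meaning, so no classification scheme
    or number of levels has to be fixed in advance.
    """

    _code_of: dict[str, int] = field(default_factory=dict)
    _label_of: dict[int, str] = field(default_factory=dict)

    def encode(self, value: str) -> int:
        # Reuse the existing code for a known value; otherwise issue the next one.
        if value not in self._code_of:
            code = len(self._code_of) + 1
            self._code_of[value] = code
            self._label_of[code] = value
        return self._code_of[value]

    def label(self, code: int) -> str:
        # Rough analogue of applying a format: look up the stored label.
        # An unknown code raises KeyError, exposing a coding error early.
        return self._label_of[code]


def add_label_column(records, codebook, code_field, label_field):
    # Rough analogue of  label_var = PUT(code_var, fmt.);  in a DATA step:
    # store the label as a separate character value so that sorting and
    # grouping follow the label text rather than the numeric code.
    return [{**r, label_field: codebook.label(r[code_field])} for r in records]


# Illustrative (made-up) values encountered in arrival order.
values = ["NORTH RIDGE", "EAST CREEK", "NORTH RIDGE", "BASIN FLAT"]
book = NonsenseCodebook()
codes = [book.encode(v) for v in values]
records = [{"plot": i, "site_code": c} for i, c in enumerate(codes, start=1)]

by_code = sorted(records, key=lambda r: r["site_code"])
labelled = add_label_column(records, book, "site_code", "site")
by_label = sorted(labelled, key=lambda r: r["site"])

print(codes)                           # [1, 2, 1, 3]
print([r["plot"] for r in by_code])    # [1, 3, 2, 4]
print([r["plot"] for r in by_label])   # [4, 2, 1, 3]
```

Sorting on the numeric code orders records by first appearance, which is meaningless to a reader of a report; sorting on the materialized label column gives alphabetical order by value, which is the effect the PUT-function workaround is meant to achieve.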