An experimental model of the L4–L5 lumbar motion segment was developed that allowed precise manipulation of sagittal translation, rotation of L5 relative to L4, tilt of L4 on L5, and control of roentgenogram quality (image clarity) by placing a water bath between the tube and the vertebral body. A series of experiments was designed to systematically assess the consistency and accuracy of sagittal translation measurements from roentgenograms of varying quality, using different measurement protocols and various rater combinations on models with varying degrees of concomitant motions (rotations and tilts). Study 1 assessed the effects of roentgenogram quality, raters, and seven measurement methods on the consistency and accuracy of evaluating translations in the sagittal plane. Results indicated very high reliabilities across roentgenogram quality, raters, and measurement methods. As expected, high-quality roentgenograms were more accurately evaluated than lower-quality roentgenograms. However, closer inspection of the consequences of errors in measured translations indicated surprisingly high false-positive and false-negative rates, with significant differences observed between measurement methods. Study 2 assessed the effects of concomitant motions and measurement methods on the consistency and accuracy of evaluations. Within-rater consistency and accuracy indices were remarkably high and similar across measurement methods and degrees of concomitant motions. However, important differences in the false-positive and false-negative rates were again observed. Method 2, described by Morgan and King, demonstrated the overall best performance and the least interference due to concomitant motions. Study 3 assessed the effects of raters and measurement methods on the consistency of measuring translation in clinical roentgenograms, where concomitant motion factors may be present but are not explicitly considered.
Results indicated substantially lower within- and between-rater consistency estimates relative to consistencies obtained from the model, although these magnitudes were similar to those reported by others evaluating clinical roentgenograms. The implications of lower consistency estimates relative to increased false-positive and false-negative rates must be more closely examined. These studies present evidence suggesting that high consistency and accuracy indices do not ensure acceptable false-positive and false-negative rates and, thus, provide empirical evidence supporting the view that using roentgenograms as a basis for diagnosing instability can often lead to errors in classification. This is less so when observed translations are relatively large (±5 mm or more) on roentgenograms that are relatively clear, with little obliquity, and when concomitant motions are minimal. However, when roentgenogram quality is lower, obliquity problems are apparent, and concomitant motions are involved, even relatively large measured translations of ±6 mm or more may occur when actual translations are substantially less. In these settings, using roentgenograms to classify patients as having excessive translation may result in large false-positive rates.
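The distinction drawn above between measurement accuracy and classification error can be illustrated with a minimal sketch. It is not the analysis used in these studies: the function name `classification_error_rates`, the 5.0 mm cutoff, and all data values are hypothetical and chosen only to show how small measurement errors near a cutoff can still produce sizeable false-positive and false-negative rates.

```python
def classification_error_rates(actual_mm, measured_mm, threshold_mm):
    """Return (false_positive_rate, false_negative_rate).

    A case counts as "excessive translation" when the absolute
    translation meets or exceeds threshold_mm (a hypothetical cutoff).
    """
    fp = fn = positives = negatives = 0
    for actual, measured in zip(actual_mm, measured_mm):
        truly_excessive = abs(actual) >= threshold_mm
        called_excessive = abs(measured) >= threshold_mm
        if truly_excessive:
            positives += 1
            if not called_excessive:
                fn += 1  # excessive translation missed by the measurement
        else:
            negatives += 1
            if called_excessive:
                fp += 1  # normal translation labelled as excessive
    fpr = fp / negatives if negatives else float("nan")
    fnr = fn / positives if positives else float("nan")
    return fpr, fnr


# Hypothetical data: every measurement is within 0.5 mm of the true value.
actual = [2.0, 3.0, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0]
measured = [2.5, 3.5, 5.0, 4.0, 4.5, 6.0, 6.5, 7.5]

fpr, fnr = classification_error_rates(actual, measured, threshold_mm=5.0)
print(f"false-positive rate: {fpr:.2f}, false-negative rate: {fnr:.2f}")
```

In this made-up example no measurement is off by more than 0.5 mm, yet one of the four truly normal cases and one of the four truly excessive cases fall on the wrong side of the cutoff.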