An empirical study comparing pilots' interrater reliability ratings for workload and effectiveness

Pilot workload and technical effectiveness have been considered essential criteria when evaluating aircraft operability with subjective rating techniques. However, validation studies of the mission operability assessment technique found considerably higher interrater reliability for pilots' ratings of workload than for their ratings of technical effectiveness. The finding was replicated across aircraft, pilots, and tasks, and with different forms of the rating scales. These results suggest that the implicit assumption that interrater reliability will be high and essentially identical for both pilot workload and technical effectiveness ratings may be invalid. This finding has implications for how one defines and subsequently measures aircraft operability with subjective rating techniques.