Reliability of severity estimates for usability problems found by heuristic evaluation

When usability specialists judge the severity of usability problems found by heuristic evaluation, ratings from a single evaluator are very unreliable, but the mean severity rating from four evaluators falls within half a rating point of the true severity 95% of the time. The evaluators also agree that the usability problems found by heuristic evaluation are all real problems, even though each rater had originally identified only a small proportion of them.
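
To illustrate why averaging several raters helps, here is a minimal Python sketch. It is not the analysis used in the study. It assumes that each rater's error is independent, unbiased, and roughly normal, so the standard error of the mean shrinks with the square root of the number of raters. The function names and the example ratings are hypothetical.

```python
import math
import statistics


def mean_severity(ratings):
    """Average the severity ratings that several evaluators gave one problem."""
    return statistics.fmean(ratings)


def half_width_95(sd_single, n_evaluators):
    """Approximate 95% half-width of the mean rating.

    Assumes independent, unbiased, roughly normal rater errors
    (an illustrative assumption, not a result from the study).
    """
    return 1.96 * sd_single / math.sqrt(n_evaluators)


def max_single_rater_sd(half_width, n_evaluators):
    """Largest single-rater standard deviation that still keeps the mean
    within `half_width` of the true severity about 95% of the time."""
    return half_width * math.sqrt(n_evaluators) / 1.96


# Hypothetical ratings from four evaluators for one usability problem.
example_ratings = [3, 2, 4, 3]
example_mean = mean_severity(example_ratings)
```

Under these assumptions, a mean of four ratings stays within half a rating point only if a single rater's standard deviation is at most about 0.51 rating points (`max_single_rater_sd(0.5, 4)`). With the same spread, one rater alone would have a 95% half-width of about one full rating point.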