Estimating multiple rater agreement for a rare diagnosis