Determining sample sizes for assessing inter-observer reliability for direct observational studies