Interrater reliability.

For any research program that requires qualitative rating by different researchers, it is important to establish a good level of interrater reliability, also known as interobserver reliability. This ensures that the generated results meet the accepted criteria defining reliability [1], by quantitatively defining the degree of agreement between two or more observers.

Interrater reliability is the most easily understood form of reliability [2], because everybody has encountered it. For example, any sport judged by a panel, such as Olympic ice skating or a dog show, relies on the human judges maintaining a high degree of consistency with one another. If even one judge scores erratically, the whole competition is compromised and a participant may be denied their rightful prize.

Outside the world of sport and hobbies, interrater reliability has far more serious implications and can directly influence your life. Examiners marking school and university exams are assessed regularly to ensure that they all adhere to the same standards. This is the most important example of interobserver reliability: it would be extremely unfair to fail an exam simply because the examiner was having a bad day. For most examination boards, appeals are rare, showing that the interrater reliability [3] process is fairly robust.

I used to work for a bird protection charity and, every morning, we went down to the seashore to estimate the number of individuals of each bird species. Obviously, you cannot count thousands of birds individually; apart from the huge numbers, they constantly move, leaving and rejoining the group. Using experience, we estimated the numbers and then compared our estimates. If one person estimated 1,000 dunlin, another 4,000 and the third 12,000, then there was something wrong with our estimation and it was highly unreliable.
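The "degree of agreement" mentioned above is usually quantified with a statistic such as percent agreement or Cohen's kappa, which corrects raw agreement for the matches two raters would produce purely by chance. The sketch below is a minimal Python illustration of Cohen's kappa for two raters assigning categorical labels; it is not taken from the examples above, and the judge names and pass/fail scores are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same set of items."""
    assert len(rater_a) == len(rater_b), "both raters must rate the same items"
    n = len(rater_a)

    # Observed agreement: proportion of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: expected overlap from each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1:           # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

# Two hypothetical judges scoring the same ten performances as pass/fail.
judge_1 = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
judge_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass", "pass", "pass"]
print(f"kappa = {cohens_kappa(judge_1, judge_2):.2f}")  # 0.47 for these made-up scores
```

A kappa of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement. For continuous estimates, such as the bird counts in the anecdote above, an intraclass correlation coefficient would be the more usual choice.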