Observer variation in the clinical and laboratory evaluation of patients with thyroid dysfunction and goiter.

Three endocrinologists assessed thyroid function (hypothyroid, possibly hypothyroid, euthyroid, possibly hyperthyroid, or hyperthyroid), thyroid size (small, medium, or large), thyroid type (diffuse, nodular, or solitary nodule), and diagnosis and treatment options in 55 patients (47 women and 8 men) with suspected thyroid disease and a median age of 43 years (range 19 to 74). The observers received the information stepwise: (1) the patient, including clinical examination and patient history; (2) blood tests; (3) 99mTc-pertechnetate scintigraphy; and (4) ultrasonography. Reproducibility was assessed by means of the kappa coefficient. Compared with evaluation of the patient alone, agreement on thyroid dysfunction became almost perfect once the results of the blood tests were known: the kappa values for pairs of observers rose significantly, from 0.55-0.65 to 0.88-0.93. All three observers altered their opinion on thyroid dysfunction in one third of the patients once the blood tests were known. Compared with evaluation of the patient alone, agreement on the morphology of the thyroid gland did not improve significantly with access to thyroid scintigraphy; with the addition of thyroid ultrasound, agreement improved significantly for some pairs of observers. The three observers agreed on the rough estimate of thyroid size in only 36% of the patients. When all information was available, the three observers agreed on diagnosis and treatment category in 60% of the patients. Doctors should bear in mind the considerable observer variation when they evaluate patients with suspected thyroid disease.
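
As an illustration of the agreement statistic named above, the following is a minimal sketch of how a kappa coefficient can be computed for each pair of observers. It assumes unweighted Cohen's kappa; the abstract does not say whether a weighted or unweighted variant was used. The function names `cohen_kappa` and `pairwise_kappa`, the observer labels, and the ratings are all hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two observers rating the same patients.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of
    agreement and p_e is the agreement expected by chance from each
    observer's marginal category frequencies.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both observers must rate the same non-empty set of patients")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    # Counter returns 0 for categories the second observer never used.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both observers used one identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


def pairwise_kappa(ratings_by_observer):
    """Return kappa for every pair of observers, keyed by (name, name)."""
    return {
        (x, y): cohen_kappa(ratings_by_observer[x], ratings_by_observer[y])
        for x, y in combinations(sorted(ratings_by_observer), 2)
    }


if __name__ == "__main__":
    # Invented thyroid-function ratings for eight patients (illustration only).
    ratings = {
        "A": ["hypo", "eu", "eu", "hyper", "eu", "hyper", "hypo", "eu"],
        "B": ["hypo", "eu", "hyper", "hyper", "eu", "eu", "hypo", "eu"],
        "C": ["eu", "eu", "eu", "hyper", "eu", "hyper", "hypo", "eu"],
    }
    for pair, kappa in pairwise_kappa(ratings).items():
        print(pair, round(kappa, 2))
```

With three observers this yields three pairwise values per information step, which is consistent with the abstract reporting kappa for pairs of observers. Because the function categories are ordered, a weighted kappa would be a reasonable alternative to the unweighted form sketched here.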