A study of the test-retest reliability of ten olfactory tests.

Ten tests of olfactory function (including tests of odor identification, detection, discrimination, memory, and suprathreshold odor intensity and pleasantness perception) were administered on two test occasions to 57 subjects ranging in age from 18 to 83 years. The stability of the average test scores across the two test sessions was determined for 14 measures derived from these 10 tests and for subcomponents of the Japanese T&T olfactometer threshold test. In addition, the test-retest reliability (Pearson r) of each test measure was established. With the exception of a response bias measure, the average test scores did not differ significantly across the two test sessions. Statistically, the reliability coefficients of the primary test measures fell into three general classes bounded by the following r values: 0.43-0.53; 0.67-0.71; 0.76-0.90. Detection threshold values were more reliable than recognition threshold values; those based upon a single ascending presentation series were much less reliable than those based upon a staircase procedure. The relationship between test length and reliability was examined for several of the tests and mathematically modeled. For example, within the staircase series incorporating the odorant phenyl ethyl alcohol, reliability was related (R² = 0.984) to the number of reversals included in the threshold estimate by a function derived from the Spearman-Brown formula; namely, reliability = 0.455 × (number of reversals) / [1 + 0.455 × (number of reversals - 1)]. Reversal location, per se, had little influence on reliability. Overall, this study suggests that (i) considerable variation is present in the reliability of olfactory tests, (ii) reliability is a function of test length, and (iii) caution is warranted in comparing results from nominally different olfactory tests in applied settings, since the findings may, in some instances, simply reflect the differential reliability of the tests.
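
The Spearman-Brown relationship quoted above can be evaluated numerically to see how predicted reliability grows with the number of staircase reversals. The following is a minimal sketch, not the authors' analysis code. The function name `spearman_brown` and the constant `R1` are illustrative, and reading the fitted coefficient 0.455 as the reliability of a one-reversal estimate is the standard Spearman-Brown interpretation, assumed here rather than stated in the abstract.

```python
def spearman_brown(single_unit_reliability: float, n_units: int) -> float:
    """Predicted reliability of a test lengthened to n_units parallel units.

    Implements r_n = n * r_1 / (1 + r_1 * (n - 1)), the general
    Spearman-Brown prophecy formula.
    """
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    r1 = single_unit_reliability
    return n_units * r1 / (1 + r1 * (n_units - 1))


# Coefficient reported in the abstract for the phenyl ethyl alcohol
# staircase; treated here (an assumption) as the one-reversal reliability.
R1 = 0.455

# Hypothetical reversal counts, chosen only to illustrate the curve.
for n_reversals in (1, 2, 4, 7):
    predicted = spearman_brown(R1, n_reversals)
    print(f"{n_reversals} reversal(s): predicted reliability = {predicted:.3f}")
```

Under this model the predicted reliability rises steeply over the first few reversals and then flattens as it approaches 1, which is consistent with the abstract's conclusion that reliability is a function of test length.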