The reliability and functional validity of visual and semiautomatic sleep/wake scoring in the Møll-Wistar rat.

The present paper has three major objectives: first, to document the reliability of a published criteria set for sleep/wake scoring in the rat; second, to develop a computer algorithm implementation of the criteria set; and third, to document the reliability and functional validity of the computer algorithm for sleep/wake scoring. The reliability of the visual criteria was assessed by having two raters independently score 8 hours of polygraph records from the light period from five rats (14,040 10-second scoring epochs). Scored stages were waking, slow-wave sleep-1, slow-wave sleep-2, transition-type sleep and rapid eye movement (REM) sleep. The visual criteria had good interrater reliability [Cohen's kappa (kappa) = 0.68], with 92.6% agreement on the waking/non-rapid eye movement (NREM) sleep/REM sleep distinction (kappa = 0.89). This indicated that the criteria allow separate raters to independently classify sleep/wake stages with very good agreement. An independent group of 10 rats was used to develop an algorithm for semiautomatic computer scoring. A close implementation of the visual criteria was chosen. The algorithm was based on power spectral densities from two electroencephalogram (EEG) leads and on electromyogram (EMG) activity. To take the spatial and temporal context into account, five 2-second fast Fourier transform (FFT) epochs from each EEG/EMG lead were used per 10-second sleep/wake scoring epoch. The same group of five rats used in visual scoring was used to appraise the reliability of computerized scoring. The computer score was compared with the visual score for each rater. Agreement between computer and visual scores was lower (kappa = 0.57 and 0.62 for the two raters) than the interrater agreement in visual scoring; for the waking/NREM sleep/REM sleep distinction, agreement was 87.7 and 89.1% (kappa = 0.82 and 0.84). Subsequently, the two raters' computer scores were compared with each other.
Interrater reliability for computer scoring (kappa = 0.75) was better than that for visual scoring, with 92.4% agreement for the waking/NREM sleep/REM sleep distinction (kappa = 0.89). The computer scoring algorithm was applied to data from a third independent group of rats (n = 6) from an acoustical stimulus arousal threshold experiment, to assess the functional validity of the scoring directly with respect to arousal threshold. The computer algorithm scoring performed as well as the original visual sleep/wake stage scoring. This indicated that the lower intrarater reliability did not have a significant negative influence on the functional validity of the sleep/wake score.
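
The agreement statistics reported above (percent agreement and Cohen's kappa, on the five-stage score and on the collapsed waking/NREM sleep/REM sleep distinction) can be illustrated with a minimal sketch. This is not the authors' software: the stage labels, function names and toy scores below are hypothetical, and grouping transition-type sleep with NREM sleep is an assumption not stated in the text.

```python
from collections import Counter

# Hypothetical short labels for the five scored stages.
# Assumption: transition-type sleep (TS) is grouped with NREM sleep
# when collapsing to the three-state distinction.
TO_THREE_STATE = {
    "W": "W",
    "SWS1": "NREM",
    "SWS2": "NREM",
    "TS": "NREM",
    "REM": "REM",
}


def _check(a, b):
    """Both score sequences must cover the same, non-empty set of epochs."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("score sequences must be non-empty and equal in length")


def percent_agreement(a, b):
    """Percentage of epochs on which the two scorings agree."""
    _check(a, b)
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    _check(a, b)
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the product of each scorer's marginal proportions.
    p_chance = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both scorers use a single stage")
    return (p_observed - p_chance) / (1.0 - p_chance)


def collapse(scores):
    """Map five-stage scores to the waking/NREM/REM distinction."""
    return [TO_THREE_STATE[s] for s in scores]


# Toy data: ten 10-second epochs scored by two hypothetical raters.
rater_a = ["W", "W", "SWS1", "SWS2", "TS", "REM", "REM", "SWS1", "W", "SWS2"]
rater_b = ["W", "SWS1", "SWS1", "SWS1", "TS", "REM", "TS", "SWS1", "W", "SWS2"]

five_stage_kappa = cohens_kappa(rater_a, rater_b)
three_state_kappa = cohens_kappa(collapse(rater_a), collapse(rater_b))
three_state_agreement = percent_agreement(collapse(rater_a), collapse(rater_b))
```

The same two functions apply to any pairing described above (rater versus rater, or computer score versus visual score), since each comparison is between two label sequences over the same epochs.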