A Prototype Public Speaking Skills Assessment: An Evaluation of Human‐Scoring Quality

This paper summarizes an evaluation of human-scoring quality for a prototype assessment of public speaking skills. Videotaped performances given by 17 speakers on 4 tasks were scored by expert raters and by nonexpert raters with extensive experience scoring performance-based and constructed-response assessments. The Public Speaking Competence Rubric was used to score the speeches. Across the dimensions of presentation competence, interrater reliability between expert and nonexpert raters ranged from .23 to .71. The dimensions associated with the lowest interrater reliability were effectual persuasion and word choice (.41 and .23, respectively); even expert raters, individuals with backgrounds in teaching and evaluating oral communication, had difficulty agreeing with one another on those dimensions. Low-inference dimensions such as visual aids and vocal expression showed much higher interrater reliability (.65 and .75, respectively), and the holistic score had an interrater reliability of .63. These results point to the need for significant investment in task, rubric, and training development before the public speaking competence assessment can be used for large-scale assessment purposes.
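To make the reliability figures concrete, the sketch below shows one way an interrater reliability coefficient for a single rubric dimension might be computed. The abstract does not specify which statistic was used (quadratic-weighted kappa is another common choice for ordinal rubric scores), so a Pearson correlation between two raters' scores is assumed here, and the scores in the example are illustrative rather than the study's data.

```python
# Minimal sketch of an interrater reliability check between two raters'
# scores on one rubric dimension. The statistic (Pearson r) and the scores
# below are assumptions for illustration, not the study's method or data.
import numpy as np

def interrater_reliability(rater_a, rater_b):
    """Pearson correlation between two raters' scores for the same speeches."""
    a = np.asarray(rater_a, dtype=float)
    b = np.asarray(rater_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("Both raters must score the same set of performances.")
    return float(np.corrcoef(a, b)[0, 1])

if __name__ == "__main__":
    # Hypothetical 0-4 rubric scores for a handful of videotaped performances.
    expert = [3, 2, 4, 1, 3, 2, 4, 0]
    nonexpert = [3, 3, 4, 1, 2, 2, 3, 1]
    r = interrater_reliability(expert, nonexpert)
    print(f"Interrater reliability (Pearson r): {r:.2f}")
```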