Modeling Closed Captioning Subjective Quality Assessment by Deaf and Hard of Hearing Viewers

Closed Captioning (CC) is a service primarily designed for deaf and hard of hearing (D/HoH) viewers. CC transcribes spoken dialogue into text displayed on a television or film screen. Quality assessment methods for live captioning are currently limited to quantitative measures, yet viewers remain dissatisfied with caption quality. One way to improve the assessment procedure is to include D/HoH viewers in the evaluation and collect their subjective judgments. However, evaluating every broadcast show in this way would be costly and impractical. It would therefore be helpful to model subjective assessments so that human decisions can be replicated and predicted. In this article, we report on a model of the probabilities of D/HoH viewers' assessment decisions for CC quality factors, based on actual user preferences. An online survey was designed and conducted to collect assessment data for 22 error-variation samples spanning four quality factors: caption delay, caption speed, missing words, and paraphrasing. The results were analyzed within the signal detection theory framework to create decision probability models for D/HoH viewers.
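The abstract does not spell out the exact formulation, so the following is only a sketch of the standard equal-variance Gaussian signal detection computations that such a decision probability model could rest on. It assumes that each caption sample containing an error is a "signal" trial, each error-free reference sample is a "noise" trial, and a viewer's "unacceptable" rating is a "yes" response. The function names `sdt_measures` and `predicted_rates`, the log-linear correction, and the counts in the usage example are illustrative assumptions, not the authors' method or data.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, sd 1) used for z-transforms.
_STD_NORMAL = NormalDist()


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion_c) for a yes/no detection task.

    Hypothetical helper. Assumes "hits" are unacceptable ratings of
    error-containing samples and "false_alarms" are unacceptable
    ratings of error-free samples. A log-linear correction (add 0.5 to
    each response count and 1 to each trial total) keeps rates of 0 or
    1 from producing infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z_hit = _STD_NORMAL.inv_cdf(hit_rate)
    z_fa = _STD_NORMAL.inv_cdf(fa_rate)
    d_prime = z_hit - z_fa            # sensitivity to the caption error
    criterion = -(z_hit + z_fa) / 2   # response bias; > 0 means conservative
    return d_prime, criterion


def predicted_rates(d_prime, criterion):
    """Predicted P("unacceptable") for error and error-free samples.

    Under the equal-variance model with noise mean -d'/2 and signal
    mean +d'/2, a viewer answers "unacceptable" when the internal
    evidence exceeds the criterion c.
    """
    p_hit = _STD_NORMAL.cdf(d_prime / 2 - criterion)
    p_false_alarm = _STD_NORMAL.cdf(-d_prime / 2 - criterion)
    return p_hit, p_false_alarm


if __name__ == "__main__":
    # Invented counts for one hypothetical delay sample: 40 viewers saw
    # the delayed captions, 40 saw the error-free reference.
    d, c = sdt_measures(hits=32, misses=8, false_alarms=8, correct_rejections=32)
    print(f"d' = {d:.3f}, c = {c:.3f}")
    print("predicted rates:", predicted_rates(d, c))
```

Fitting d' and c per quality factor and error level, then mapping them back through `predicted_rates`, is one way to turn survey responses into decision probabilities that can be reused for new samples without another round of viewer testing.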
