Text Alignment for Real-Time Crowd Captioning

The primary way of providing real-time captioning for deaf and hard-of-hearing people is to employ expensive professional stenographers who can type at natural speaking rates. Recent work has shown that a feasible alternative is to combine the partial captions of ordinary typists, each of whom types part of what they hear. In this paper, we describe an improved method for combining partial captions into a final output based on weighted A* search and multiple sequence alignment (MSA). In contrast to prior work, our method allows the tradeoff between accuracy and speed to be tuned, and provides formal error bounds. Our method outperforms the current state of the art on Word Error Rate (WER) (29.6%), BLEU score (41.4%), and F-measure (36.9%). Because these captions are ultimately meant to be read by people, we also measure how well these metrics correlate with the judgments of 50 study participants, which may help others make further progress on this problem.
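The abstract does not spell out the combination step. The following is only a minimal Python sketch of the general idea under assumptions of our own, not the authors' implementation. Each typist's partial caption is treated as a word sequence. An alignment is scored by sum-of-pairs with hypothetical costs (`GAP_COST`, `MISMATCH_COST`). The search heuristic is the sum of exact pairwise suffix-alignment costs. The final caption is a per-column majority vote. All function names (`weighted_astar_msa`, `suffix_dp`, `consensus`) are illustrative. The weight `w` plays the role of the accuracy/speed knob. With `w = 1` the search is plain A* and returns an optimal alignment. With `w > 1` it typically expands fewer states and, by the standard weighted A* guarantee, returns an alignment costing at most `w` times the optimum.

```python
import heapq
import itertools
from collections import Counter

GAP = None          # marks a gap in an alignment column
GAP_COST = 1        # hypothetical cost of a word aligned against a gap
MISMATCH_COST = 2   # hypothetical cost of two different words in one column


def pair_cost(x, y):
    """Cost of one alignment column restricted to a single pair of rows."""
    if x is GAP and y is GAP:
        return 0
    if x is GAP or y is GAP:
        return GAP_COST
    return 0 if x == y else MISMATCH_COST


def suffix_dp(a, b):
    """D[i][j] = optimal pairwise alignment cost of a[i:] against b[j:]."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            best = float("inf")
            if i < n:
                best = min(best, GAP_COST + D[i + 1][j])
            if j < m:
                best = min(best, GAP_COST + D[i][j + 1])
            if i < n and j < m:
                best = min(best, pair_cost(a[i], b[j]) + D[i + 1][j + 1])
            D[i][j] = best
    return D


def weighted_astar_msa(seqs, w=1.0):
    """Align word sequences with weighted A* (f = g + w*h), sum-of-pairs cost."""
    k = len(seqs)
    pairs = list(itertools.combinations(range(k), 2))
    tables = {p: suffix_dp(seqs[p[0]], seqs[p[1]]) for p in pairs}

    def h(state):
        # Sum of exact pairwise suffix costs: admissible and consistent.
        return sum(tables[a, b][state[a]][state[b]] for a, b in pairs)

    start, goal = (0,) * k, tuple(len(s) for s in seqs)
    # Each move advances a non-empty subset of rows by one word.
    moves = [mv for mv in itertools.product((0, 1), repeat=k) if any(mv)]
    g, parent, closed = {start: 0}, {start: None}, set()
    tie = itertools.count()  # insertion counter breaks ties between equal f values
    heap = [(w * h(start), next(tie), start)]
    while heap:
        _, _, state = heapq.heappop(heap)
        if state in closed:
            continue  # stale heap entry
        if state == goal:
            break
        closed.add(state)
        for mv in moves:
            nxt = tuple(s + d for s, d in zip(state, mv))
            if nxt in closed or any(n > e for n, e in zip(nxt, goal)):
                continue  # no reopening; skip moves past a sequence's end
            column = tuple(seqs[i][state[i]] if mv[i] else GAP for i in range(k))
            new_g = g[state] + sum(pair_cost(column[a], column[b]) for a, b in pairs)
            if new_g < g.get(nxt, float("inf")):
                g[nxt], parent[nxt] = new_g, (state, column)
                heapq.heappush(heap, (new_g + w * h(nxt), next(tie), nxt))
    # Walk parent pointers back from the goal to recover the columns.
    columns, node = [], goal
    while parent[node] is not None:
        node, column = parent[node]
        columns.append(column)
    columns.reverse()
    return g[goal], columns


def consensus(columns):
    """Hypothetical combination rule: most frequent word in each column."""
    out = []
    for column in columns:
        words = [x for x in column if x is not GAP]
        if words:
            out.append(Counter(words).most_common(1)[0][0])
    return out


if __name__ == "__main__":
    partial = [
        "the cat sat on".split(),
        "cat sat on the mat".split(),
        "the cat on mat".split(),
    ]
    for weight in (1.0, 2.0):
        cost, cols = weighted_astar_msa(partial, w=weight)
        print(weight, cost, " ".join(consensus(cols)))
```

Closed states are never reopened here. For a consistent heuristic such as the pairwise one above, weighted A* is known to keep the `w`-suboptimality bound without reopening. The search space grows as the product of the sequence lengths, with 2^k - 1 moves per state. This sketch is therefore only practical for a few short captions.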
