Interactive Visualization of AI-based Speech Recognition Texts

Speech recognition technology has recently achieved impressive success with deep learning techniques. Speech-to-text tools are becoming prevalent in many social applications such as field surveys. However, the transcription results are far from perfect for direct use in these applications by domain scientists and practitioners, which prevents users from fully leveraging the AI tools. In this paper, we show that interactive visualization can play an important role in post-AI understanding, editing, and analysis of speech recognition results by presenting specific task characterizations and case examples.
