Transforming unstructured voice and text data into insight for paramedic emergency service using recurrent and convolutional neural networks

Paramedics often have to make lifesaving decisions within a limited time in an ambulance. They sometimes ask a doctor for additional medical instructions, which costs the patient valuable time. This study aims to automatically fuse voice and text data to provide tailored situational awareness information to paramedics. For speech recognition, we built, trained, and tested a deep bidirectional recurrent neural network with long short-term memory (LSTM) units. We then applied convolutional neural networks on top of custom-trained word vectors for sentence-level classification. Each sentence is automatically assigned to one of four classes: patient status, medical history, treatment plan, or medication reminder. Subsequently, incident reports with extracted keywords were generated automatically to assist paramedics and physicians in making decisions. We found that the proposed system could provide timely medication notifications based on unstructured voice and text data, which is not currently possible in paramedic emergencies. In addition, the automatic incident report generation relieves paramedics and doctors of a routine but error-prone task, helping them focus on patient care.
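The sentence-level classification step can be illustrated with a minimal sketch of a convolutional classifier over word vectors: filters slide over windows of consecutive word embeddings, each filter's activations are max-pooled over the sentence, and a softmax layer scores the four classes named in the abstract. This is not the authors' implementation. The class name `TinySentenceCNN`, the vocabulary, the window sizes, and the dimensions are hypothetical, and the weights are random and untrained, so the predictions carry no meaning; the code shows only the shape of the forward pass.

```python
import math
import random

# The four sentence classes named in the abstract.
CLASSES = ["patient status", "medical history", "treatment plan", "medication reminder"]


def make_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinySentenceCNN:
    """Illustrative forward pass only: embed, convolve, max-pool, softmax."""

    def __init__(self, vocab, embed_dim=8, window_sizes=(2, 3), filters_per_size=4, seed=0):
        rng = random.Random(seed)
        self.vocab = {word: i for i, word in enumerate(vocab)}
        self.embed_dim = embed_dim
        # One extra embedding row is reserved for out-of-vocabulary words.
        self.embeddings = make_matrix(len(vocab) + 1, embed_dim, rng)
        # A filter of window size h spans h consecutive word vectors.
        self.filters = [
            (h, make_matrix(filters_per_size, h * embed_dim, rng)) for h in window_sizes
        ]
        n_features = filters_per_size * len(window_sizes)
        self.out_w = make_matrix(len(CLASSES), n_features, rng)

    def embed(self, tokens):
        unk = len(self.vocab)
        return [self.embeddings[self.vocab.get(t, unk)] for t in tokens]

    def features(self, tokens):
        vecs = self.embed(tokens)
        feats = []
        for h, bank in self.filters:
            # Zero-pad short sentences so at least one window of size h exists.
            padded = vecs + [[0.0] * self.embed_dim] * max(0, h - len(vecs))
            # Each window is the concatenation of h consecutive word vectors.
            windows = [sum(padded[i:i + h], []) for i in range(len(padded) - h + 1)]
            for f in bank:
                # ReLU activation per window, then max-over-time pooling.
                acts = [max(0.0, sum(w * x for w, x in zip(f, win))) for win in windows]
                feats.append(max(acts))
        return feats

    def predict_proba(self, sentence):
        feats = self.features(sentence.lower().split())
        logits = [sum(w * x for w, x in zip(row, feats)) for row in self.out_w]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, sentence):
        probs = self.predict_proba(sentence)
        return CLASSES[probs.index(max(probs))]


# Hypothetical vocabulary and sentence, for illustration only.
model = TinySentenceCNN(vocab=["patient", "has", "chest", "pain", "give", "aspirin"])
probabilities = model.predict_proba("Patient has chest pain")
label = model.predict("Give aspirin")
```

In a trained system the embeddings, filters, and output weights would be learned from labeled sentences; here they are fixed by a random seed so that the example stays self-contained.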
