Accuracy of Cloud-Based Speech Recognition Open Application Programming Interface for Medical Terms of Korean

Background There are limited data on the accuracy of cloud-based speech recognition (SR) open application programming interfaces (APIs) for medical terminology. This study aimed to evaluate the medical term recognition accuracy of currently available cloud-based SR open APIs in Korean.

Methods We analyzed the SR accuracy of currently available cloud-based SR open APIs using real doctor–patient conversation recordings collected from an outpatient clinic at a large tertiary medical center in Korea. For each pair of original and SR transcriptions, we calculated the accuracy rate of each cloud-based SR open API (i.e., the number of medical terms in the SR transcription divided by the number of medical terms in the original transcription).

Results A total of 112 doctor–patient conversation recordings were converted with three cloud-based SR open APIs (Naver Clova SR from Naver Corporation; Google Speech-to-Text from Alphabet Inc.; and Amazon Transcribe from Amazon), and the transcriptions were compared. Naver Clova SR (75.1%) recognized medical terms more accurately than the other open APIs (Google Speech-to-Text, 50.9%, P < 0.001; Amazon Transcribe, 57.9%, P < 0.001), and Amazon Transcribe was more accurate than Google Speech-to-Text (P < 0.001). In the sub-analysis, Naver Clova SR showed the highest accuracy across all word classes, but accuracy for words longer than five characters showed no statistically significant differences (Naver Clova SR, 52.6%; Google Speech-to-Text, 56.3%; Amazon Transcribe, 36.6%).

Conclusion Among the three current cloud-based SR open APIs, Naver Clova SR, developed by a Korean company, showed the highest accuracy for medical terms in Korean, compared with Google Speech-to-Text and Amazon Transcribe. Although limitations remain in the recognition of medical terminology, there is considerable room to improve this promising technology by combining the strengths of each SR engine.
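The accuracy rate defined in Methods reduces to a simple ratio. The sketch below illustrates it in Python; the two-term lexicon and the naive substring matching are illustrative assumptions, since the abstract does not specify how medical terms were identified in each transcription.

```python
# A minimal sketch of the accuracy metric described in Methods, assuming a
# pre-built lexicon of Korean medical terms and simple substring matching.
# These choices are hypothetical, not the authors' exact procedure.
from typing import Iterable


def count_medical_terms(transcript: str, lexicon: Iterable[str]) -> int:
    """Count occurrences of lexicon terms in a transcript."""
    return sum(transcript.count(term) for term in lexicon)


def recognition_accuracy(original: str, sr_output: str, lexicon: Iterable[str]) -> float:
    """Medical terms recognized in the SR transcription per medical term
    in the original (human) transcription, as defined in Methods."""
    denominator = count_medical_terms(original, lexicon)
    if denominator == 0:
        return float("nan")  # no medical terms to score
    # Clamp so spurious extra matches in the SR output cannot exceed 100%.
    numerator = min(count_medical_terms(sr_output, lexicon), denominator)
    return numerator / denominator


# Hypothetical usage with a two-term lexicon:
lexicon = ["고혈압", "당뇨병"]  # "hypertension", "diabetes mellitus"
original = "고혈압과 당뇨병 약을 드시고 계시죠?"
sr_output = "고혈압과 당뇨 약을 드시고 계시죠?"  # SR engine missed "당뇨병"
print(f"{recognition_accuracy(original, sr_output, lexicon):.1%}")  # 50.0%
```

In practice, each SR transcription would come from one of the three cloud APIs compared in Results, and the per-recording ratios would be aggregated before the between-engine comparisons.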
