Assessing the accuracy of automatic speech recognition for psychotherapy

Accurate transcription of audio recordings in psychotherapy would improve therapy effectiveness, clinician training, and safety monitoring. Although automatic speech recognition software is commercially available, its accuracy in mental health settings has not been well described. It is unclear which metrics and thresholds are appropriate for different clinical use cases, which may range from population descriptions to individual safety monitoring. Here we show that automatic speech recognition is feasible in psychotherapy, but further improvements in accuracy are needed before widespread use. Our HIPAA-compliant automatic speech recognition system demonstrated a transcription word error rate of 25%. For depression-related utterances, sensitivity was 80% and positive predictive value was 83%. For clinician-identified harm-related sentences, the word error rate was 34%. These results suggest that automatic speech recognition may support understanding of language patterns and subgroup variation in existing treatments but may not be ready for individual-level safety surveillance.
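The three metrics reported above can be illustrated with a minimal sketch. It assumes the conventional definitions: word error rate as the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference transcript, sensitivity as TP / (TP + FN), and positive predictive value as TP / (TP + FP). The function names, the lowercase-and-split tokenization, and the example sentences are illustrative assumptions, not the study's actual pipeline or data.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: tokens are produced by lowercasing and splitting on
    whitespace; a real evaluation would likely normalize text further.
    """
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly relevant utterances that the system recovered."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Share of system-flagged utterances that were truly relevant."""
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical sentence pair, not taken from the study's recordings.
    human = "i have been feeling hopeless lately"
    machine = "i have been feeling hopeful lately"
    print(word_error_rate(human, machine))
```

Because the denominator is the reference length, insertions can push the word error rate above 100%, and every word error is weighted equally regardless of its clinical significance.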
