MediBERT: A Medical Chatbot Built Using KeyBERT, BioBERT and GPT-2

The emergence of chatbots over the last 50 years has been the primary consequence of the need for virtual assistance. Unlike their biological anthropomorphic counterparts, fellow homo sapiens, chatbots can present themselves instantaneously at the user's need and convenience. Whether for something as benign as the need for a friend to talk to, or for a case as dire as medical assistance, chatbots are unequivocally ubiquitous in their utility. This paper aims to develop one such chatbot, capable not only of analyzing human text (and, in the near future, speech), but also of refining its ability to assist users medically by accumulating data from relevant datasets. Although Recurrent Neural Networks (RNNs) are often used to build chatbots, the vanishing gradient problem that arises during backpropagation through time, coupled with the cumbersome process of parsing each word sequentially, has driven the adoption of Transformer Neural Networks (TNNs) instead; TNNs process entire sentences at once while encoding context through embeddings, allowing far greater parallelization. Two BERT-based models (Bidirectional Encoder Representations from Transformers), namely KeyBERT and BioBERT, are used for tagging the keywords in each sentence and for contextual vectorization of Q/A pairs matched via matrix multiplication, respectively. A final GPT-2 (Generative Pre-trained Transformer 2) layer refines the BioBERT output into a human-readable form. Such a system could potentially lessen the need for trips to the nearest physician, along with the temporal delay and financial resources they require.
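The retrieval step described above, matching a user query against stored Q/A pairs via matrix multiplication over BioBERT embeddings, reduces to cosine similarity computed as a single matrix-vector product once both sides are L2-normalized. A minimal sketch follows, with small toy vectors standing in for the 768-dimensional BioBERT contextual embeddings; the question bank and answers are hypothetical illustrations, not the paper's dataset:

```python
import numpy as np

# Toy stand-ins for BioBERT embeddings of stored medical questions
# (in practice these would be 768-dimensional contextual vectors).
qa_questions = np.array([
    [0.9, 0.1, 0.0],   # e.g. "What causes a fever?"
    [0.1, 0.8, 0.2],   # e.g. "How is a sprain treated?"
    [0.0, 0.2, 0.9],   # e.g. "What are flu symptoms?"
])
qa_answers = ["fever answer", "sprain answer", "flu answer"]

def retrieve(query_vec, question_matrix):
    """Return the index of the best-matching stored question.

    After L2-normalizing both the query and every row of the bank,
    one matrix-vector product scores all Q/A pairs at once."""
    q = query_vec / np.linalg.norm(query_vec)
    m = question_matrix / np.linalg.norm(question_matrix, axis=1, keepdims=True)
    scores = m @ q  # cosine similarities, one matrix multiplication
    return int(np.argmax(scores))

# A query embedding close to the first stored question retrieves it.
best = retrieve(np.array([0.85, 0.15, 0.05]), qa_questions)
print(qa_answers[best])  # → "fever answer"
```

In the full pipeline, KeyBERT's extracted keywords would narrow the candidate bank before this similarity step, and the retrieved answer would then be passed to GPT-2 for fluent rephrasing.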
