Understanding patient complaint characteristics using contextual clinical BERT embeddings

In clinical conversational applications, extracted entities tend to capture the main subject of a patient's complaint, namely symptoms or diseases. However, they mostly fail to recognize the characterizations of a complaint, such as its time, onset, and severity. For example, given the input "I have a headache and it is extreme", state-of-the-art models recognize only the main symptom entity (headache) and ignore the severity descriptor "extreme" that characterizes it. In this paper, we design a two-fold approach to detect the characterizations of entities such as symptoms, as presented by general users in contexts where they would describe their symptoms to a clinician. We use Word2Vec and BERT models to encode the clinical text given by patients. We then transform the output and reframe the task as a multi-label classification problem. Finally, we combine the processed encodings with the Linear Discriminant Analysis (LDA) algorithm to classify the characterizations of the main entity. Experimental results show that our method achieves a 40-50% improvement in accuracy over the state-of-the-art models.
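The abstract does not give implementation details, so the following is only a minimal sketch of the final step (multi-label classification of characterizations with LDA) under stated assumptions. Sentence embeddings are simulated with random vectors rather than produced by BERT or Word2Vec. The label names `severity`, `onset`, and `duration` and the helper functions `fit_binary_lda`, `fit_multilabel_lda`, and `predict_multilabel` are hypothetical. The multi-label problem is handled by binary relevance (one two-class LDA model per label), which may differ from how the authors actually combined LDA with their encodings.

```python
import numpy as np

# Hypothetical characterization labels (assumption; not the paper's label set).
LABELS = ["severity", "onset", "duration"]


def fit_binary_lda(X, y, reg=1e-3):
    """Two-class LDA with a shared (pooled) covariance; returns (w, b)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance, lightly regularized so it stays invertible.
    S = (np.cov(X0, rowvar=False) * (len(X0) - 1)
         + np.cov(X1, rowvar=False) * (len(X1) - 1)) / (len(X) - 2)
    S += reg * np.eye(X.shape[1])
    w = np.linalg.solve(S, mu1 - mu0)
    # Bias from the LDA log-posterior ratio, including the class-prior term.
    b = -0.5 * (mu0 + mu1) @ w + np.log(len(X1) / len(X0))
    return w, b


def fit_multilabel_lda(X, Y):
    """Binary relevance: one independent LDA model per label column of Y."""
    return [fit_binary_lda(X, Y[:, j]) for j in range(Y.shape[1])]


def predict_multilabel(models, X):
    """Return an (n_samples, n_labels) 0/1 matrix; a sentence may get several labels."""
    return np.stack([(X @ w + b > 0).astype(int) for w, b in models], axis=1)


# Synthetic stand-in for sentence embeddings (assumption): each active label
# shifts the vector along its own random direction. No BERT/Word2Vec is used.
rng = np.random.default_rng(0)
n, dim = 300, 16
Y = rng.integers(0, 2, size=(n, len(LABELS)))
directions = rng.normal(size=(len(LABELS), dim))
X = rng.normal(size=(n, dim)) + 2.0 * Y @ directions

# Train on the first 240 sentences, evaluate per-label accuracy on the rest.
models = fit_multilabel_lda(X[:240], Y[:240])
pred = predict_multilabel(models, X[240:])
per_label_acc = (pred == Y[240:]).mean(axis=0)
for name, acc in zip(LABELS, per_label_acc):
    print(f"{name}: {acc:.2f}")
```

Binary relevance treats each characterization independently, so one sentence can receive several labels at once (for example both severity and onset), but it ignores correlations between labels that a different multi-label strategy could exploit.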