Automated classification of cardiology diagnoses based on textual medical reports

Automatic diagnosis of diseases has been a long-term challenge for Computer Science and related disciplines. Textual clinical reports are a rich source of data for such diagnosis, but building classification models from them is not a trivial task. The problem tackled in this work is identifying the medical diagnoses indicated in these reports. Several methods have been proposed for this problem, but a method designed for cardiology reports written in Portuguese is still needed. In this paper we describe a method that handles the peculiarities of clinical reports, including medical terminology, and that is implemented to correctly estimate the disease from raw clinical reports and a list of possible diagnoses. Experimental results show that our method achieves high accuracy, even for infrequent classes and complex databases.

[30]  Wagner Meira Jr., et al.  Automated classification of cardiology diagnoses based on textual medical reports, 2021, Journal of Information and Data Management.
