Special Issue: Text Mining and Information Analysis; Retrieving and Clustering Keywords in Neurosurgery Operation Reports Using Text Mining Techniques

Background: To develop a more practical and reasonable classification of surgical procedures, we applied text mining techniques to retrieve and categorize keywords in operation reports. Materials and Methods: We built a corpus of 3,657 documents from the reports of neurosurgical operations performed at a medical center in Taiwan between 2009 and 2012. A total of 9,906 words were extracted. We first applied term frequency-inverse document frequency (TF-IDF) weighting to select pertinent keywords automatically, but the results were unsatisfactory. We then manually chose 45 keywords belonging to 3 categories: brain, spine, and others. All documents were checked automatically for the presence of these keywords, producing a binary data matrix, from which the cosine similarity matrix was computed. Finally, we applied 6 variants of agglomerative clustering to build dendrograms. Results: The document frequencies (DFs) of the 45 keywords ranged from 12 to 1,250, with an average of 444±342. The number of distinct keywords per document ranged from 0 to 15, with an average of 5.5±2.5. Similarities between DF vectors were higher for pairs of keywords in the same category (brain or spine). The shortest link method performed best on external evaluation, and the unweighted pair-group method using the centroid (UPGMC) performed best on internal evaluation. Conclusion: The distributions of important keywords in neurosurgery operation reports reveal the localized nature of surgical procedures.
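The pipeline described in Materials and Methods (keyword presence check, binary matrix, cosine similarity, agglomerative clustering) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the four toy documents, the four keywords, and the function names are hypothetical and are not the study's corpus, its 45 keywords, or its code. Only the shortest link variant is shown, implemented naively in plain Python, with distance taken as 1 minus cosine similarity.

```python
from math import sqrt

# Hypothetical toy corpus and keyword list (NOT the study's data).
DOCS = [
    "craniotomy for removal of subdural hematoma",
    "craniotomy with evacuation of hematoma",
    "lumbar laminectomy and disc removal",
    "laminectomy for herniated disc",
]
KEYWORDS = ["craniotomy", "hematoma", "laminectomy", "disc"]


def binary_matrix(docs, keywords):
    """Rows are documents, columns are keywords; 1 if the keyword occurs."""
    return [
        [1 if kw in doc.lower().split() else 0 for kw in keywords]
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keyword_similarity(matrix):
    """Cosine similarity between keyword columns (document-occurrence vectors)."""
    cols = list(zip(*matrix))
    return [[cosine(a, b) for b in cols] for a in cols]


def single_linkage(sim, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Cluster distance is the smallest (1 - similarity) over all member pairs,
    i.e. the shortest link between the two clusters.
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(1 - sim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


matrix = binary_matrix(DOCS, KEYWORDS)
sim = keyword_similarity(matrix)
groups = [[KEYWORDS[i] for i in c] for c in single_linkage(sim, 2)]
print(groups)
```

In this toy setting, keywords that always occur in the same documents have similarity close to 1 and merge first, which mirrors the reported tendency of same-category keywords to be more similar. A real analysis would more likely rely on an established clustering library and would record the full merge history to draw the dendrogram, rather than stopping at a fixed number of clusters.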
