Application of Recently Developed Computer Algorithm for Automatic Classification of Unstructured Radiology Reports: Validation Study

PURPOSE: To validate the accuracy of Lexicon Mediated Entropy Reduction (LEXIMER), a new information theory-based computer algorithm developed by the authors for independent analysis and classification of unstructured radiology reports based on the presence of clinically important findings (F(T), where (T) represents "true") and recommendations for subsequent action (R(T)).

MATERIALS AND METHODS: The study was approved by the Human Research Committee of the institutional review board. Consecutive de-identified radiology reports (n = 1059) comprising results of barium studies (n = 99), computed tomography (n = 107), mammography (n = 90), magnetic resonance imaging (n = 108), nuclear medicine (n = 99), positron emission tomography (n = 106), radiography (n = 212), ultrasonography (n = 131), and vascular procedures (n = 107) were independently analyzed by two radiologists and then with LEXIMER to categorize the reports into F(T) and F(T)0 (containing or not containing clinically important findings) categories and R(T) and R(T)0 (containing or not containing recommendations for subsequent action) categories. Accuracy, sensitivity, specificity, and positive and negative predictive values of LEXIMER for placing reports into F(T) and F(T)0 and R(T) and R(T)0 categories were assessed by using appropriate statistical tests.

RESULTS: There was strong interobserver concordance between the two radiologists for placing radiology reports into F(T) and R(T) categories (kappa = 0.9, P < .01). For placing radiology reports into F(T) and F(T)0 categories, the accuracy, sensitivity, specificity, and positive and negative predictive values of the LEXIMER program were, respectively, 97.5% (95% confidence interval [CI]: 96.6%, 98.5%), 98.9% (95% CI: 97.9%, 99.6%), 94.9% (95% CI: 93.1%, 96.0%), 97.5% (95% CI: 96.6%, 98.0%), and 97.7% (95% CI: 95.8%, 98.8%). For placing reports into R(T) and R(T)0 categories, the corresponding values were 99.6% (95% CI: 99.2%, 99.9%), 98.2% (95% CI: 95.0%, 99.6%), 99.9% (95% CI: 99.4%, 99.99%), 99.4% (95% CI: 96.3%, 99.9%), and 99.7% (95% CI: 98.9%, 99.9%).

CONCLUSION: LEXIMER is an accurate automated engine for evaluating the percentage positivity of clinically important findings and rates of recommendation for subsequent action in unstructured radiology reports.
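
For readers unfamiliar with the reported measures, the sketch below shows how accuracy, sensitivity, specificity, predictive values, and Cohen's kappa are conventionally computed from 2 x 2 tables. It is only an illustration: the function names are hypothetical, the counts in the usage lines are invented and are not the study's data, and the Wilson score interval is an assumed choice, since the abstract does not say which confidence interval method or statistical tests were used.

```python
from math import sqrt


def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2 x 2 metrics; the radiologists' label is the reference."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (about 95% for z = 1.96).

    Assumption: the source does not name its interval method.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def cohen_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for two raters assigning a binary category."""
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = (both_pos + a_only) / n      # rater A's positive rate
    b_pos = (both_pos + b_only) / n      # rater B's positive rate
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)


# Hypothetical counts for illustration only (not taken from the study).
metrics = diagnostic_metrics(tp=90, fp=5, fn=10, tn=95)
sens_ci = wilson_interval(successes=90, total=100)
kappa = cohen_kappa(both_pos=45, a_only=5, b_only=5, both_neg=45)
```

Here `tp` would count reports that both the algorithm and the reference reading place in the positive category, and `kappa` measures agreement between two readers beyond what chance alone would produce.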
