Automatic Identification & Classification of Surgical Margin Status from Pathology Reports Following Prostate Cancer Surgery

Patients whose prostate cancer removal surgery leaves tumor at the surgical margin, known as a positive surgical margin, have a significantly higher chance of biochemical recurrence and clinical progression. To support clinical outcomes assessment, a system was designed to automatically identify, extract, and classify the key phrases in pathology reports that describe this outcome. Phrases were extracted using heuristics and boundary detection, then classified with support vector machines into one of three classes: 'positive (involved) margins,' 'negative (uninvolved) margins,' and 'not-applicable or definitive.' A total of 851 key phrases were extracted from a sample of 782 reports produced between 1996 and 2006 at two major hospitals. Despite differences in reporting style, at least one sentence containing a diagnosis was extracted from 780 of the 782 reports (99.74%). Of the 851 extracted sentences, 97.3% contained diagnoses. Overall accuracy of automated classification of the extracted sentences into the three categories was 97.18%.
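The two-stage pipeline (heuristic extraction with boundary detection, then three-way SVM classification) can be illustrated with a minimal sketch. This is not the authors' system: the margin trigger regex, the stop-word list, the naive sentence splitter, and the toy training sentences are all hypothetical, and the three-class SVM is approximated as one-vs-rest linear SVMs trained with the Pegasos stochastic subgradient method, which may differ from the SVM implementation used in the study.

```python
import random
import re
from collections import Counter

# Hypothetical trigger heuristic: keep any sentence that mentions a margin.
MARGIN_PATTERN = re.compile(r"\bmargins?\b", re.IGNORECASE)
# Hypothetical stop list; "margin(s)" is dropped because every extracted sentence contains it.
STOPWORDS = {"the", "a", "of", "is", "are", "at", "to", "by", "margin", "margins", "surgical"}
CLASSES = ("positive", "negative", "not-applicable")


def extract_margin_sentences(report: str) -> list[str]:
    """Naive boundary detection: split after '.' or ';' or at line breaks, keep margin sentences."""
    sentences = re.split(r"(?<=[.;])\s+|\n+", report)
    return [s.strip() for s in sentences if MARGIN_PATTERN.search(s)]


def features(sentence: str) -> Counter:
    """Bag-of-words vector: lowercased alphabetic tokens minus stop words."""
    return Counter(t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOPWORDS)


def score(w: dict, x: Counter) -> float:
    """Sparse dot product <w, x>."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Binary linear SVM (no bias term) trained with Pegasos; labels y are +1 / -1."""
    rng = random.Random(seed)
    w: dict = {}
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            violated = y[i] * score(w, X[i]) < 1  # hinge loss is active for this example
            for f in w:  # regularisation step: w <- (1 - eta * lam) w with eta = 1 / (lam * t)
                w[f] *= 1.0 - 1.0 / t
            if violated:  # subgradient step on the hinge loss
                eta = 1.0 / (lam * t)
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * y[i] * v
    return w


def train_one_vs_rest(sentences, labels):
    """One binary SVM per class: that class (+1) against the other two (-1)."""
    X = [features(s) for s in sentences]
    return {c: train_linear_svm(X, [1 if label == c else -1 for label in labels]) for c in CLASSES}


def classify(models, sentence):
    """Assign the class whose SVM gives the highest decision value."""
    x = features(sentence)
    return max(CLASSES, key=lambda c: score(models[c], x))


# Toy labelled sentences invented for illustration; not data from the study.
TRAIN = [
    ("Tumor extends to the inked surgical margin.", "positive"),
    ("Carcinoma involves the apical margin.", "positive"),
    ("Positive margin at right posterior aspect.", "positive"),
    ("Surgical margins are negative.", "negative"),
    ("Margins uninvolved by neoplasm.", "negative"),
    ("All margins free of disease.", "negative"),
    ("Margin status cannot be assessed.", "not-applicable"),
    ("See comment regarding margins.", "not-applicable"),
]

models = train_one_vs_rest([s for s, _ in TRAIN], [c for _, c in TRAIN])
report = (
    "FINAL DIAGNOSIS: Prostate, radical prostatectomy: adenocarcinoma, Gleason score 3+4=7. "
    "Tumor extends to the inked surgical margin at the right apex. "
    "Seminal vesicles are free of tumor."
)
for sentence in extract_margin_sentences(report):
    print(classify(models, sentence), "|", sentence)
```

On this toy report only the second sentence passes the margin heuristic, and it is labelled positive. A one-vs-rest decomposition is one common way to turn binary SVMs into a three-class classifier; real pathology text would also need negation handling (for example "not involved") that a plain bag of words does not capture.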
