Pathology text mining - on Norwegian prostate cancer reports

Pathology reports are written by pathologists, skilled physicians who know how to interpret disorders in tissue samples from the human body. To obtain valuable statistics on the outcome of disorders such as cancer and on the effect of treatment, cancer pathology reports are interpreted and coded into databases at cancer registries. In Norway, this task is carried out at the Cancer Registry of Norway (Kreftregisteret) by 25 different human coders. There is a need to automate this process. The authors of this article received 25 prostate cancer pathology reports written in Norwegian from the Cancer Registry of Norway, each documenting various stages of prostate cancer, together with the corresponding correct manual coding. To prototype automation, a rule-based algorithm was produced to process the reports. The output of the algorithm was compared with the output of the manual coding. The evaluation showed an average F-score of 0.94 on four of the ten extracted data points, namely Total Malign, Primary Gleason, Secondary Gleason and Total Gleason, and a lower average F-score of 0.76 on all ten data points. The results are in line with previous research.
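
To make the evaluation setup concrete, the following is a minimal sketch of how a rule-based extractor for the three Gleason data points could be scored against manual coding with an F-score. It does not reproduce the authors' rules: the regular expression, the field names, the sample sentences and the micro-averaging over field/value pairs are all assumptions made for illustration, and the abstract does not state how the reported averages were computed.

```python
import re

# Hypothetical pattern (not the authors' rule): matches phrases such as
# "Gleason score 3+4=7" or "Gleason grad 4 + 3 = 7".
GLEASON_RE = re.compile(
    r"gleason\D{0,20}?(\d)\s*\+\s*(\d)(?:\s*=\s*(\d{1,2}))?",
    re.IGNORECASE,
)


def extract_gleason(text):
    """Return primary, secondary and total Gleason values found in text.

    Returns an empty dict when no Gleason expression is found.
    """
    match = GLEASON_RE.search(text)
    if match is None:
        return {}
    primary, secondary = int(match.group(1)), int(match.group(2))
    # Fall back to the sum when the report does not spell out the total.
    total = int(match.group(3)) if match.group(3) else primary + secondary
    return {
        "primary_gleason": primary,
        "secondary_gleason": secondary,
        "total_gleason": total,
    }


def f_score(predicted, gold):
    """Micro-averaged F-score over field/value pairs across all reports."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        for field, value in pred.items():
            if ref.get(field) == value:
                tp += 1  # extracted value agrees with the manual coding
            else:
                fp += 1  # extracted value is wrong or not in the coding
        for field, value in ref.items():
            if pred.get(field) != value:
                fn += 1  # coded value was missed or extracted wrongly
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example sentences, not taken from the study's reports.
reports = ["Adenokarsinom, Gleason score 3+4=7.", "Ingen sikre tegn til malignitet."]
gold = [
    {"primary_gleason": 3, "secondary_gleason": 4, "total_gleason": 7},
    {"primary_gleason": 4, "secondary_gleason": 3, "total_gleason": 7},
]
predicted = [extract_gleason(r) for r in reports]
score = f_score(predicted, gold)
```

In this toy run the extractor gets all three fields of the first report right and finds nothing in the second, so precision is perfect while recall is halved. A per-field average, rather than the micro-average used here, would be an equally plausible reading of "average F-score".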
