Information Extraction from Cancer Pathology Reports with Graph Convolution Networks for Natural Language Texts

Graph-of-words is a flexible and efficient text representation that addresses well-known challenges in natural language processing, such as word ordering and variation of expressions. In this paper, we consider a recent graph-based convolutional neural network technique, the Text Graph Convolutional Network (Text GCN), in the context of performing classification tasks on free-form natural language texts. To do this, we designed a study of multi-task information extraction from medical text documents. We implemented multi-task learning in the Text GCN, performed hyperparameter optimization, and measured performance on the clinical tasks. We evaluated micro- and macro-F1 scores on four information extraction tasks from cancer pathology reports: subsite, laterality, behavior, and histological grade. The Text GCN scores significantly outperformed those from our previous studies with convolutional neural networks, suggesting that the Text GCN model is superior to traditional models in task performance.
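As an illustration of the graph-of-words idea, the following minimal Python sketch builds undirected co-occurrence edges from a tokenized document using a sliding window. It is not the implementation used in this study: the function name `graph_of_words`, the window size of 3, the whitespace tokenization, and the raw-count edge weights are all assumptions made for the example, and the sample sentence is invented.

```python
from collections import Counter


def graph_of_words(tokens, window=3):
    """Return undirected co-occurrence edge counts for one document.

    Hypothetical helper for illustration only; window size and raw-count
    weighting are assumptions, not the settings of the study.
    """
    edges = Counter()
    for i, word in enumerate(tokens):
        # Pair the current word with the next (window - 1) words.
        for other in tokens[i + 1 : i + window]:
            if other != word:
                # Sort the pair so (a, b) and (b, a) are the same edge.
                edges[tuple(sorted((word, other)))] += 1
    return edges


# Invented example sentence, tokenized on whitespace.
report = "invasive ductal carcinoma of the left breast grade 2".split()
edges = graph_of_words(report, window=3)
```

Because edges connect words that occur near each other regardless of exact position, two phrasings that place the same terms in a slightly different order yield largely overlapping graphs, which is the property the abstract refers to.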
