Biomedical Document Classification with Literature Graph Representations of Bibliographies and Entities

This paper proposes a document classification method that incorporates representations of a literature graph constructed from bibliographic and entity information. Recently, document classification performance has improved substantially with large pre-trained language models; however, some documents remain difficult to classify. External information, such as bibliographic metadata, citation links, entity descriptions, and medical taxonomies, is considered a key to handling such documents. Although several document classification methods that use external information have been proposed, they consider only limited relationships, e.g., word co-occurrence and citation links, even though multiple types of external information are available. To overcome this limitation of the conventional use of external information, we propose a document classification model that simultaneously considers bibliographic and entity information, deeply modeling the relationships among documents through the representations of the literature graph. Experimental results show that the proposed method outperforms existing methods on two document classification datasets in the biomedical domain with the help of the literature graph.
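The fusion described above can be sketched as follows. This is a minimal illustration under assumed details, not the authors' implementation: the dimensions, random stand-in vectors, and the simple concatenate-then-linear-classifier design are all hypothetical, standing in for a document's pre-trained-language-model embedding and the embedding of its node in the literature graph.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
TEXT_DIM = 768     # e.g., a BERT-style [CLS] vector
GRAPH_DIM = 128    # e.g., a literature-graph node embedding
NUM_CLASSES = 5

def classify(text_emb, graph_emb, W, b):
    """Concatenate the two document views and apply a linear softmax classifier."""
    fused = np.concatenate([text_emb, graph_emb])   # shape: (TEXT_DIM + GRAPH_DIM,)
    logits = W @ fused + b                          # shape: (NUM_CLASSES,)
    exp = np.exp(logits - logits.max())             # numerically stable softmax
    return exp / exp.sum()

# Stand-in embeddings; in practice these would come from a pre-trained
# language model and a graph-embedding model trained on the literature graph.
text_emb = rng.normal(size=TEXT_DIM)
graph_emb = rng.normal(size=GRAPH_DIM)
W = rng.normal(size=(NUM_CLASSES, TEXT_DIM + GRAPH_DIM)) * 0.01
b = np.zeros(NUM_CLASSES)

probs = classify(text_emb, graph_emb, W, b)
print(probs.shape)  # (5,): one probability per class, summing to 1
```

The concatenation lets the classifier weight textual evidence against graph-structural evidence per class; richer fusion (e.g., gating or attention over the two views) is possible but beyond this sketch.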
