Relation Clustering in Narrative Knowledge Graphs

When working with literary texts such as novels or short stories, extracting structured information in the form of a knowledge graph can be hindered by the huge number of possible relations between the entities corresponding to the characters, and by the consequent difficulty of gathering supervised information about them. This issue is addressed here as an unsupervised task powered by transformers: relational sentences in the original text are embedded (with SBERT) and clustered in order to merge semantically similar relations. All the sentences in the same cluster are then summarized (with BART), and a descriptive label is extracted from the summary. Preliminary tests suggest that such clustering can detect similar relations and provide a valuable preprocessing step for semi-supervised approaches.
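The embed, cluster, and label steps described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The function names (`embed`, `cosine`, `cluster`, `label`), the example sentences, and the similarity threshold are all hypothetical. A bag-of-words vector stands in for the SBERT embedding, a greedy single-pass grouping stands in for the clustering algorithm (which the abstract does not specify), and the tokens shared by all sentences in a cluster stand in for the BART summary and label extraction.

```python
import math
from collections import Counter


def embed(sentence):
    """Toy stand-in for an SBERT encoder (assumption): bag-of-words counts."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(sentences, threshold=0.5):
    """Greedy single-pass clustering (assumption, not the paper's method).

    Each sentence joins the first cluster whose seed sentence is at least
    `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of sentence indices.
    """
    vectors = [embed(s) for s in sentences]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def label(cluster_sentences):
    """Toy stand-in for BART summarization plus label extraction (assumption).

    Keeps the tokens shared by every sentence in the cluster, in the order
    they appear in the first sentence.
    """
    token_lists = [s.lower().split() for s in cluster_sentences]
    shared = set(token_lists[0]).intersection(*token_lists[1:])
    return " ".join(t for t in token_lists[0] if t in shared)


# Hypothetical relational sentences between characters.
sentences = [
    "Alice is in love with Bob",
    "Alice is the sister of Carol",
    "Carol is in love with Dave",
    "Eve is the sister of Dave",
]

clusters = cluster(sentences)
labels = [label([sentences[i] for i in members]) for members in clusters]

for members, name in zip(clusters, labels):
    print(name, "->", [sentences[i] for i in members])
```

With a real sentence encoder the embeddings would capture paraphrases that share no words, which this lexical stand-in cannot do; the sketch only shows how the three stages fit together.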
