A Macro Discourse Primary and Secondary Relation Recognition Method Based on Topic Similarity

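Jiang Feng, Chu Xiaomin, Xu Sheng, Li Peifeng, Zhu Qiaoming
(School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006; Jiangsu Provincial Key Laboratory of Computer Information Processing Technology, Suzhou, Jiangsu 215006)

Abstract: Discourse analysis is an important task in natural language processing. Analyzing the primary and secondary relations in a discourse helps in understanding its structure and semantics, and provides strong support for natural language processing applications. Building on existing research into micro-level discourse primary and secondary relation recognition, this paper focuses on macro-level discourse primary and secondary relations and proposes a recognition model based on topic similarity computed with word2vec and LDA. The word2vec-based topic similarity and the LDA-based topic similarity measure semantic similarity along different dimensions and complement each other at the semantic level, which strengthens the model's ability to recognize macro discourse primary and secondary relations. In experiments on the Macro Chinese Discourse Treebank (MCDTB), the model achieves an F1 score of 79.9% and an accuracy of 81.82%, improvements of 1.7% and 1.81%, respectively, over the baseline system.

Keywords: macro discourse primary and secondary relation; topic similarity; word2vec; LDA

CLC number: TP391    Document code: A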
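The abstract does not specify how the two topic similarities are computed. As a minimal sketch only, assuming cosine similarity over averaged word vectors (standing in for word2vec) and over per-unit topic distributions (standing in for LDA output), the two complementary features could be built as follows. All function names, the toy embeddings, and the toy topic distributions are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def mean_vector(words, embeddings, dim):
    """Average the embeddings of the known words; zero vector if none are known."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def topic_similarity_features(unit_a, unit_b, embeddings, dim, topics_a, topics_b):
    """Return [embedding-based similarity, topic-distribution similarity].

    Assumption: both similarities are cosine similarities. In a real system the
    embeddings would come from a trained word2vec model and the topic
    distributions from a trained LDA model.
    """
    emb_sim = cosine(mean_vector(unit_a, embeddings, dim),
                     mean_vector(unit_b, embeddings, dim))
    lda_sim = cosine(topics_a, topics_b)
    return [emb_sim, lda_sim]

# Toy stand-ins for trained models (hypothetical values).
TOY_EMBEDDINGS = {
    "economy": [1.0, 0.0],
    "market": [0.8, 0.2],
    "football": [0.0, 1.0],
}

features = topic_similarity_features(
    ["economy", "market"], ["football"],
    TOY_EMBEDDINGS, 2,
    [0.9, 0.1], [0.2, 0.8],
)
```

The resulting two-element feature vector would then be passed, together with any other features, to whatever classifier decides which discourse unit is primary. Keeping the two similarities as separate features, rather than merging them into one score, lets the classifier weigh them independently.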
