A Novel Neural Approach for News Reprint Prediction

News media have become a prevalent platform for spreading information, where news sites can reprint news from other sites. To better understand the mechanism of news propagation, it is necessary to model reprint behavior and predict whether a news site will reprint a piece of news. Most existing work on news reprint analysis focuses on the semantics of news content; little work has integrated the reprint relationships among sites with news content to predict reprints from the perspective of sites. The challenge in improving prediction performance lies in how to effectively combine these two kinds of information to learn a more comprehensive model of reprint behavior. In this paper, we propose an Integrated Neural Reprint Prediction (INRP) model that considers both reprint relationships and news content. It models the reprint relationships as a directed weighted graph and maps the graph into a latent space to learn site representations. During news content modeling, the site representations are embedded as attention guidance to build more site-specific content representations. Finally, site and news representations are jointly modeled to predict whether a piece of news will be reprinted by a site. We empirically evaluate the proposed model on a real-world dataset. Experimental results show that taking both the reprint relationships and the news content into consideration allows a more accurate analysis of reprint patterns. The mined patterns could serve as a feedback channel for both corporations and management departments.
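The idea of using a site representation as attention guidance over news content can be illustrated with a minimal sketch. This is not the INRP implementation: the bilinear scoring form, the dimensions, the function names, and the random vectors standing in for learned site and word embeddings are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def site_guided_news_vector(word_vecs, site_vec, W):
    """Pool word vectors with attention weights driven by the site vector.

    word_vecs: (n_words, d_word), site_vec: (d_site,), W: (d_word, d_site).
    The bilinear score word^T W site is an assumed form, not taken from the paper.
    """
    scores = word_vecs @ (W @ site_vec)   # one score per word
    weights = softmax(scores)             # attention distribution over words
    return weights @ word_vecs, weights   # site-specific news vector, weights

def reprint_probability(site_vec, news_vec, w_out, b_out=0.0):
    # Joint scoring: logistic regression on the concatenated representations.
    features = np.concatenate([site_vec, news_vec])
    return 1.0 / (1.0 + np.exp(-(features @ w_out + b_out)))

# Hypothetical toy inputs; in practice the site vector would come from
# embedding the directed weighted reprint graph.
rng = np.random.default_rng(0)
d_site, d_word, n_words = 4, 6, 5
site_vec = rng.normal(size=d_site)
word_vecs = rng.normal(size=(n_words, d_word))
W = rng.normal(size=(d_word, d_site))
w_out = rng.normal(size=d_site + d_word)

news_vec, weights = site_guided_news_vector(word_vecs, site_vec, W)
p = reprint_probability(site_vec, news_vec, w_out)
print(f"attention weights: {np.round(weights, 3)}")
print(f"predicted reprint probability: {p:.3f}")
```

Because the attention weights depend on the site vector, the same news text yields a different content representation for each candidate site, which is the sense in which the content representation is site-specific.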
