Leveraging Citation Influences for Modeling Scientific Documents

This paper studies a link-text algorithm that models scientific documents through citation influences and applies it to document clustering and influence prediction. Most existing link-text algorithms ignore the fact that cited documents influence a citing document to different degrees. Yet citation influences reveal the latent structure of a citation network, which describes knowledge flow more accurately than the raw citation structure. In this study, a citation influence is modeled as a weight in a linear combination that approximates the text of a document by the content of the documents it cites. We then present a novel matrix factorization algorithm, Citation-Influences-Text Nonnegative Matrix Factorization (CIT-NMF), which incorporates text and citations and learns the influence weights to obtain better document representations. In addition, an efficient method is derived to solve the resulting optimization problem. Experimental results on several real datasets show satisfactory improvements over the baseline models.
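
The idea of a citation influence as a nonnegative weight in a linear combination can be illustrated with a small sketch. This is not the CIT-NMF algorithm, whose objective and update rules are not given in this abstract. It only fits nonnegative weights for a single citing document, using a multiplicative update in the style of standard NMF solvers. The function name `influence_weights`, the toy term vectors, and the iteration count are all assumptions made for illustration.

```python
def influence_weights(doc, cited, n_iter=500, eps=1e-12):
    """Fit nonnegative weights w so that sum_j w[j] * cited[j] approximates doc.

    doc   : list of term weights for the citing document (nonnegative).
    cited : list of term-weight lists, one per cited document (nonnegative).
    Illustrative sketch only; not the method proposed in the paper.
    """
    k = len(cited)
    w = [1.0 / k] * k  # start from uniform influence

    # Precompute inner products: cited_j . doc and cited_j . cited_l.
    num = [sum(c * d for c, d in zip(cj, doc)) for cj in cited]
    gram = [[sum(a * b for a, b in zip(cj, cl)) for cl in cited] for cj in cited]

    for _ in range(n_iter):
        # Multiplicative update keeps every weight nonnegative.
        den = [sum(gram[j][l] * w[l] for l in range(k)) for j in range(k)]
        w = [w[j] * num[j] / (den[j] + eps) for j in range(k)]
    return w


# Hypothetical toy data: three cited documents over a four-term vocabulary.
cited_docs = [
    [2.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 3.0],
]
# The citing document is built as 0.7 * first + 0.3 * second cited document.
citing_doc = [1.4, 1.0, 0.6, 0.0]

weights = influence_weights(citing_doc, cited_docs)
print([round(x, 4) for x in weights])
```

In this toy setting the third cited document shares no terms with the citing document, so its fitted weight drops to zero, which matches the intuition that not every citation carries the same influence.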
