Modeling Citation Networks Using Latent Random Offsets

Among the many factors that determine which links form in a document citation network, two are especially important. First, a document may be cited for its subject matter, which can be modeled by analyzing document content. Second, a document may be cited because of which other documents have previously cited it, which can be modeled by analyzing citation structure. Both factors matter when users make informed decisions and choose appropriate citations as the network grows. In this paper, we present a novel model that integrates the merits of content and citation analyses into a single probabilistic framework. We demonstrate our model on three real-world citation networks. Compared with existing baselines, our model can be used to effectively explore a citation network and provide meaningful explanations for links while maintaining competitive citation prediction performance.
