Citation recommendation without author supervision

Automatic recommendation of citations for a manuscript is highly valuable for scholarly work because it can substantially improve the efficiency and quality of literature search. Prior techniques placed a considerable burden on users, who were required to provide a representative bibliography or to mark the passages where citations are needed. In this paper we present a system that considerably reduces this burden: a user simply inputs a query manuscript (without a bibliography), and our system automatically finds the locations where citations are needed. We show that naïve approaches do not work well because of massive noise in the document corpus. Our approach succeeds by carefully examining the relevance between segments in a query manuscript and representative segments extracted from a document corpus. An extensive empirical evaluation on the CiteSeerX data set shows that our approach is effective.
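The segment-matching idea can be illustrated with a minimal sketch. It is not the system evaluated in the paper. It assumes that the "representative segments" are short citation contexts keyed by the cited paper, that manuscript segments are fixed-size sliding token windows, and that relevance is plain bag-of-words cosine similarity. The names `segments`, `cosine`, `recommend`, and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def segments(text, size=8, step=4):
    """Return (start_index, tokens) sliding windows over the text.

    Assumption: fixed-size windows stand in for manuscript segments.
    Tokens after the last full window are ignored in this sketch.
    """
    tokens = tokenize(text)
    if len(tokens) <= size:
        return [(0, tokens)]
    return [(i, tokens[i:i + size])
            for i in range(0, len(tokens) - size + 1, step)]


def cosine(a, b):
    """Cosine similarity between two token lists (term-count vectors)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(manuscript, contexts, threshold=0.3):
    """Flag windows that resemble a known citation context.

    contexts: dict mapping a paper id to a list of context strings
    (a hypothetical stand-in for segments extracted from a corpus).
    Returns a list of (window_start, paper_id, score) tuples.
    """
    results = []
    for start, seg in segments(manuscript):
        best_id, best = None, 0.0
        for paper_id, ctxs in contexts.items():
            for ctx in ctxs:
                score = cosine(seg, tokenize(ctx))
                if score > best:
                    best_id, best = paper_id, score
        # Only windows scoring at or above the threshold are
        # treated as locations that need a citation.
        if best_id is not None and best >= threshold:
            results.append((start, best_id, round(best, 3)))
    return results


contexts = {
    "lda": ["latent dirichlet allocation topic model"],
    "kalman": ["kalman filter state estimation"],
}
manuscript = "We use latent dirichlet allocation as topic model for documents"
hits = recommend(manuscript, contexts)
```

With these toy inputs, the only window is matched to the `"lda"` entry, and a manuscript with no overlapping vocabulary yields no hits. The threshold plays the role of a crude noise filter; a corpus as noisy as the one described above would need a more careful relevance measure than raw term overlap.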
