Importance Assessment in Scholarly Networks

We present approaches for estimating content-aware bibliometrics that quantitatively measure the scholarly impact of a publication. Traditional quality-related measures, such as citation counts and the h-index, do not take the content of publications into account, which limits their rigor and can significantly skew results. Our proposed metric, the Content Informed Index (CII), uses the content of a paper as a source of distant supervision to weight the edges of a citation network. These content-aware weights quantify the information carried by a citation, i.e., the extent to which the cited node informs the citing node. The weights convert the original unweighted citation network into a weighted one, which can then be used to derive impact metrics for the entities involved, such as publications and authors. We evaluate the weights estimated by our approach on three manually annotated datasets, where the annotations quantify the extent of information in each citation. In particular, we evaluate how well the ranking induced by our approach agrees with the ranking induced by the manual annotations. The proposed approach achieves up to a 103% improvement in performance over the second-best-performing approach.
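To make the pipeline concrete, the following is a minimal sketch, not the authors' implementation. It assumes the content-aware edge weights are already available, derives a simple publication-level impact score by summing incoming weights (a weighted analogue of citation count, chosen here only for illustration), and measures rank agreement with Kendall's tau-a as one possible association measure. The function names, the toy edges, and the annotation values are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def weighted_impact(edges):
    """Sum incoming edge weights per cited paper (a weighted citation count)."""
    scores = defaultdict(float)
    for _citing, cited, weight in edges:
        scores[cited] += weight
    return dict(scores)


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Pairs tied in either list count as neither concordant nor discordant.
    """
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical toy network: (citing paper, cited paper, content-aware weight).
edges = [
    ("p3", "p1", 0.9),
    ("p3", "p2", 0.1),
    ("p4", "p1", 0.5),
    ("p4", "p2", 0.2),
]

# Publication-level impact derived from the weighted network.
impact = weighted_impact(edges)

# Hypothetical manual annotations, one per edge (higher = more informative).
annotated = [3, 1, 2, 1]
predicted = [weight for _, _, weight in edges]
tau = kendall_tau_a(predicted, annotated)

print(impact)
print(tau)
```

Summing incoming weights is only one way to use the weighted network; a weighted PageRank or an author-level aggregation could be substituted without changing the evaluation step.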
