Context sensitive article ranking with citation context analysis

It is hard to detect important articles in a specific context. Information retrieval techniques based on full text search can be inaccurate to identify main topics and they are not able to provide an indication about the importance of the article. Generating a citation network is a good way to find most popular articles but this approach is not context aware. The text around a citation mark is generally a good summary of the referred article. So citation context analysis presents an opportunity to use the wisdom of crowd for detecting important articles in a context sensitive way. In this work, we analyze citation contexts to rank articles properly for a given topic. The model proposed uses citation contexts in order to create a directed and edge-labeled citation network based on the target topic. Then we apply common ranking algorithms in order to find important articles in this newly created network. We showed that this method successfully detects a good subset of most prominent articles in a given topic. The biggest contribution of this approach is that we are able to identify important articles for a given search term even though these articles do not contain this search term. This technique can be used in other linked documents including web pages, legal documents, and patents as well as scientific papers.

[1]  Albert-László Barabási,et al.  Statistical mechanics of complex networks , 2001, ArXiv.

[2]  Shannon Bradshaw,et al.  Reference Directed Indexing: Redeeming Relevance for Subject Search in Citation Indexes , 2003, ECDL.

[3]  Madian Khabsa,et al.  SeerSuite: Developing a Scalable and Reliable Application Framework for Building Digital Libraries by Crawling the Web , 2010, WebApps.

[4]  Ravi Kumar,et al.  Pig latin: a not-so-foreign language for data processing , 2008, SIGMOD Conference.

[5]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[6]  Ishiuchi,et al.  Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas , 2004 .

[7]  Simone Teufel,et al.  How to Find Better Index Terms Through Citations , 2006 .

[8]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[9]  Albert,et al.  Emergence of scaling in random networks , 1999, Science.

[10]  Marti A. Hearst,et al.  Summarizing Key Concepts using Citation Sentences , 2006, BioNLP@NAACL-HLT.

[11]  Henry G. Small,et al.  Co-citation in the scientific literature: A new measure of the relationship between two documents , 1973, J. Am. Soc. Inf. Sci..

[12]  Mark E. J. Newman,et al.  The Structure and Function of Complex Networks , 2003, SIAM Rev..

[13]  Master Gardener,et al.  Mathematical games: the fantastic combinations of john conway's new solitaire game "life , 1970 .

[14]  Christoforos E. Kozyrakis,et al.  Evaluating MapReduce for Multi-core and Multiprocessor Systems , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[15]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[16]  Jöran Beel,et al.  Citation Proximity Analysis (CPA) : A New Approach for Identifying Related Work Based on Co-Citation Analysis , 2009 .

[17]  C. Lee Giles,et al.  ParsCit: an Open-source CRF Reference String Parsing Package , 2008, LREC.

[18]  M. M. Kessler Bibliographic coupling between scientific papers , 1963 .

[19]  M. Moravcsik,et al.  Some Results on the Function and Quality of Citations , 1975 .

[20]  Michalis Faloutsos,et al.  On power-law relationships of the Internet topology , 1999, SIGCOMM '99.

[21]  John Tait,et al.  Proceedings of the Workshop on How Can Computational Linguistics Improve Information Retrieval , 2006 .

[22]  GhemawatSanjay,et al.  The Google file system , 2003 .

[23]  魏屹东,et al.  Scientometrics , 2018, Encyclopedia of Big Data.

[24]  Giovanni Dietler,et al.  Knotted Fishing Line, Covalent Bonds, and Breaking Points , 1999 .

[25]  James Bailey,et al.  Document clustering of scientific texts using citation contexts , 2010, Information Retrieval.

[26]  Dragomir R. Radev,et al.  Citation Summarization Through Keyphrase Extraction , 2010, COLING.

[27]  Dragomir R. Radev,et al.  Purpose and Polarity of Citation: Towards NLP-based Bibliometrics , 2013, NAACL.

[28]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.