SimCC-AT: A Method to Compute Similarity of Scientific Papers with Automatic Parameter Tuning

In this paper, we propose SimCC-AT (similarity based on content and citations with automatic parameter tuning), a method for computing the similarity of scientific papers. Like SimCC, the state-of-the-art method, SimCC-AT uses the notion of a contribution score in similarity computation. Unlike SimCC, it uses an automatic weighting scheme based on SVMrank and therefore requires fewer experiments for parameter tuning. Furthermore, our experimental results on a real-world dataset show that the accuracy of SimCC-AT is substantially higher than that of other existing methods and comparable to that of SimCC.
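To illustrate the general idea of learning combination weights from ranked examples rather than tuning them by hand, the following is a minimal sketch. It is not the SimCC-AT implementation and does not call SVMrank: it substitutes a small pairwise hinge-loss learner written in NumPy, in the spirit of ranking SVMs. The two features (a content similarity and a citation similarity), the toy relevance labels, and the names `pairwise_differences` and `fit_ranking_weights` are all assumptions made for illustration.

```python
import numpy as np

def pairwise_differences(features, relevance, query_ids):
    """Build x_i - x_j for every same-query pair where i is more relevant than j."""
    diffs = []
    for q in np.unique(query_ids):
        idx = np.where(query_ids == q)[0]
        for i in idx:
            for j in idx:
                if relevance[i] > relevance[j]:
                    diffs.append(features[i] - features[j])
    return np.array(diffs)

def fit_ranking_weights(diffs, c=1.0, lr=0.1, epochs=200):
    """Minimise 0.5*||w||^2 + c*mean(max(0, 1 - w.d)) by subgradient descent."""
    w = np.zeros(diffs.shape[1])
    for _ in range(epochs):
        violated = (diffs @ w) < 1.0  # pairs not yet separated by margin 1
        grad = w - c * diffs[violated].sum(axis=0) / len(diffs)
        w -= lr * grad
    return w

# Hypothetical toy data: columns are [content_similarity, citation_similarity]
# of a candidate paper with respect to a query paper.
features = np.array([
    [0.9, 0.6], [0.5, 0.4], [0.1, 0.2],   # candidates for query paper 0
    [0.7, 0.9], [0.6, 0.3], [0.2, 0.1],   # candidates for query paper 1
])
relevance = np.array([2, 1, 0, 2, 1, 0])  # higher means more similar (assumed labels)
query_ids = np.array([0, 0, 0, 1, 1, 1])

diffs = pairwise_differences(features, relevance, query_ids)
weights = fit_ranking_weights(diffs)

# The learned weights combine the two similarities into a single score.
scores = features @ weights
```

Here the learned `weights` play the role that manually tuned parameters would otherwise play: a single training run replaces a grid of tuning experiments. How SimCC-AT actually constructs its features and training pairs is not specified in this abstract.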
