ExpertSeer: a Keyphrase Based Expert Recommender for Digital Libraries

We describe ExpertSeer, a generic framework for expert recommendation based on the contents of a digital library. Given a query term q, ExpertSeer recommends experts on q by retrieving authors who have published relevant papers, where relevance is determined by related keyphrases and by the quality of the papers. The system is built on a simple yet effective keyphrase extractor and on Bayes' rule for expert recommendation. Because it is automated and not tailored to a specific discipline, ExpertSeer is domain independent and can be applied to different disciplines and applications. Digital library providers can employ the system to enrich their services, and organizations can use it to discover experts of interest among their own members. To demonstrate ExpertSeer, we apply the framework to build two expert recommender systems. The first, CSSeer, uses the CiteSeerX digital library to recommend experts primarily in computer science. The second, ChemSeer, uses publicly available documents from the Royal Society of Chemistry (RSC) to recommend experts in chemistry. Using one thousand computer science terms as benchmark queries, we compared the top-n experts (n=3, 5, 10) returned by CSSeer with those returned by two other expert recommenders, Microsoft Academic Search and ArnetMiner, and by a simulator that imitates the ranking function of Google Scholar. Although CSSeer, Microsoft Academic Search, and ArnetMiner mostly return prestigious researchers who have published several papers related to the query term, we found that the recommendations of the different systems differ moderately. To study their performance further, we used a widely used benchmark dataset as the ground truth for comparison. The results show that our system outperforms Microsoft Academic Search and ArnetMiner in terms of Precision-at-k (P@k) for k=3, 5, 10. We also conducted several case studies to validate the usefulness of our system.
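The Precision-at-k measure used in the comparison can be illustrated with a minimal sketch. It assumes the common definition of P@k (the fraction of the top k recommended experts that appear in the ground-truth set, dividing by k even when fewer than k experts are returned); the abstract does not spell out the exact variant used in the evaluation. The function name `precision_at_k` and the expert names below are hypothetical and serve only as an illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked experts that are in the ground-truth set.

    Assumption: standard P@k, dividing by k even if fewer than k
    experts were returned by the recommender.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for name in top_k if name in relevant)
    return hits / k


# Hypothetical ranking returned for one query term, and hypothetical ground truth.
ranked_experts = ["expert_a", "expert_b", "expert_c", "expert_d", "expert_e"]
ground_truth = {"expert_a", "expert_c", "expert_x"}

for k in (3, 5):
    print(f"P@{k} = {precision_at_k(ranked_experts, ground_truth, k):.3f}")
```

Averaging this value over all benchmark queries gives a single score per recommender for each k, which is one common way to compare systems on the same ground truth.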
