On Using Disparate Scholarly Data to Identify Potential Members for Interdisciplinary Research Groups

Supporting interdisciplinary research (IDR) requires detecting the expertise needed to solve complex problems and identifying researchers who have that expertise. Universities have adopted various expertise systems, many of which rely on publications and keywords to identify experts. Research expertise is dynamic, however: a researcher's expertise may change over time. Relying solely on publications to infer research interests can therefore be less effective in identifying potential collaborators, because different types of scholarly activity reveal a change in research direction at different times. This paper uses disparate scholarly data to propose and evaluate several approaches for building research footprints, and presents experimental results showing how well these footprints identify potential members for IDR groups. The results indicate that grant data is a better predictor of IDR membership than publication data. The paper also describes two approaches for building IDR-specific classifier models and reports the accuracy of those models in identifying potential IDR group members.
