Identifying Collaborations among Researchers: a pattern-based approach

In recent years, a huge number of publications and scientific reports have become available through digital libraries and online databases. Digital libraries commonly provide advanced search interfaces through which researchers can find and explore the most relevant scientific studies. Although the publications of a single author can be easily retrieved and explored, understanding how authors have collaborated with each other on specific research topics, and to what extent their collaborations have been fruitful, is in general a challenging task. This paper proposes a new pattern-based approach to analyzing the correlations among the authors of the most influential research studies. To this end, it analyzes publication data retrieved from digital libraries and online databases by means of an itemset-based data mining algorithm, which automatically extracts patterns representing the most relevant collaborations among authors on specific research topics. Patterns are evaluated and ranked according to the number of citations received by the corresponding publications. The proposed approach was validated in a real case study, namely the analysis of the scientific literature on genomics. Specifically, we first analyzed scientific studies on genomics acquired from the OMIM database to discover correlations between authors and genes or genetic disorders. Then, the reliability of the discovered patterns was assessed using the PubMed search engine. The results show that, for the majority of the mined patterns, the most influential (top-ranked) studies retrieved by author-driven PubMed queries cover the same gene or genetic disorder indicated by the top-ranked pattern.
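To make the idea of citation-ranked collaboration patterns concrete, the following is a minimal sketch and not the algorithm used in the paper. It assumes each publication is reduced to a set of authors, a single topic label (for example a gene), and a citation count. It enumerates author groups by brute force instead of running a dedicated itemset mining algorithm, and it scores each pattern by the summed citations of the publications that contain it. The records, the function name `mine_collaborations`, and the parameters `min_support` and `min_authors` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (set of authors, topic label, citation count).
publications = [
    ({"A. Rossi", "B. Chen"}, "BRCA1", 120),
    ({"A. Rossi", "B. Chen", "C. Diaz"}, "BRCA1", 80),
    ({"A. Rossi", "C. Diaz"}, "CFTR", 15),
    ({"B. Chen"}, "CFTR", 40),
]


def mine_collaborations(records, min_support=2, min_authors=2):
    """Return (authors, topic, support, citations) tuples, best first.

    Support is the number of publications containing the author group
    on the given topic; citations is the sum of their citation counts.
    Brute-force enumeration is only practical for short author lists.
    """
    support = defaultdict(int)
    citations = defaultdict(int)
    for authors, topic, cites in records:
        ordered = sorted(authors)
        # Every subset of co-authors with at least min_authors members.
        for size in range(min_authors, len(ordered) + 1):
            for group in combinations(ordered, size):
                support[(group, topic)] += 1
                citations[(group, topic)] += cites
    patterns = [
        (group, topic, count, citations[(group, topic)])
        for (group, topic), count in support.items()
        if count >= min_support
    ]
    # Rank by total citations (descending), then by names for stable output.
    return sorted(patterns, key=lambda p: (-p[3], p[0], p[1]))


if __name__ == "__main__":
    for group, topic, count, cites in mine_collaborations(publications):
        print(", ".join(group), "|", topic, "|", count, "papers |", cites, "citations")
```

On real data, the exhaustive enumeration would normally be replaced by a frequent itemset miner with a support threshold, since the number of author subsets grows exponentially with the length of the author list.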
