Local bilateral clustering for identifying research topics and groups from bibliographical data

The structure of scientific collaboration networks provides insight into the relationships between people and disciplines. In this paper, we study a bipartite graph connecting authors to publications and extract from it clusters of authors and articles, interpreting the author clusters as research groups and the article clusters as research topics. We propose visualisations that ease the interpretation of these clusters, for example by identifying leaders, gauging activity levels, and exposing other semantic aspects. We discuss how the information is obtained from scientific publications and preprocessed, how the clustering algorithm is formulated and implemented, and how the visualisations are created. Experiments on a test data set are presented, using an initial prototype implementation of the proposed modules.
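As a minimal illustration of the data structure described above, the following sketch builds an author-publication bipartite graph from bibliographic records. It is not the paper's implementation: the record format, the function names `build_bipartite` and `coauthors`, and the sample data are all hypothetical, and no clustering is performed here.

```python
from collections import defaultdict

def build_bipartite(records):
    """Build both sides of an author-article bipartite graph.

    `records` is assumed to be an iterable of (article_id, [author names]);
    this input format is hypothetical, not taken from the paper.
    """
    authors_of = defaultdict(set)   # article -> set of its authors
    articles_of = defaultdict(set)  # author  -> set of their articles
    for article, authors in records:
        for author in authors:
            authors_of[article].add(author)
            articles_of[author].add(article)
    return authors_of, articles_of

def coauthors(author, authors_of, articles_of):
    """Authors at distance two from `author` in the bipartite graph."""
    result = set()
    for article in articles_of[author]:
        result |= authors_of[article]
    result.discard(author)
    return result

# Made-up sample records, for illustration only.
records = [("p1", ["Ana", "Ben"]), ("p2", ["Ben", "Cy"]), ("p3", ["Dee"])]
authors_of, articles_of = build_bipartite(records)
```

Keeping both adjacency maps makes it cheap to move from an author to their articles and back, which is the kind of neighbourhood exploration a local clustering method would rely on.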
