Proximity dimensions and the emergence of collaboration: a HypTrails study on German AI research

The creation and exchange of knowledge depend on collaboration. Recent work has suggested that the emergence of collaboration frequently relies on geographic proximity. However, co-location tends to be associated with other dimensions of proximity, such as social ties or a shared organizational environment. To account for such factors, multiple dimensions of proximity have been proposed, including cognitive, institutional, organizational, social and geographic proximity. Because these dimensions are strongly interrelated, disentangling them and their respective impacts on collaboration is challenging. To address this issue, we propose various methods for measuring different dimensions of proximity. We then present an approach to compare and rank these dimensions by the extent to which they indicate co-publications and co-inventions. To this end, we adapt the HypTrails approach, which was originally developed to explain human navigation, to co-author and co-inventor graphs. We evaluate this approach on a subset of the German research community, specifically academic authors and inventors active in research on artificial intelligence (AI). We find that social proximity and cognitive proximity are more important for the emergence of collaboration than geographic proximity.
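To illustrate the kind of comparison a HypTrails-style ranking performs, the following is a minimal sketch rather than the implementation used in this study. It assumes that collaborations are summarized as a matrix of counts between authors, that each proximity hypothesis is a row-normalized matrix of believed collaboration probabilities, and that a hypothesis is turned into a Dirichlet prior by the simplified rule alpha = 1 + k * h (the original method elicits priors differently). The toy data and the names `log_evidence`, `rank_hypotheses`, `social` and `geographic` are hypothetical.

```python
from math import lgamma


def log_evidence(counts, hypothesis, k):
    """Log marginal likelihood of the counts under row-wise Dirichlet priors.

    Assumption: the prior for each row is alpha = 1 + k * h, where h is the
    hypothesis row and k controls how strongly the hypothesis is believed.
    """
    total = 0.0
    for n_row, h_row in zip(counts, hypothesis):
        alpha = [1.0 + k * h for h in h_row]
        # Dirichlet-multinomial evidence for one row of counts
        total += lgamma(sum(alpha)) - sum(lgamma(a) for a in alpha)
        total += sum(lgamma(n + a) for n, a in zip(n_row, alpha))
        total -= lgamma(sum(n_row) + sum(alpha))
    return total


def rank_hypotheses(counts, hypotheses, k):
    """Return hypothesis names ordered from highest to lowest evidence."""
    scores = {name: log_evidence(counts, h, k) for name, h in hypotheses.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: collaboration counts among three authors.
counts = [
    [0, 8, 1],
    [8, 0, 1],
    [1, 1, 0],
]

# Hypothetical hypotheses: each row is a belief about whom an author
# collaborates with, expressed as probabilities that sum to one.
social = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.5, 0.5, 0.0],
]
geographic = [
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
    [0.5, 0.5, 0.0],
]
hypotheses = {"social": social, "geographic": geographic}

ranking = rank_hypotheses(counts, hypotheses, k=10)
```

In this toy setting the observed counts concentrate on the pairs that the `social` hypothesis favors, so it receives the higher evidence. With `k=0` every hypothesis collapses to the same uniform prior and the evidences coincide, which is why such comparisons are usually inspected over a range of `k` values.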
