Contextualization of topics: browsing through the universe of bibliographic information

This paper describes how semantic indexing can help to generate a contextual overview of topics and to visually compare clusters of articles. The method was originally developed for an innovative information exploration tool called Ariadne, which operates on bibliographic databases with tens of millions of records (Koopman et al. in Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems. doi:10.1145/2702613.2732781, 2015b). In this paper, the method behind Ariadne is developed further and applied to the research question of the special issue “Same data, different results”: how to better understand topic (re-)construction by different bibliometric approaches. For the Astro dataset of 111,616 articles in astronomy and astrophysics, a new instantiation of the interactive exploration tool, LittleAriadne, has been created. This paper contributes in two ways to the overall challenge of delineating and defining topics. First, we produce two clustering solutions based on vector representations of articles in a lexical space. These vectors are built by semantic indexing of the entities associated with those articles. Second, we discuss how LittleAriadne can be used to browse through the network of topical terms, authors, journals, citations and various cluster solutions of the Astro dataset. More specifically, we treat the assignment of an article to the different clustering solutions as an additional element of its bibliographic record. By applying the same principle of semantic indexing to this extended list of entities in the bibliographic record, LittleAriadne in turn provides a visualization of the context of a specific clustering solution. It also conveys the similarity of article clusters produced by different algorithms and thus offers a complementary approach to other possible means of comparison.
