Quantifying Cultural Histories via Person Networks in Wikipedia

At least since Priestley's 1765 Chart of Biography, large numbers of individual person records have been used to illustrate aggregate patterns of cultural history. Wikidata, the structured-database sister project of Wikipedia, currently contains about 2.7 million explicit person records across all language versions of the encyclopedia. These individuals, notable according to Wikipedia's editorial criteria, are connected via millions of hyperlinks between their respective Wikipedia articles. This situation gives us the chance to go beyond illustrating an idiosyncratic subset of individuals, as Priestley did. In this work we summarize the overlap of nationalities and occupations based on their co-occurrence in individual Wikidata person records. We construct networks of co-occurring nationalities and occupations, characterize their respective community structure, and use the results to select and color chronologically structured subsets of a large network of individuals connected by Wikipedia hyperlinks. Although the imagined communities of nationality are much more discrete in terms of co-occurrence than occupations, our quantifications show that the overlap that does exist between nationalities is much less clear-cut than the overlap between occupational domains. Our work contributes to a growing body of research that uses biographies of notable persons to analyze cultural processes.
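The co-occurrence networks described above can be illustrated with a minimal sketch, assuming each person record has been reduced to sets of nationality and occupation labels. The toy records and the names `PERSONS`, `cooccurrence`, and `modularity` are hypothetical and are not taken from the paper or from any Wikidata API. Two labels are linked with a weight equal to the number of persons carrying both, i.e. a one-mode projection of the person-label bipartite graph; Newman's modularity Q then gives one possible measure of how discrete a given partition of that network is.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records standing in for Wikidata person items; real data
# would come from the nationality and occupation statements of each item.
PERSONS = [
    {"nationality": {"France"}, "occupation": {"painter", "sculptor"}},
    {"nationality": {"France", "Switzerland"}, "occupation": {"architect", "painter"}},
    {"nationality": {"Germany"}, "occupation": {"composer", "pianist"}},
    {"nationality": {"Austria", "Germany"}, "occupation": {"composer"}},
    {"nationality": {"France"}, "occupation": {"painter"}},
]


def cooccurrence(persons, attr):
    """One-mode projection: weight(a, b) = number of persons having both a and b."""
    weights = Counter()
    for person in persons:
        # sorted() gives every unordered pair a canonical (a, b) key
        for a, b in combinations(sorted(person[attr]), 2):
            weights[(a, b)] += 1
    return weights


def modularity(weights, community):
    """Newman modularity Q of a weighted, undirected graph without self-loops.

    Q = sum over communities c of [L_c / m - (D_c / 2m)^2], where L_c is the
    total edge weight inside c, D_c the summed strength of the nodes in c,
    and m the total edge weight of the graph.
    """
    m = sum(weights.values())
    if m == 0:
        return 0.0
    internal = Counter()  # L_c
    strength = Counter()  # D_c
    for (a, b), w in weights.items():
        strength[community[a]] += w
        strength[community[b]] += w
        if community[a] == community[b]:
            internal[community[a]] += w
    return sum(internal[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)


occupations = cooccurrence(PERSONS, "occupation")
# Hand-assigned (hypothetical) partition of the occupation labels.
partition = {"painter": "visual", "sculptor": "visual", "architect": "visual",
             "composer": "music", "pianist": "music"}
print(round(modularity(occupations, partition), 3))  # 0.444
```

Placing every label in a single community yields Q = 0, while higher values indicate that link weight concentrates within communities. The partition here is supplied by hand; in practice it would come from a modularity-optimizing community-detection method such as the Louvain algorithm, which this sketch does not implement.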
