Measuring Diversity in Heterogeneous Information Networks

Diversity is a concept relevant to numerous research domains, ranging from ecology to information theory and economics, to name a few. It is steadily gaining attention in the information retrieval, network analysis, and artificial neural network communities. While diversity measures are applied to network-structured data in a growing number of settings, no clear and comprehensive description of the different ways in which diversity can be measured is available. In this article, we develop a formal framework for applying a large family of diversity measures to heterogeneous information networks (HINs), a flexible and widely used formalism for network data. This extends diversity measures beyond systems of classifications and apportionments to more complex relations that are better modeled by networks. In doing so, we not only organize practices from different domains into a single framework, but also uncover new observables in systems modeled by HINs. We illustrate the pertinence of our approach through applications in various domains concerned with both diversity and networks. In particular, we show the usefulness of the newly proposed observables in recommender systems and social media studies, among other fields.
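The abstract does not specify which family of measures or which network traversal the framework relies on. As a minimal sketch under stated assumptions, the code below takes the family to be Hill's true diversities of order q (a standard one-parameter family whose orders 0, 1, 2 and ∞ correspond to richness, the exponential of Shannon entropy, the inverse Simpson index and the inverse Berger-Parker dominance) and obtains the distribution to be measured from a uniform random walk constrained to a meta-path in a toy user-item-genre HIN. The graph, the node types and the helper names `metapath_walk_distribution` and `hill_number` are hypothetical illustrations, not the article's own definitions.

```python
import math
from collections import defaultdict


def metapath_walk_distribution(graph, start, node_types, metapath):
    """Hypothetical helper: distribution over end nodes of a uniform random
    walk from `start` whose successive nodes must have the types in `metapath`."""
    if node_types[start] != metapath[0]:
        raise ValueError("start node does not match the first meta-path type")
    dist = {start: 1.0}
    for next_type in metapath[1:]:
        step = defaultdict(float)
        for node, mass in dist.items():
            # Only neighbours of the next required type are eligible.
            nbrs = [v for v in graph[node] if node_types[v] == next_type]
            # Mass at a node with no eligible neighbour is dropped here;
            # hill_number renormalizes whatever mass remains.
            for v in nbrs:
                step[v] += mass / len(nbrs)
        dist = dict(step)
    return dist


def hill_number(weights, q):
    """Hypothetical helper: Hill number (true diversity) of order q of
    non-negative weights, renormalized to a probability distribution."""
    w = [x for x in weights if x > 0]
    total = sum(w)
    p = [x / total for x in w]
    if q == 1:
        # Limit q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    if math.isinf(q):
        # Limit q -> infinity: inverse of the largest proportion.
        return 1.0 / max(p)
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))


# Hypothetical toy HIN: one user, three items, two genres (undirected edges).
graph = {
    "u1": ["i1", "i2", "i3"],
    "i1": ["u1", "rock"],
    "i2": ["u1", "jazz"],
    "i3": ["u1", "rock"],
    "rock": ["i1", "i3"],
    "jazz": ["i2"],
}
node_types = {"u1": "user", "i1": "item", "i2": "item", "i3": "item",
              "rock": "genre", "jazz": "genre"}

dist = metapath_walk_distribution(graph, "u1", node_types, ("user", "item", "genre"))
for q in (0, 1, 2, math.inf):
    print(f"q={q}: diversity {hill_number(dist.values(), q):.3f}")
```

For user `u1`, who is linked to two rock items and one jazz item, the genre distribution reached through the user-item-genre meta-path is (2/3, 1/3), so the orders q = 0, 2 and ∞ give 2, 1.8 and 1.5 respectively. Higher orders weigh the dominant category more heavily, which is why the same profile can look more or less diverse depending on q.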