Extracting Tag Hierarchies

Tagging items with descriptive annotations or keywords is a very natural way to compress and highlight information about the properties of a given entity. Over the years, several methods have been proposed for extracting a hierarchy between the tags in systems with a "flat", egalitarian tag organization, which is very common when the tags are free words given by numerous independent people. Here we present a complete framework for automated tag hierarchy extraction based on tag occurrence statistics. Along with proposing new algorithms, we also introduce different quality measures that enable a detailed comparison of competing approaches from different aspects. Furthermore, we set up a synthetic, computer-generated benchmark that provides a versatile testing tool: a small number of tunable parameters allow it to generate a wide range of test beds. Besides the computer-generated input, we also use real data in our studies, including a biological example with a predefined hierarchy between the tags. The encouraging similarity between the predefined and reconstructed hierarchies, together with the seemingly meaningful hierarchies obtained for other real systems, indicates that tag hierarchy extraction is a very promising direction for further research, with great potential for practical applications.

Tags have become very prevalent in various online platforms, ranging from blogs through scientific publications to protein databases. Furthermore, tagging systems dedicated to the voluntary tagging of photos, films, books, etc. with free words are also becoming popular. The emerging large collections of tags associated with different objects are often referred to as folksonomies, highlighting their collaborative origin and the "flat" organization of the tags, as opposed to traditional hierarchical categorization. Adding a tag hierarchy to a given folksonomy can very effectively help narrow or broaden the scope of a search. Moreover, recommendation systems could also benefit from a tag hierarchy.
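To make the idea of "hierarchy extraction from tag occurrence statistics" concrete, the sketch below uses the classic co-occurrence subsumption rule as a stand-in. It is a well-known baseline heuristic, not the framework proposed in this paper. The assumptions are that each item is just a list of tags, that tag x subsumes tag y when P(x | y) >= threshold and P(y | x) < threshold, and that the least frequent subsuming tag is taken as the immediate parent. The function names, the 0.8 threshold and the toy data are all hypothetical. The helpers `broaden` and `narrow` illustrate how such a hierarchy could widen or restrict a search, as described above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy folksonomy: each item is the list of tags it received.
EXAMPLE_ITEMS = [
    ["animal", "dog"],
    ["animal", "dog", "puppy"],
    ["animal", "cat"],
    ["animal", "cat"],
    ["animal", "dog"],
    ["animal"],
]


def tag_counts(items):
    """Count single-tag occurrences and unordered tag-pair co-occurrences."""
    single, pair = Counter(), Counter()
    for tags in items:
        unique = sorted(set(tags))  # sorted, so pair keys are (smaller, larger)
        single.update(unique)
        pair.update(combinations(unique, 2))
    return single, pair


def extract_parents(items, threshold=0.8):
    """Assign each tag at most one parent with a subsumption heuristic (assumed rule).

    x subsumes y if P(x | y) >= threshold and P(y | x) < threshold. Together these
    imply count(x) > count(y), so parents are strictly more frequent and no cycles form.
    """
    single, pair = tag_counts(items)
    parents = {}
    for y in single:
        candidates = []
        for x in single:
            if x == y:
                continue
            co = pair[(x, y) if x < y else (y, x)]
            if co == 0:
                continue
            if co / single[y] >= threshold and co / single[x] < threshold:
                candidates.append(x)
        if candidates:
            # The least frequent subsuming tag is taken as the most specific parent.
            parents[y] = min(candidates, key=lambda x: (single[x], x))
    return parents


def broaden(tag, parents):
    """Return the ancestors of tag, from the nearest one up to the root."""
    chain = []
    while tag in parents:
        tag = parents[tag]
        chain.append(tag)
    return chain


def narrow(tag, parents):
    """Return every descendant of tag in the extracted hierarchy."""
    children = {}
    for child, parent in parents.items():
        children.setdefault(parent, []).append(child)
    found, stack = [], [tag]
    while stack:
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return found


if __name__ == "__main__":
    hierarchy = extract_parents(EXAMPLE_ITEMS)
    print(hierarchy)
    print(broaden("puppy", hierarchy))
    print(sorted(narrow("animal", hierarchy)))
```

On the toy data, "dog" and "cat" end up under "animal", and "puppy" ends up under "dog" rather than "animal", because the least frequent subsuming tag is preferred. A single fixed threshold is the main weakness of this baseline. More refined approaches, such as the ones this paper proposes, are exactly what the quality measures and the synthetic benchmark are meant to compare.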
