A probabilistic approach for learning folksonomies from structured data

Learning structured representations has emerged as an important problem in many domains, including document and Web data mining, bioinformatics, and image analysis. One approach to learning complex structures is to integrate many smaller, incomplete, and noisy structure fragments. In this work, we present an unsupervised probabilistic approach that extends affinity propagation [7] to combine small ontological fragments into a collection of larger, integrated, and consistent folksonomies. This task is challenging because the method must aggregate similar structures while avoiding structural inconsistencies and handling noise. We validate the approach on a real-world social media dataset composed of shallow personal hierarchies specified by many individual users and collected from the photo-sharing website Flickr. Our empirical results show that the proposed approach constructs deeper and denser structures than an approach that uses only the standard affinity propagation algorithm. Additionally, it yields better overall integration quality than a state-of-the-art approach based on incremental relational clustering.
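For readers unfamiliar with the baseline, the following is a minimal sketch of standard affinity propagation only, not of the extended approach presented in this work. It uses the usual responsibility and availability message updates with damping. The function name `affinity_propagation`, the toy one-dimensional points, the negative squared-distance similarity, and the preference value of -1.0 are all illustrative assumptions rather than details taken from the paper.

```python
def affinity_propagation(S, damping=0.5, iterations=200):
    """Standard affinity propagation on a square similarity matrix S.

    S[i][k] is the similarity of point i to candidate exemplar k; the
    diagonal S[k][k] holds the "preference" of k to be an exemplar.
    Returns, for each point, the index of its chosen exemplar.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(iterations):
        # Responsibility: how well suited k is as exemplar for i,
        # relative to the best competing candidate k' != k.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                new = S[i][k] - best_other
                R[i][k] = damping * R[i][k] + (1 - damping) * new

        # Availability: accumulated evidence that k is a good exemplar.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive support from all i' != k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # Each point picks the candidate maximizing a(i, k) + r(i, k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy data (assumed for illustration): two well-separated groups on a line.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
preference = -1.0  # assumed value; lower preferences yield fewer exemplars
S = [[preference if i == k else -(points[i] - points[k]) ** 2
      for k in range(len(points))]
     for i in range(len(points))]

labels = affinity_propagation(S)
print(labels)
```

Points that share an exemplar index belong to the same cluster. Plain affinity propagation of this kind has no notion of structural consistency between merged fragments, which is the gap the extension described above is meant to address.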

[1] Pedro M. Domingos, et al. Hybrid Markov Logic Networks, 2008, AAAI.

[2] Thomas L. Griffiths, et al. Hierarchical Topic Models and the Nested Chinese Restaurant Process, 2003, NIPS.

[3] Brendan J. Frey, et al. Factor graphs and the sum-product algorithm, 2001, IEEE Trans. Inf. Theory.

[4] Lise Getoor, et al. Probabilistic Similarity Logic, 2010, UAI.

[5] Peter Mika, et al. Ontologies are us: A unified model of social networks and semantics, 2005, J. Web Semant.

[6] Brendan J. Frey, et al. FLoSS: Facility location for subspace segmentation, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[7] Thomas L. Griffiths, et al. Learning Systems of Concepts with an Infinite Relational Model, 2006, AAAI.

[8] J. Euzenat, et al. Ontology Matching, 2007, Springer Berlin Heidelberg.

[9] Richard E. Neapolitan, et al. Learning Bayesian networks, 2007, KDD '07.

[10] David Maxwell Chickering, et al. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data, 1994, Machine Learning.

[11] Radford M. Neal. Pattern Recognition and Machine Learning, 2007, Technometrics.

[12] Lise Getoor, et al. Collective entity resolution in relational data, 2007, TKDD.

[13] Renée J. Miller, et al. Leveraging data and structure in ontology integration, 2007, SIGMOD '07.

[14] Kristina Lerman, et al. Growing a tree in the forest: constructing folksonomies by integrating structured metadata, 2010, KDD.

[15] Brendan J. Frey, et al. A Binary Variable Model for Affinity Propagation, 2009, Neural Computation.

[16] P. Schmitz, et al. Inducing Ontology from Flickr Tags, 2006.

[17] Matthew Richardson, et al. Markov logic networks, 2006, Machine Learning.

[18] Christopher M. Bishop, et al. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.

[19] Andrew McCallum, et al. Efficient clustering of high-dimensional data sets with application to reference matching, 2000, KDD '00.

[20] Delbert Dueck, et al. Clustering by Passing Messages Between Data Points, 2007, Science.

[21] Xinkun Wang, et al. An effective structure learning method for constructing gene networks, 2006, Bioinform.

[22] Kristina Lerman, et al. Constructing folksonomies from user-specified relations on flickr, 2009, WWW '09.

[23] Bernardo A. Huberman, et al. Usage patterns of collaborative tagging systems, 2006, J. Inf. Sci.