Unsupervised Many-to-Many Object Matching for Relational Data

We propose a method for unsupervised many-to-many object matching across multiple networks, that is, the task of finding correspondences between groups of nodes in different networks. For example, the proposed method can discover shared word groups from multilingual document-word networks without any cross-language alignment information. We assume that the networks share a common set of groups and that each group has its own pattern of interaction with the other groups. Under this assumption, we use infinite relational models to cluster objects in different networks into common groups according to their interaction patterns. Objects from different networks that are assigned to the same group constitute a matching. We demonstrate the effectiveness of the proposed method experimentally on synthetic and real relational data sets, including applications to cross-domain recommendation without shared user/item identifiers and to multilingual word clustering.
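The shared-group assumption can be illustrated as a generative process. The following is a minimal sketch under our own assumptions, not the authors' exact model or inference procedure. It treats each network as a single square binary adjacency matrix (rather than, say, a bipartite document-word network). It places one Chinese restaurant process (CRP) prior with concentration `alpha` over the objects of all networks, so that group labels are shared. It also draws one Beta(`a`, `b`) link probability per ordered group pair, shared by every network, in the Beta-Bernoulli style of the infinite relational model. All function names are hypothetical. The sketch only shows how networks with shared groups are generated and how a many-to-many matching is read off the group assignments. Inferring the assignments from observed networks (e.g., by Gibbs sampling) is not shown.

```python
import random
from collections import defaultdict


def sample_shared_groups(sizes, alpha, rng):
    """Assumption: one CRP over the objects of ALL networks, so group labels are shared."""
    counts = []  # counts[k] = number of objects (over all networks) in group k
    z = []       # z[d][i] = group of object i in network d
    for n in sizes:
        zd = []
        for _ in range(n):
            weights = counts + [alpha]  # existing groups, then a new group
            r = rng.random() * sum(weights)
            k, acc = 0, weights[0]
            while r >= acc and k < len(weights) - 1:
                k += 1
                acc += weights[k]
            if k == len(counts):  # open a new group
                counts.append(0)
            counts[k] += 1
            zd.append(k)
        z.append(zd)
    return z


def sample_networks(z, a, b, rng):
    """Link i->j depends only on the groups of i and j; eta is shared across networks."""
    eta = {}  # eta[(k, l)] ~ Beta(a, b), drawn lazily for each ordered group pair
    networks = []
    for zd in z:
        n = len(zd)
        adj = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue  # no self-links in this sketch
                key = (zd[i], zd[j])
                if key not in eta:
                    eta[key] = rng.betavariate(a, b)
                adj[i][j] = int(rng.random() < eta[key])
        networks.append(adj)
    return networks, eta


def matching_from_groups(z):
    """Many-to-many matching: for each group present in 2+ networks, list its members per network."""
    groups = defaultdict(lambda: defaultdict(list))
    for d, zd in enumerate(z):
        for i, k in enumerate(zd):
            groups[k][d].append(i)
    return {k: dict(v) for k, v in groups.items() if len(v) > 1}


rng = random.Random(0)
z = sample_shared_groups([6, 8], alpha=1.0, rng=rng)
networks, eta = sample_networks(z, a=0.5, b=0.5, rng=rng)
print(matching_from_groups(z))
```

Because `eta` is indexed only by group pairs, objects from different networks that share a group have the same distribution over links. This lets them be matched by their interaction patterns rather than by shared identifiers.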
