Factorized Similarity Learning in Networks

The problem of similarity learning is relevant to many data mining applications, such as recommender systems, classification, and retrieval. This problem is particularly challenging in the context of networks, which contain different aspects such as the topological structure, content, and user supervision. These different aspects need to be combined effectively, in order to create a holistic similarity function. In particular, while most similarity learning methods in networks such as Sim Rank utilize the topological structure, the user supervision and content are rarely considered. In this paper, a Factorized Similarity Learning (FSL) is proposed to integrate the link, node content, and user supervision into an uniform framework. This is learned by using matrix factorization, and the final similarities are approximated by the span of low rank matrices. The proposed framework is further extended to a noise-tolerant version by adopting a hinge-loss alternatively. To facilitate efficient computation on large scale data, a parallel extension is developed. Experiments are conducted on the DBLP and CoRA datasets. The results show that FSL is robust, efficient, and outperforms the state-of-the-art.

[1]  Geoffrey E. Hinton,et al.  Neighbourhood Components Analysis , 2004, NIPS.

[2]  Yehuda Koren,et al.  Matrix Factorization Techniques for Recommender Systems , 2009, Computer.

[3]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[4]  Shuicheng Yan,et al.  Inferring semantic concepts from community-contributed images and noisy tags , 2009, ACM Multimedia.

[5]  Heng Ji,et al.  Exploring Context and Content Links in Social Media: A Latent Space Method , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Fei Wang,et al.  Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Active Learning from Relative Queries , 2022 .

[7]  Wei Liu,et al.  Semi-supervised distance metric learning for Collaborative Image Retrieval , 2008, CVPR.

[8]  W. Cheney,et al.  Proximity maps for convex sets , 1959 .

[9]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[10]  Tomer Hertz,et al.  Learning a Mahalanobis Metric from Equivalence Constraints , 2005, J. Mach. Learn. Res..

[11]  Jun Wang,et al.  Fast Pairwise Query Selection for Large-Scale Active Learning to Rank , 2013, 2013 IEEE 13th International Conference on Data Mining.

[12]  Michael R. Lyu,et al.  SoRec: social recommendation using probabilistic matrix factorization , 2008, CIKM '08.

[13]  Edward A. Fox,et al.  SimFusion: measuring similarity using unified relationship matrix , 2005, SIGIR '05.

[14]  Andrew McCallum,et al.  Automating the Construction of Internet Portals with Machine Learning , 2000, Information Retrieval.

[15]  Samuel Kotz,et al.  The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance , 2001 .

[16]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[17]  Chong Wang,et al.  Collaborative topic modeling for recommending scientific articles , 2011, KDD.

[18]  N. Smelser,et al.  Annual Review of Sociology , 1975 .

[19]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[20]  Ming Lei,et al.  FIU-Miner: a fast, integrated, and user-friendly system for data mining in distributed environment , 2013, KDD.

[21]  Emmanuel J. Candès,et al.  A Singular Value Thresholding Algorithm for Matrix Completion , 2008, SIAM J. Optim..

[22]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[23]  Tat-Seng Chua,et al.  An efficient sparse metric learning in high-dimensional space via l1-penalized log-determinant regularization , 2009, ICML '09.

[24]  Yizhou Sun,et al.  P-Rank: a comprehensive structural similarity measure over information networks , 2009, CIKM.

[25]  Rongrong Ji,et al.  Cross-media manifold learning for image retrieval & annotation , 2008, MIR '08.

[26]  Shih-Ping Han,et al.  A successive projection method , 1988, Math. Program..

[27]  Michael R. Lyu,et al.  PageSim: A Novel Link-Based Similarity Measure for the World Wide Web , 2006, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings)(WI'06).

[28]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[29]  Bo Zhao,et al.  Probabilistic topic models with biased propagation on heterogeneous information networks , 2011, KDD.

[30]  Inderjit S. Dhillon,et al.  Information-theoretic metric learning , 2006, ICML '07.

[31]  Yan Liu,et al.  Collaborative Topic Regression with Social Matrix Factorization for Recommendation Systems , 2012, ICML.

[32]  Heikki Mannila,et al.  Relational link-based ranking , 2004, VLDB.

[33]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[34]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[35]  Zhen Li,et al.  Learning Locally-Adaptive Decision Functions for Person Verification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Yin Zhang,et al.  Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm , 2012, Mathematical Programming Computation.

[37]  Jennifer Widom,et al.  SimRank: a measure of structural-context similarity , 2002, KDD.

[38]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[39]  Ruslan Salakhutdinov,et al.  Probabilistic Matrix Factorization , 2007, NIPS.

[40]  Charu C. Aggarwal,et al.  Towards systematic design of distance functions for data mining applications , 2003, KDD '03.

[41]  José Mario Martínez,et al.  Nonmonotone Spectral Projected Gradient Methods on Convex Sets , 1999, SIAM J. Optim..

[42]  Deepa Paranjpe,et al.  Semi-supervised clustering with metric learning using relative comparisons , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[43]  Fei Wang,et al.  FeaFiner: biomarker identification from medical data through feature generalization and selection , 2013, KDD.

[44]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..