Author Set Identification via Quasi-Clique Discovery

Author identification based on heterogeneous bibliographic networks, which is to identify potential authors given an anonymous paper, has been studied in recent years. However, most of the existing works merely consider the relationship between authors and anonymous papers, while ignore the relationships between authors. In this paper, we take the relationships among authors into consideration to study the problem of author set identification, which is to identify an author set rather than an individual author related to an anonymous paper. The proposed problem has important applications to new collaborator discovery and group building. We propose a novel Author Set Identification approach, namely ASI. ASI first extracts a task-guided embedding to learn the low-dimensional representations of nodes in bibliographic network. And then ASI leverages the learned embedding to construct a weighted paper-ego-network, which contains anonymous paper and candidate authors. Finally, converting the optimal author set identification to the quasi-clique discovery in the constructed network, ASI utilizes a local-search heuristic mechanism under the guidance of the devised density function to find the optimal quasiclique. Extensive experiments on bibliographic networks demonstrate that ASI outperforms the state-of-art baselines in author set identification.

[1]  Linmei Hu,et al.  Relation Structure-Aware Heterogeneous Information Network Embedding , 2019, AAAI.

[2]  Feng Xia,et al.  ACRec: a co-authorship based random walk model for academic collaboration recommendation , 2014, WWW.

[3]  Andrew V. Goldberg,et al.  Finding a Maximum Density Subgraph , 1984 .

[4]  Philip S. Yu,et al.  HeteSim: A General Framework for Relevance Measure in Heterogeneous Networks , 2013, IEEE Transactions on Knowledge and Data Engineering.

[5]  Xiang Li,et al.  Meta Structure: Computing Relevance in Large Heterogeneous Information Networks , 2016, KDD.

[6]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[7]  Yizhou Sun,et al.  Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification , 2016, WSDM.

[8]  Philip S. Yu,et al.  PathSim , 2011, Proc. VLDB Endow..

[9]  Jie Tang,et al.  ArnetMiner: extraction and mining of academic social networks , 2008, KDD.

[10]  Jason Weston,et al.  Translating Embeddings for Modeling Multi-relational Data , 2013, NIPS.

[11]  Wenwu Zhu,et al.  Structural Deep Network Embedding , 2016, KDD.

[12]  Aditya Bhaskara,et al.  Detecting high log-densities: an O(n¼) approximation for densest k-subgraph , 2010, STOC '10.

[13]  Qiaozhu Mei,et al.  PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks , 2015, KDD.

[14]  Charalampos E. Tsourakakis,et al.  Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees , 2013, KDD.

[15]  Hyun Ah Song,et al.  FRAUDAR: Bounding Graph Fraud in the Face of Camouflage , 2016, KDD.

[16]  Ross B. Girshick,et al.  Reducing Overfitting in Deep Networks by Decorrelating Representations , 2015, ICLR.

[17]  Yizhou Sun,et al.  Meta-Path-Based Ranking with Pseudo Relevance Feedback on Heterogeneous Graph for Citation Recommendation , 2014, CIKM.

[18]  Jure Leskovec,et al.  node2vec: Scalable Feature Learning for Networks , 2016, KDD.

[19]  Jimeng Sun,et al.  Cross-domain collaboration recommendation , 2012, KDD.

[20]  Nitesh V. Chawla,et al.  metapath2vec: Scalable Representation Learning for Heterogeneous Networks , 2017, KDD.

[21]  Hisao Tamaki,et al.  Greedily Finding a Dense Subgraph , 2000, J. Algorithms.

[22]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[23]  Xiao Wang,et al.  Hyperbolic Heterogeneous Information Network Embedding , 2019, AAAI.

[24]  Sara Cohen,et al.  Efficient Enumeration of Maximal k-Plexes , 2015, SIGMOD Conference.

[25]  Nitesh V. Chawla,et al.  Camel: Content-Aware and Meta-path Augmented Metric Learning for Author Identification , 2018, WWW.

[26]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[27]  Jiawei Han,et al.  ClusCite: effective citation recommendation by information network-based clustering , 2014, KDD.

[28]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[29]  Alessio Conte,et al.  Fast Enumeration of Large k-Plexes , 2017, KDD.

[30]  Philip S. Yu,et al.  A Survey of Heterogeneous Information Network Analysis , 2015, IEEE Transactions on Knowledge and Data Engineering.

[31]  Foster J. Provost,et al.  The myth of the double-blind review?: author identification using only citations , 2003, SKDD.

[32]  Yousef Saad,et al.  Dense Subgraph Extraction with Application to Community Detection , 2012, IEEE Transactions on Knowledge and Data Engineering.

[33]  Mingzhe Wang,et al.  LINE: Large-scale Information Network Embedding , 2015, WWW.

[34]  Yizhou Sun,et al.  Mining Heterogeneous Information Networks: Principles and Methodologies , 2012, Mining Heterogeneous Information Networks: Principles and Methodologies.

[35]  Deborah Estrin,et al.  Collaborative Metric Learning , 2017, WWW.

[36]  Philip S. Yu,et al.  Heterogeneous Information Network Embedding for Recommendation , 2017, IEEE Transactions on Knowledge and Data Engineering.

[37]  Jian Pei,et al.  Mining cross-graph quasi-cliques in gene expression and protein interaction data , 2005, 21st International Conference on Data Engineering (ICDE'05).

[38]  Ling Huang,et al.  What You Submit Is Who You Are: A Multimodal Approach for Deanonymizing Scientific Publications , 2015, IEEE Transactions on Information Forensics and Security.

[39]  Aristides Gionis,et al.  The community-search problem and how to plan a successful cocktail party , 2010, KDD.