Finding email correspondents in online social networks

Email correspondents play an important role in many people’s social networks. Finding email correspondents in social networks accurately may seem straightforward at first glance, but it is challenging. Most existing online social networking sites recommend possible matches by comparing information from email accounts and social network profiles, such as display names and email addresses. However, as we show empirically in this paper, such methods may not be effective in practice. To the best of our knowledge, this problem has not been carefully and thoroughly addressed in research. In this paper, we systematically investigate the problem and develop a practical data mining approach. We find that using only the profiles or only the graph structures is far from effective. Our method uses the similarity between email accounts and social network user profiles and, at the same time, exploits the similarity between the email communication network and the social network under investigation. We demonstrate the effectiveness of our method using two real data sets covering emails and Facebook.
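
The idea of combining profile similarity with network similarity can be illustrated with a small sketch. This is not the method developed in the paper, whose details are not given in the abstract. It is a minimal illustration under stated assumptions: names are compared by Jaccard overlap of their tokens, structure is compared by Jaccard overlap of neighbor sets, and the two scores are mixed with a hypothetical weight `alpha`. The function names, the `known_matches` mapping, and the weighting scheme are all assumptions introduced here.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def name_tokens(name: str) -> Set[str]:
    """Lowercased tokens of a display name or an email local part."""
    for sep in (".", "_", "-"):
        name = name.replace(sep, " ")
    return set(name.lower().split())


def combined_score(
    email_name: str,
    profile_name: str,
    email_neighbors: Set[str],
    profile_neighbors: Set[str],
    known_matches: Dict[str, str],
    alpha: float = 0.5,
) -> float:
    """Score one (email account, profile) candidate pair.

    known_matches (hypothetical) maps email accounts that are already
    matched to their social network profile identifiers.
    alpha (hypothetical) weights profile similarity against structure.
    """
    # Profile side: token overlap between the two names.
    profile_sim = jaccard(name_tokens(email_name), name_tokens(profile_name))
    # Structure side: translate already-matched email neighbors into
    # profile identifiers, then compare with the profile's friends.
    mapped = {known_matches[n] for n in email_neighbors if n in known_matches}
    structure_sim = jaccard(mapped, profile_neighbors)
    return alpha * profile_sim + (1 - alpha) * structure_sim


if __name__ == "__main__":
    score = combined_score(
        "john.smith",
        "John Smith",
        {"a@x.com", "b@x.com"},
        {"u1", "u2"},
        {"a@x.com": "u1", "b@x.com": "u2"},
    )
    print(score)
```

In this sketch, a pair whose names share no tokens can still receive a nonzero score when the two accounts have overlapping neighborhoods, which is the kind of case where comparing profiles alone would fail.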
