Paper Information - Personal Names Popularity Estimation and Its Application to Record Linkage

Personal Names Popularity Estimation and Its Application to Record Linkage

In this study, we investigate several statistical techniques for personal name popularity estimation and perform a record linkage experiment guided by name popularity estimates. The results show that name popularity estimates can improve personal name matching in databases and may be of interest in many other domains.

[1] Marco Baroni,et al. zipfR: word frequency distributions in R , 2007, ACL 2007.

[2] Alexander Panchenko,et al. Detecting Gender by Full Name: Experiments with the Russian Language , 2014, AIST.

[3] Lars Backstrom,et al. ePluribus: Ethnicity on Social Networks , 2010, ICWSM.

[4] G. Lasker,et al. Use of Surname Models in Human Population Biology: A Review of Recent Developments , 2003, Human biology.

[5] Octavian Popescu,et al. Person number estimation in large corpora , 2012, Intelligenza Artificiale.

[6] Felix Naumann,et al. An Introduction to Duplicate Detection , 2010, An Introduction to Duplicate Detection.

[7] Claude Castelluccia,et al. How Unique and Traceable Are Usernames? , 2011, PETS.

[8] W. Winkler. Using the EM Algorithm for Weight Computation in the Fellegi-Sunter Model of Record Linkage , 2000 .

[9] Yuan Ding,et al. The City Privacy Attack: Combining Social Media and Public Records for Detailed Profiles of Adults and Children , 2015, COSN.

[10] R. Zweigenhaft,et al. The Psychological Impact of Names , 1980 .

[11] I. Good. The Population Frequencies of Species and the Estimation of Population Parameters , 1953 .

[12] E. Khmaladze. The statistical analysis of a large number of rare events , 1988 .

[13] David Yarowsky,et al. Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter , 2013, NAACL.

[14] Marco Baroni,et al. Testing the extrapolation quality of word frequency models , 2006 .

[15] David Yarowsky,et al. Typed graph models for semi-supervised learning of name ethnicity , 2011, ACL 2011.

[16] Sune Lehmann,et al. Understanding the Demographics of Twitter Users , 2011, ICWSM.

[17] Arkaitz Zubiaga,et al. Overview of the M-WePNaD Task: Multilingual Web Person Name Disambiguation at IberEval 2017 , 2017, IberEval@SEPLN.

[18] R. Harald Baayen,et al. Word Frequency Distributions , 2001 .

[19] Peter Christen,et al. Data Matching , 2012, Data-Centric Systems and Applications.

[20] P. Longley,et al. Ethnicity and Population Structure in Personal Naming Networks , 2011, PloS one.

[21] Slava M. Katz,et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process..

[22] Shou-De Lin,et al. Effective string processing and matching for author disambiguation , 2013, KDD Cup '13.

[23] Thorsten Brants,et al. Large Language Models in Machine Translation , 2007, EMNLP.

[24] Peter Christen,et al. A Comparison of Personal Name Matching: Techniques and Practical Issues , 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).

[25] Fan Zhang,et al. What's in a name?: an unsupervised approach to link users across communities , 2013, WSDM.

[26] Chiara Scapoli,et al. Surnames in Western Europe: a comparison of the subcontinental populations through isonymy. , 2007, Theoretical population biology.

[27] Vern Paxson,et al. Trafficking Fraudulent Accounts: The Role of the Underground Market in Twitter Spam and Abuse , 2013, USENIX Security Symposium.

[28] Stanley F. Chen,et al. An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[29] Joshua Goodman,et al. A bit of progress in language modeling , 2001, Comput. Speech Lang..

[30] Nadav M. Shnerb,et al. You Name It – How Memory and Delay Govern First Name Dynamics , 2012, PloS one.

[31] Stefan Evert,et al. A Simple LNRE Model for Random Character Sequences , 2004 .

[32] F. L. Wells,et al. A note on singularity in given names. , 1948, The Journal of social psychology.

[33] Ihab F. Ilyas,et al. Trends in Cleaning Relational Data: Consistency and Deduplication , 2015, Found. Trends Databases.