Recommending Given Names

All over the world, future parents face the task of finding a suitable given name for their child. This choice is influenced by various factors, such as social context, language, cultural background and, especially, personal taste. Although this task is omnipresent, little research has been conducted on analyzing and applying interrelations among given names from a data mining perspective. The present work tackles the problem of recommending given names by first mining inter-name relatedness from data on the Social Web. Based on these results, the name search engine "Nameling" was built, which attracted more than 35,000 users in less than six months, underpinning the relevance of the underlying recommendation task. The resulting usage data is then used to evaluate several state-of-the-art recommendation systems, as well as our new NameRank algorithm, which we adapted from our previous work on folksonomies. NameRank yields the best results when considering the trade-off between prediction accuracy and runtime performance, as well as its ability to generate personalized recommendations. We also show how the gathered inter-name relationships can be used for meaningful diversification of the results of PageRank-based recommendation systems. As all of the considered usage data is made publicly available, the present work establishes baseline results, encouraging other researchers to implement advanced recommendation systems for given names.
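To make the idea of a PageRank-based name recommender more concrete, the following is a minimal sketch of generic personalized PageRank over a weighted name co-occurrence graph. It is not NameRank itself, whose details are not given here; the edge-list input format, the function names `personalized_pagerank` and `recommend`, the damping factor of 0.85 and the fixed iteration count are all illustrative assumptions. The random walk restarts at the names a user has already shown interest in, so nearby names in the relatedness graph receive higher scores.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power iteration for personalized PageRank on an undirected weighted graph.

    edges: iterable of (name_a, name_b, weight) triples with positive weights
           (hypothetical input format, e.g. co-occurrence counts).
    seeds: names the user is interested in; the walk restarts uniformly at them.
    """
    # Build a symmetric adjacency map and each node's total edge weight.
    adj = defaultdict(dict)
    for a, b, w in edges:
        adj[a][b] = adj[a].get(b, 0.0) + w
        adj[b][a] = adj[b].get(a, 0.0) + w
    nodes = list(adj)
    out_weight = {n: sum(adj[n].values()) for n in nodes}

    # Preference (restart) vector: uniform over seed names present in the graph.
    present = {s for s in seeds if s in adj}
    if not present:
        raise ValueError("none of the seed names occur in the graph")
    pref = {n: (1.0 / len(present) if n in present else 0.0) for n in nodes}

    rank = dict(pref)
    for _ in range(iterations):
        # Teleport mass goes back to the seeds; the rest flows along edges
        # in proportion to edge weight.
        new = {n: (1 - damping) * pref[n] for n in nodes}
        for n in nodes:
            share = damping * rank[n] / out_weight[n]
            for m, w in adj[n].items():
                new[m] += share * w
        rank = new
    return rank


def recommend(edges, seeds, k=3):
    """Return the k highest-ranked names that are not already seeds."""
    rank = personalized_pagerank(edges, seeds)
    candidates = [n for n in rank if n not in seeds]
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(candidates, key=lambda n: (-rank[n], n))[:k]
```

For a toy graph such as `[("Emma", "Emily", 3), ("Emma", "Anna", 2), ("Anna", "Hannah", 1), ("Lukas", "Leon", 4)]` with seed `["Emma"]`, `recommend` returns `['Emily', 'Anna', 'Hannah']`; names in the disconnected component receive a score of zero. Result diversification, as mentioned above, would be an additional re-ranking step on top of such scores and is not shown here.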
