Text Search of Surnames in Some Slavic and Other Morphologically Rich Languages Using Rule Based Phonetic Algorithms

Surnames play a key role as natural identifiers of persons, especially in present-day information systems. This paper deals with optimizing a phonetic search algorithm for string matching of surnames, usable by communications service providers, person registries, social networks and genealogy databases. It describes a proposed solution for the phonetic searching of surnames from Slovak and territorially neighboring languages (Czech, Polish, Ukrainian, Russian, German, Hungarian, Jewish). The solution was designed to improve both precision and recall when searching for people by surnames originating in these languages.
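To make the idea of rule-based phonetic matching concrete, the following is a minimal sketch rather than the algorithm proposed in the paper. It assumes a small, hypothetical rule table (`RULES`) that maps a few Polish, Hungarian and German spellings to a common form, strips diacritics, and groups surnames by the resulting key. The function names `phonetic_key`, `build_index` and `search` are likewise illustrative assumptions.

```python
import unicodedata
from collections import defaultdict

# Hypothetical rewrite rules, for illustration only (not the paper's rule set).
# Multi-letter patterns are listed first so that they win over single letters.
RULES = [
    ("sch", "s"),  # German "sch"
    ("sz", "s"),   # Polish "sz"
    ("cz", "c"),   # Polish "cz"
    ("cs", "c"),   # Hungarian "cs"
    ("th", "t"),   # historical spelling, as in "Horvath"
    ("w", "v"),    # Polish/German "w"
    ("y", "i"),    # Slovak/Czech "y" sounds like "i"
]


def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def phonetic_key(surname):
    """Return a simplified phonetic key for a surname."""
    text = strip_diacritics(surname).lower()
    out = []
    i = 0
    while i < len(text):
        for pattern, replacement in RULES:
            if text.startswith(pattern, i):
                out.append(replacement)
                i += len(pattern)
                break
        else:
            # No rule matched at this position: keep the character as is.
            out.append(text[i])
            i += 1
    # Collapse runs of identical characters into a single character.
    collapsed = []
    for ch in "".join(out):
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return "".join(collapsed)


def build_index(surnames):
    """Group surnames by phonetic key, preserving input order."""
    index = defaultdict(list)
    for name in surnames:
        index[phonetic_key(name)].append(name)
    return index


def search(index, query):
    """Return all indexed surnames that share the query's phonetic key."""
    return index.get(phonetic_key(query), [])


names = ["Novák", "Nowak", "Kováč", "Kowacz", "Kovacs", "Horváth"]
index = build_index(names)
print(search(index, "Novak"))
print(search(index, "Horvat"))
```

In this sketch, spelling variants of the same surname collapse to one key, which raises recall. Stripping diacritics also merges letters that are distinct sounds (for example "c" and "č"), which can lower precision; a real rule set would have to balance the two.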
