The Soundex Phonetic Algorithm Revisited for SMS-based Information Retrieval

The growing use of information technologies such as mobile devices has had a major social and technological impact, including the widespread adoption of the Short Message Service (SMS), a communication system broadly used by cellular phone users. It is therefore important to analyze representation and normalization techniques for this kind of text. In this paper we study the performance of the Soundex phonetic algorithm in the information retrieval task when the queries are SMS texts. When the SMS queries are codified with their corresponding Soundex codes, a search engine substantially improves the mean average precision in comparison with the uncodified version of the evaluated datasets. Additionally, we present different adaptations of the Soundex algorithm for codifying SMS, and we evaluate the degree of similarity between two codified texts: one originally written in natural language, and the other originally written in the SMS “sub-language”. We show that the adaptations of the Soundex algorithm raise the level of similarity between SMS texts and their corresponding texts in English or Spanish.
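To make the codification step concrete, the following is a minimal sketch of the standard American Soundex code applied token by token. It is not the adapted variants studied in this paper. The function names `soundex` and `encode_text` and the sample words are illustrative assumptions, and the sketch simply drops any character outside A-Z (digits and accented letters included).

```python
# Minimal sketch of standard American Soundex (assumption: this is the
# classic variant, not the SMS-specific adaptations studied in the paper).
CODES = {}
for digit, group in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in group:
        CODES[ch] = str(digit)


def soundex(word: str, length: int = 4) -> str:
    """Return the Soundex code of a single token, or "" if it has no A-Z letters."""
    # Assumption: characters outside A-Z (digits, accents, punctuation) are dropped.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    result = [first]
    prev = CODES.get(first, "")  # the first letter's own code also blocks repeats
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W are skipped and do not separate equal codes
        code = CODES.get(c, "")
        if code and code != prev:
            result.append(code)
        prev = code  # vowels reset prev, so equal codes around a vowel both count
    return "".join(result)[:length].ljust(length, "0")


def encode_text(text: str) -> str:
    """Codify a whitespace-tokenized text (hypothetical helper for illustration)."""
    codes = (soundex(tok) for tok in text.split())
    return " ".join(c for c in codes if c)


if __name__ == "__main__":
    for sms, full in [("tmrw", "tomorrow"), ("msg", "message")]:
        print(sms, soundex(sms), full, soundex(full))
```

In this sketch the abbreviation "tmrw" and the word "tomorrow" receive the same code, whereas "msg" and "message" do not, which gives one intuition for why adaptations of the basic algorithm may be needed for SMS text.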