This paper describes our submission to the FIRE 2014 Shared Task on Transliterated Search. The shared task comprises two subtasks: Query Word Labeling and Mixed-script Ad hoc Retrieval for Hindi Song Lyrics. Query Word Labeling involves token-level language identification of the words in code-mixed queries and back-transliteration of the identified Indian-language words into their native scripts. We developed letter-based language models for token-level language identification and a structured perceptron model for back-transliteration of Indic words. The second subtask, Mixed-script Ad hoc Retrieval for Hindi Song Lyrics, is to retrieve a ranked list of songs from a corpus of Hindi song lyrics, given an input query in Devanagari or transliterated Roman script. For this subtask, we used edit-distance-based query expansion and language modeling, followed by relevance-based reranking, to retrieve the Hindi song lyrics relevant to a given query.
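As an illustration of token-level language identification with letter-based language models, the following is a minimal sketch and not the submitted system. It assumes character trigrams with add-one smoothing, and it labels each query token with the language whose model assigns it the highest log-probability. The class `CharNgramLM`, the function `label_tokens`, and the tiny training word lists are hypothetical and chosen only for demonstration.

```python
import math
from collections import Counter


class CharNgramLM:
    """Character n-gram language model with add-one smoothing (illustrative)."""

    def __init__(self, words, n=3):
        self.n = n
        self.ngrams = Counter()    # counts of full n-grams
        self.contexts = Counter()  # counts of (n-1)-character contexts
        self.vocab = set()         # characters seen in training, incl. padding
        for word in words:
            padded = self._pad(word)
            self.vocab.update(padded)
            for i in range(len(padded) - n + 1):
                gram = padded[i:i + n]
                self.ngrams[gram] += 1
                self.contexts[gram[:-1]] += 1

    def _pad(self, word):
        # "^" and "$" mark word boundaries; words are assumed not to contain them.
        return "^" * (self.n - 1) + word.lower() + "$"

    def log_prob(self, word):
        padded = self._pad(word)
        v = len(self.vocab) + 1  # one extra slot for unseen characters
        total = 0.0
        for i in range(len(padded) - self.n + 1):
            gram = padded[i:i + self.n]
            total += math.log(
                (self.ngrams[gram] + 1) / (self.contexts[gram[:-1]] + v)
            )
        return total


def label_tokens(query, models):
    """Label each whitespace-separated token with the best-scoring language."""
    return [
        (tok, max(models, key=lambda lang: models[lang].log_prob(tok)))
        for tok in query.split()
    ]


# Toy training data (hypothetical); a real system would train on large word lists.
models = {
    "en": CharNgramLM(["the", "song", "lyrics", "what", "where", "love", "this", "that"]),
    "hi": CharNgramLM(["tum", "hai", "kya", "mera", "dil", "pyar", "kahan", "tera"]),
}

labels = label_tokens("tera song", models)
print(labels)
```

Because every model scores the same token, the number of n-grams is identical across languages and no length normalization is needed for the comparison. With realistic training data, the smoothing method and n-gram order would need tuning.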