MIRA: Multilingual Information Processing on Relational Architecture

In today's global village, it is critical that key information tools, such as web search engines, e-Commerce portals and e-Governance, work seamlessly across multiple natural languages. We propose a new flexible architecture, Multilingual Information processing on Relational Architecture (MIRA), that supports multilingual processing effectively and efficiently in the primary storage mechanism for such deployments: relational database systems. We propose new linguistic matching operators that extend the standard lexicographic matching of database systems into the phonetic and semantic domains. We further show that the performance of such systems may be made language-neutral. Our proposed architecture is based on standards and is hence amenable to easy implementation in any type of query processing and information retrieval system. In this paper, we present our approach to implementing the above architecture and outline the host of research issues opened up by the inherently fuzzy nature of the alternative matching semantics.
