Information Retrieval from Full-Text Arabic Databases: Can Search Engines Designed for English Do the Job?

The amount of electronic information available in Arabic and other non-English languages, especially on the World Wide Web, is increasing. Searches for such information can be undertaken on engines developed with the English language in mind, but will these engines work as effectively in other languages? This article investigates the impact on retrieval of prefixes in Arabic, which are far more common than in English. Typically, search engines designed implicitly for English, such as AltaVista, offer right-hand (suffix) truncation but not left-hand (prefix) truncation. A test collection of 271 Arabic HTML records was created and indexed using the personal version of AltaVista, and a series of searches was then conducted on this collection with the same engine. The results showed that searching on nouns stripped of their prefixes reduced recall, in some cases dramatically, and that total recall for a noun can be guaranteed only by repeating the search with each of the noun's prefixed versions. The research questions the assumption that search engines designed with English in mind will work as well with different language structures.
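The effect described above can be illustrated with a minimal sketch. It is not the article's test procedure and does not use AltaVista: the prefix list, the four-document toy collection, and the function names (`expand_query`, `right_truncation_search`, `recall`) are all assumptions made for illustration. The search function mimics right-hand truncation only, so a query term can match the start of a word but never its middle or end.

```python
# Illustrative Arabic proclitics (an assumption for this sketch, not the
# article's list): the definite article, conjunctions, and prepositions that
# attach to the front of a noun, plus a few common combinations.
PREFIXES = ["ال", "و", "ف", "ب", "ل", "ك", "وال", "بال", "كال", "فال", "لل"]


def expand_query(noun):
    """Return the bare noun followed by its prefixed variants."""
    return [noun] + [prefix + noun for prefix in PREFIXES]


def right_truncation_search(terms, docs):
    """Return the ids of documents with a token that starts with any term.

    A term matches only at the beginning of a token, which is how
    right-hand (suffix) truncation behaves.
    """
    hits = set()
    for doc_id, text in docs.items():
        tokens = text.split()
        if any(token.startswith(term) for token in tokens for term in terms):
            hits.add(doc_id)
    return hits


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)


# Hypothetical toy collection: every document mentions the noun "book".
DOCS = {
    1: "كتاب جديد",       # bare noun
    2: "الكتاب مفيد",     # definite article + noun
    3: "قلم وكتاب",       # conjunction "and" + noun
    4: "للكتاب غلاف",     # preposition "for" + article + noun
}
RELEVANT = {1, 2, 3, 4}
NOUN = "كتاب"

bare_recall = recall(right_truncation_search([NOUN], DOCS), RELEVANT)
expanded_recall = recall(
    right_truncation_search(expand_query(NOUN), DOCS), RELEVANT
)
print("recall, bare noun:", bare_recall)
print("recall, prefixed variants added:", expanded_recall)
```

In this toy collection the bare noun retrieves only one of the four relevant documents, while the expanded query retrieves all four. A fixed prefix list is a simplification: it can miss other attached forms and can also match unrelated words that happen to begin with the same letters, so precision would need to be checked separately.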