A full-text retrieval toolkit for mobile desktop search

Smart handheld devices such as smart phones and personal digital assistants are already in widespread use. Typically equipped with a 200 MHz ARM9 or similar CPU, 64 MB of ROM and 32 MB of RAM, they are becoming increasingly powerful. With gigabyte-level storage expansion through Reduced-Size MultiMediaCards (RS-MMC) or Secure Digital (SD) flash cards, people can install a lot of software and store large amounts of data such as MP3 files, e-books, e-mail and Wikipedia on them. This creates a need for search within handhelds. However, smart handhelds are power-constrained and have limited interaction capabilities. In particular, the asymmetric read/write behavior and wear characteristics of flash storage cards make it difficult to offer high-performance indexing. Very few handhelds currently support even basic search functionality.

To address this, we developed a full-text retrieval toolkit named Titan-Lite, designed specifically for handhelds. It can be embedded easily into various handheld applications to give them search functionality. The first edition is written in Symbian C++ and is designed as a research system that runs under Symbian OS. Titan-Lite has four main components: storage manager, indexer, analyzer and searcher. Most of them are designed with the characteristics of handhelds in mind.

NAND flash is the most widely used storage medium in handhelds. Reading from NAND flash can be performed at any granularity and is very fast. However, erasing data can only be performed at block granularity (typically 8 KB to 64 KB), and writing data can only be performed at page granularity (typically 256 B to 512 B), and only after the page, together with the 8 KB to 64 KB block that contains it, has been erased. Moreover, each page can only be written a limited number of times (typically several hundred thousand). The storage manager handles all read and write operations in the system.
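One way a storage layer could respect the page-granularity constraint is to start every write on a page boundary and fill the unused tail of the last page. The following is a minimal C++ sketch under stated assumptions, not Titan-Lite's actual code: `PagedStore`, `kPageSize`, `kPadByte` and `RoundUpToPage` are hypothetical names, the 512-byte page and the 0xFF fill byte are illustrative choices, and an in-memory vector stands in for the flash card.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed page size; the text gives 256 B to 512 B as typical.
constexpr std::size_t kPageSize = 512;
// Assumed fill byte; 0xFF is the erased state of NAND cells.
constexpr std::uint8_t kPadByte = 0xFF;

// Rounds n up to the next multiple of kPageSize.
constexpr std::size_t RoundUpToPage(std::size_t n) {
    return (n + kPageSize - 1) / kPageSize * kPageSize;
}

// Hypothetical in-memory stand-in for a flash-backed file that is
// only ever extended by whole pages.
class PagedStore {
public:
    // Appends data starting at a page boundary, pads the tail of the
    // last page, and returns the byte offset where the data begins.
    std::size_t Append(const std::string& data) {
        const std::size_t offset = bytes_.size();  // always a page multiple
        bytes_.insert(bytes_.end(), data.begin(), data.end());
        bytes_.resize(RoundUpToPage(bytes_.size()), kPadByte);
        pages_written_ += (bytes_.size() - offset) / kPageSize;
        return offset;
    }

    std::size_t size() const { return bytes_.size(); }
    std::size_t pages_written() const { return pages_written_; }
    const std::vector<std::uint8_t>& bytes() const { return bytes_; }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t pages_written_ = 0;  // rough proxy for wear
};
```

In this sketch a 5-byte record still occupies a full page, which trades space for never having to rewrite a partially filled page. A real storage manager would also need to track block erasure, which is left out here.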
All write operations are page-based: each write starts at an offset that is a multiple of the page size, and any free space left at the end of the last page written is right-padded. The indexer uses a single-pass inversion method for index construction.
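To illustrate the general idea behind single-pass inversion, the sketch below reads each document once, accumulates postings in memory, flushes them as a sorted run when a memory budget is reached, and merges the runs at the end. It is a generic illustration of the technique, not Titan-Lite's indexer: `SinglePassIndexer`, its posting-count budget, the whitespace tokenizer, and the in-memory runs (which stand in for sequential writes to storage) are all assumptions made for the example.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using DocId = unsigned;
using Postings = std::vector<DocId>;
using Run = std::map<std::string, Postings>;  // terms kept in sorted order

// Hypothetical single-pass indexer with a fixed in-memory posting budget.
class SinglePassIndexer {
public:
    explicit SinglePassIndexer(std::size_t max_postings_in_memory)
        : budget_(max_postings_in_memory) {}

    // Splits text on whitespace and records each (term, doc) pair once.
    // Documents are assumed to arrive in increasing id order.
    void AddDocument(DocId doc, const std::string& text) {
        std::istringstream in(text);
        std::string term;
        while (in >> term) {
            Postings& list = current_[term];
            if (list.empty() || list.back() != doc) {
                list.push_back(doc);
                if (++postings_in_memory_ >= budget_) Flush();
            }
        }
    }

    // Moves the in-memory postings into a finished run.
    void Flush() {
        if (current_.empty()) return;
        runs_.push_back(std::move(current_));
        current_.clear();
        postings_in_memory_ = 0;
    }

    // Merges all runs into one index. Runs are visited in creation
    // order, so each merged posting list stays sorted by document id;
    // a document split across two runs is de-duplicated here.
    Run Finish() {
        Flush();
        Run merged;
        for (const Run& run : runs_) {
            for (const auto& entry : run) {
                Postings& out = merged[entry.first];
                for (DocId d : entry.second) {
                    if (out.empty() || out.back() != d) out.push_back(d);
                }
            }
        }
        return merged;
    }

    std::size_t run_count() const { return runs_.size(); }

private:
    std::size_t budget_;
    std::size_t postings_in_memory_ = 0;
    Run current_;
    std::vector<Run> runs_;
};
```

Because each run is written once and only read back during the merge, this pattern suits storage where writes are expensive. On a real device the merge would stream runs from storage rather than hold them all in memory.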