A Tool for Arabic Documents Indexing and Retrieval From a Web Virtual Library
This paper presents a method for automatically indexing and retrieving Arabic documents from a virtual library. The library can be multilingual, holding documents written in different languages. All documents are scanned before being stored in the library. The indexing method uses the document contents themselves as indexes: the scanned pages are submitted to OCR software, which produces a textual version of the contents. In a second phase, a module automatically translates this text into HTML (or XML), so that the different parts of the contents become hyperlinks to the corresponding scanned document images. The end user can then request a PostScript version of the document for download. The method was previously tested on Latin-script documents, specifically scientific reviews. This paper presents its adaptation to Arabic reviews and other kinds of documents.
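The second phase described above (OCR text turned into HTML whose entries link to the scanned images) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_index_page`, the `(ocr_text, image_path)` pair layout, and the file names are all hypothetical, and the OCR step is assumed to have already produced the text.

```python
from html import escape

# Languages written right to left; used only to pick the page direction.
RTL_LANGS = {"ar", "fa", "he", "ur"}


def build_index_page(title, entries, lang="ar"):
    """Render OCR'd entries as an HTML page whose items link to page scans.

    `entries` is a list of (ocr_text, image_path) pairs. This layout is an
    assumption made for the illustration, not a format taken from the paper.
    """
    direction = "rtl" if lang in RTL_LANGS else "ltr"
    # Escape both the visible text and the link target so that OCR noise
    # such as "<" or "&" cannot break the generated markup.
    items = "\n".join(
        f'    <li><a href="{escape(image_path, quote=True)}">{escape(text)}</a></li>'
        for text, image_path in entries
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang, quote=True)}" dir="{direction}">\n'
        "<head>\n"
        '  <meta charset="utf-8">\n'
        f"  <title>{escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical entries: OCR text paired with the scan it was read from.
page = build_index_page(
    "فهرس المجلة",
    [("المقدمة", "scans/page_001.png"), ("الفصل الأول", "scans/page_005.png")],
)
```

Setting `dir="rtl"` on the root element is one simple way to get right-to-left layout for Arabic entries; a multilingual library would presumably choose the direction per document, as the `lang` parameter does here.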