Learning-based word segmentation for reliable text document retrieval and augmentation

Suppose one has access to only part of a text document, say a single page, and wants to identify the document to which it belongs. Such cases call for content-based document retrieval in a large database.