Real-Time Document Image Retrieval for a 10 Million Pages Database with a Memory Efficient and Stability Improved LLAH

This paper presents a real-time document image retrieval method for a large-scale database with Locally Likely Arrangement Hashing (LLAH). In general, when a database is scaled up, a large amount of memory is required and retrieval accuracy drops due to insufficient discrimination power of features. To solve these problems, we propose three improvements: memory reduction by sampling feature points, improvement of discrimination power by increasing the number of feature dimensions and stabilizing features by reducing redundancy. From the experimental results, we have confirmed that the proposed method realizes 50% memory reduction, and achieves 99.4% accuracy and 38ms processing time for a database of 10 million pages.