Multi-modal Information Integration for Document Retrieval

This paper proposes a novel multi-modal document image retrieval framework that exploits information from both text and graphics regions. The framework uses a hashing formulation based on multiple kernel learning to generate composite document indexes from the different modalities. Existing multimedia management methods for imaged text documents do not address the requirements of old and degraded documents. As a second contribution, the paper proposes a novel multi-modal document indexing framework for retrieving old and degraded text documents, which combines OCR'ed text with an image-based representation through learning. The proposed concepts are evaluated on sampled magazine cover pages and on documents in Devanagari script.
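To make the idea of a composite index more concrete, the following is a minimal sketch, not the formulation used in the paper. It assumes each document already has a text feature vector and an image feature vector, combines one Gaussian kernel per modality with fixed weights (multiple kernel learning would learn these weights), and uses random projections in place of learned hash functions. All names (`rbf_kernel`, `combined_kernel`, `CompositeHasher`, `hamming`) and the random features are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    """Gaussian kernel between the rows of X and the rows of Y."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def combined_kernel(text_x, image_x, text_y, image_y, w_text=0.5, w_image=0.5):
    """Weighted sum of one kernel per modality.
    Assumption: the weights are fixed here; MKL would learn them from data."""
    return w_text * rbf_kernel(text_x, text_y) + w_image * rbf_kernel(image_x, image_y)

class CompositeHasher:
    """Binary codes from kernel similarities to a set of anchor documents.
    Assumption: random projections stand in for learned hash functions."""

    def __init__(self, anchor_text, anchor_image, n_bits=32, seed=0):
        self.anchor_text = anchor_text
        self.anchor_image = anchor_image
        rng = np.random.default_rng(seed)
        self.projections = rng.standard_normal((anchor_text.shape[0], n_bits))
        # Center kernel values so the sign of each projection is informative.
        k = combined_kernel(anchor_text, anchor_image, anchor_text, anchor_image)
        self.mean = k.mean(axis=0)

    def encode(self, text, image):
        k = combined_kernel(text, image, self.anchor_text, self.anchor_image)
        return ((k - self.mean) @ self.projections > 0).astype(np.uint8)

def hamming(code, codes):
    """Hamming distance from one code to every row of codes."""
    return (code != codes).sum(axis=1)

# Synthetic stand-ins for per-document text and image features.
rng = np.random.default_rng(1)
db_text = rng.random((20, 8))
db_image = rng.random((20, 16))

hasher = CompositeHasher(db_text[:10], db_image[:10])
db_codes = hasher.encode(db_text, db_image)

# Query with document 3 itself and rank the database by Hamming distance.
query_code = hasher.encode(db_text[3:4], db_image[3:4])[0]
dist = hamming(query_code, db_codes)
```

In this sketch a query is answered by comparing short binary codes rather than full feature vectors, which is the practical appeal of a hashed composite index. How the kernel weights and hash functions are actually learned is left to the paper.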
