Document Image Retrieval Using Feature Combination in Kernel Space

This paper applies multiple features to word-based document image indexing and retrieval. It proposes a novel framework that performs Multiple Kernel Learning (MKL) for indexing with kernel-based Distance Based Hashing, using a Genetic Algorithm for optimization. Two features representing the structural organization of word shape are defined, and their optimal combination for indexing is learned through MKL. Retrieval results are presented for a collection of documents in Devanagari script.
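To make the idea of combining two features in kernel space concrete, the following is a minimal sketch and not the authors' implementation. It assumes an RBF kernel per feature, a single fixed weight `w` forming a convex combination of the two kernels (in the paper this weighting would be what the optimizer learns), and the line-projection hash of distance-based hashing evaluated with the kernel-induced distance. The function names, the `gamma` value, the pivots, and the thresholds `t1`, `t2` are all hypothetical choices made for illustration.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel on two equal-length feature vectors (assumed kernel choice)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def combined_kernel(u, v, w):
    """Convex combination of one kernel per feature.

    Each word image is a pair (feature_a, feature_b); w in [0, 1] is the
    weight of the first feature's kernel (hypothetical, fixed by hand here).
    """
    return w * rbf_kernel(u[0], v[0]) + (1.0 - w) * rbf_kernel(u[1], v[1])

def kernel_distance(u, v, w):
    """Distance induced by the combined kernel:
    d(u, v)^2 = K(u, u) + K(v, v) - 2 K(u, v)."""
    sq = combined_kernel(u, u, w) + combined_kernel(v, v, w) \
        - 2.0 * combined_kernel(u, v, w)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors

def line_projection(u, p1, p2, w):
    """Project u onto the 'line' through pivots p1 and p2 using distances only."""
    d12 = kernel_distance(p1, p2, w)
    d1 = kernel_distance(u, p1, w)
    d2 = kernel_distance(u, p2, w)
    return (d1 ** 2 + d12 ** 2 - d2 ** 2) / (2.0 * d12)

def hash_bit(u, p1, p2, w, t1, t2):
    """One binary hash function: 1 if the projection falls inside [t1, t2]."""
    return 1 if t1 <= line_projection(u, p1, p2, w) <= t2 else 0

# Toy word-image descriptors: (feature_a, feature_b) pairs, invented for the demo.
pivot1 = ([0.0, 0.0], [1.0, 0.0])
pivot2 = ([1.0, 1.0], [0.0, 1.0])
query = ([0.2, 0.1], [0.9, 0.2])

bit = hash_bit(query, pivot1, pivot2, w=0.6, t1=0.0, t2=0.5)
```

Several such bits, each built from a different pivot pair and threshold interval, would be concatenated into a hash key. Because the projection depends on the data only through the combined kernel, changing `w` changes the hash keys, which is why the kernel weights can be tuned against retrieval quality.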
