Enhancement of layout-based identification of low-resolution documents using geometrical color distribution

This paper proposes a multi-signature document identification method that is robust to low-resolution document images captured with handheld devices. The method extracts a visual signature with two components: (a) the color signature, i.e. the distribution of color content over the image plane of the document, and (b) the layout signature, i.e. the shallow layout structure of the document. The color signature is considered first, to filter out candidate documents whose colors are very dissimilar; identification is then performed on the remaining set using the layout signature. Finally, an evaluation compares the combined color and layout-based method with the layout signature alone.
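
The two-stage idea described above (color filtering first, layout matching second) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the color signature is approximated as the mean RGB value per cell of a regular grid over the image plane, the threshold value is arbitrary, and the layout signature is left abstract, with a caller-supplied `layout_distance` function. The names `color_signature`, `color_distance`, and `identify` are hypothetical.

```python
from math import sqrt


def color_signature(pixels, width, height, grid=2):
    """Mean RGB per cell of a grid x grid partition of the image plane.

    `pixels` is a row-major list of (r, g, b) tuples. This grid-of-means
    signature is an illustrative stand-in, not the paper's definition.
    """
    sums = [[0.0, 0.0, 0.0, 0] for _ in range(grid * grid)]
    for y in range(height):
        for x in range(width):
            # Map the pixel position to the index of its grid cell.
            cell = (y * grid // height) * grid + (x * grid // width)
            r, g, b = pixels[y * width + x]
            acc = sums[cell]
            acc[0] += r
            acc[1] += g
            acc[2] += b
            acc[3] += 1
    # Empty cells (image smaller than the grid) default to black.
    return [
        (s[0] / s[3], s[1] / s[3], s[2] / s[3]) if s[3] else (0.0, 0.0, 0.0)
        for s in sums
    ]


def color_distance(sig_a, sig_b):
    """Mean Euclidean RGB distance between corresponding grid cells."""
    total = 0.0
    for cell_a, cell_b in zip(sig_a, sig_b):
        total += sqrt(sum((a - b) ** 2 for a, b in zip(cell_a, cell_b)))
    return total / len(sig_a)


def identify(query, database, layout_distance, color_threshold=60.0):
    """Two-stage identification: color filter, then layout ranking.

    `query` and each database entry are dicts with "color" and "layout"
    keys (a hypothetical structure). `color_threshold` is an arbitrary
    illustrative value. Returns the best match, or None if every
    candidate is rejected by the color filter.
    """
    # Stage 1: discard documents whose colors are very dissimilar.
    survivors = [
        doc for doc in database
        if color_distance(query["color"], doc["color"]) <= color_threshold
    ]
    if not survivors:
        return None
    # Stage 2: pick the closest layout among the remaining candidates.
    return min(
        survivors,
        key=lambda doc: layout_distance(query["layout"], doc["layout"]),
    )
```

With this structure, a stored document whose layout matches the query exactly is still rejected when its colors differ strongly, which is the filtering behavior the abstract describes. Whether a hard threshold or a ranked cut-off is the better filter is a design choice that this sketch does not settle.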
