Paper information - Novel electronic scissoring algorithm

Novel electronic scissoring algorithm

Electronic media such as digital magazines and newspapers are becoming increasingly popular. Computer-aided information retrieval from such electronic documents is crucial for further data-mining applications. In this paper, we propose a novel electronic scissoring algorithm that automatically extracts the image content from an arbitrary electronic document. The proposed algorithm consists of two steps: information-content extraction and image-content detection. In the first step, everything other than the background is extracted using a computationally efficient variance filter. The information content usually consists of text, images, and delimiters. In the second step, an entropy measure is employed to distinguish the images from the other content. Experiments are carried out using data captured from random web pages. Receiver operating characteristic (ROC) curves are presented to demonstrate the effectiveness of the proposed algorithm.
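The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the window size, the thresholds, or how regions are formed, so the names `local_variance`, `content_mask`, `shannon_entropy`, and `looks_like_image`, as well as the parameters `radius`, `var_threshold`, and `entropy_threshold`, are hypothetical choices made only for illustration. The sketch assumes a grayscale page stored as a list of rows of integer gray levels.

```python
import math
from collections import Counter


def local_variance(img, radius=1):
    """Gray-level variance in a (2*radius+1)^2 window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            mean = sum(vals) / len(vals)
            out[y][x] = sum((v - mean) ** 2 for v in vals) / len(vals)
    return out


def content_mask(img, radius=1, var_threshold=1.0):
    """Step 1 (assumed form): flat background has near-zero local variance,
    so pixels whose variance exceeds the threshold are marked as content."""
    return [[v > var_threshold for v in row] for row in local_variance(img, radius)]


def shannon_entropy(values):
    """Shannon entropy, in bits, of the gray-level histogram of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_like_image(region_pixels, entropy_threshold=3.0):
    """Step 2 (assumed form): text and delimiters use few gray levels (low
    entropy), while photographs use many (high entropy)."""
    return shannon_entropy(region_pixels) > entropy_threshold


# A tiny page: white background on the left, a dark block on the right.
page = [[255, 255, 0, 0, 0] for _ in range(5)]
mask = content_mask(page)

text_like = [0, 255] * 32        # two gray levels only
image_like = list(range(64))     # 64 distinct gray levels
print(looks_like_image(text_like), looks_like_image(image_like))
```

In this sketch a region is passed to `looks_like_image` as a flat list of its gray levels; how the content mask is split into regions (for example, fixed blocks or connected components) is left open because the abstract does not specify it. Both thresholds would need to be tuned on real data, which is what the ROC curves mentioned in the abstract would typically be used to assess.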

Supratik Mukhopadhyay | Yiyan Wu | Hsiao-Chun Wu | Limeng Pu | Robert Kooima
