Novel electronic scissoring algorithm

Electronic media, such as digital magazines and newspapers, are becoming increasingly popular. Computer-aided information retrieval from such electronic documents is crucial for further data-mining applications. In this paper, we propose a novel electronic scissoring algorithm that automatically extracts the image content from an arbitrary electronic document. The algorithm consists of two steps: information-content extraction and image-content detection. In the first step, everything other than the background is extracted using a computationally efficient variance filter. The resulting information content usually consists of text, images, and delimiters. In the second step, an entropy measure is employed to distinguish the images from the other content. Experiments are carried out using data captured from random web pages. Receiver operating characteristic curves are presented to demonstrate the effectiveness of the proposed algorithm.
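
The two steps described above can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the filter window, the entropy definition, or any thresholds, so the function names (`local_variance`, `block_entropy`, `detect_image_blocks`), the 5x5 window, the 16-pixel block size, the 32-bin histogram, and both threshold values are assumptions chosen only for illustration. The sketch assumes an 8-bit grayscale page stored as a NumPy array.

```python
import numpy as np


def local_variance(gray, k=5):
    """Variance inside a k x k window around each pixel (k must be odd).

    Uses integral images so the cost per pixel does not depend on k.
    """
    gray = gray.astype(np.float64)
    padded = np.pad(gray, k // 2, mode="edge")

    def box_sum(a):
        # Integral image with a leading row and column of zeros.
        ii = np.zeros((a.shape[0] + 1, a.shape[1] + 1))
        ii[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
        return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]

    n = k * k
    mean = box_sum(padded) / n
    mean_sq = box_sum(padded ** 2) / n
    # Var[x] = E[x^2] - E[x]^2, clipped to guard against rounding error.
    return np.maximum(mean_sq - mean ** 2, 0.0)


def block_entropy(block, bins=32):
    """Shannon entropy (in bits) of the gray-level histogram of a block."""
    hist, _ = np.histogram(block, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())


def detect_image_blocks(gray, block=16, var_thresh=25.0, ent_thresh=3.0):
    """Return a boolean grid marking blocks that look like image content.

    Step 1: pixels with high local variance are treated as non-background.
    Step 2: among mostly non-background blocks, high entropy suggests an
            image rather than text or delimiters (assumed thresholds).
    """
    content = local_variance(gray) > var_thresh
    rows, cols = gray.shape[0] // block, gray.shape[1] // block
    mask = np.zeros((rows, cols), dtype=bool)
    for i in range(rows):
        for j in range(cols):
            sl = (slice(i * block, (i + 1) * block),
                  slice(j * block, (j + 1) * block))
            if content[sl].mean() > 0.5 and block_entropy(gray[sl]) > ent_thresh:
                mask[i, j] = True
    return mask
```

The intuition behind the sketch is that a flat background has near-zero local variance, while text and delimiters typically use only a few gray levels and therefore have low histogram entropy compared with photographic content. Sweeping `ent_thresh` over a range of values and recording true-positive and false-positive rates would yield a receiver operating characteristic curve of the kind the abstract mentions.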