COMPARATIVE ANALYSIS OF C99 AND TOPICTILING TEXT SEGMENTATION ALGORITHMS

This paper addresses the extraction of information from image datasets that contain natural text. Segmenting natural text from an image is difficult, so precision is the most important factor to keep in mind. To minimize error rates, an error filtration technique is provided and applied during the segmentation of text present in images. Furthermore, a comparative analysis of two text segmentation algorithms, C99 and TopicTiling, on image documents is presented. To assess how well each algorithm works, each was applied to different datasets and the results were compared. The work demonstrates the efficiency of TopicTiling over C99.
