Rough-fuzzy based scene categorization for text detection and recognition in video

Abstract: Scene image or video understanding is a challenging task, especially when the number of video types increases drastically and backgrounds and foregrounds vary widely. This paper proposes a new method for categorizing scene videos into ten classes, namely Animation, Outlet, Sports, e-Learning, Medical, Weather, Defense, Economics, Animal Planet and Technology, in order to improve the performance of text detection and recognition, which is an effective approach to scene image or video understanding. First, we present a new combination of rough and fuzzy concepts to study the irregular shapes of edge components in input scene videos, which helps to classify the edge components into several groups. Next, the proposed method explores the gradient direction information of each pixel in each edge component group to extract stroke-based features by dividing each group into several intra and inter planes. We further extract correlation and covariance features to encode semantic features located inside planes or between planes. The features of the intra and inter planes of the groups are then concatenated to form a feature matrix. Finally, the feature matrix is verified with temporal frames and fed to a neural network for categorization. Experimental results show that the proposed method outperforms existing state-of-the-art methods. In addition, the performance of text detection and recognition methods improves significantly when they are applied after categorization.
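The feature steps named in the abstract (per-pixel gradient direction, then correlation and covariance within and between planes, concatenated into one feature set) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the abstract does not define how planes are formed, so `split_into_planes` below is a hypothetical stand-in that simply chunks the values into equal parts, the function names are assumptions, and the rough-fuzzy grouping of edge components, the temporal verification and the neural network are all omitted.

```python
import numpy as np

def gradient_direction(gray):
    """Per-pixel gradient direction in radians."""
    # np.gradient returns the derivative along rows first, then along columns.
    gy, gx = np.gradient(gray.astype(float))
    return np.arctan2(gy, gx)

def split_into_planes(values, n_planes):
    """Hypothetical plane split: chunk the flattened values into equal parts."""
    flat = values.ravel()
    usable = (flat.size // n_planes) * n_planes  # drop any remainder
    return flat[:usable].reshape(n_planes, -1)

def plane_features(planes):
    """Concatenate per-plane variances with pairwise covariance and correlation."""
    cov = np.cov(planes)        # each row is treated as one plane
    corr = np.corrcoef(planes)
    upper = np.triu_indices(len(planes), k=1)  # index each pair of planes once
    return np.concatenate([np.diag(cov), cov[upper], corr[upper]])

# Synthetic stand-in for one grayscale frame (assumption: real input would be
# the pixels of one edge component group, not a whole random image).
rng = np.random.default_rng(0)
frame = rng.random((32, 32))
features = plane_features(split_into_planes(gradient_direction(frame), 4))
```

With four planes the sketch yields 4 variances plus 6 covariances and 6 correlations, one per pair of planes. A feature matrix in the sense of the abstract would stack such vectors over the edge component groups, under whatever plane definition the full paper specifies.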
