A New Method of Text Categorization and Summarization with Fuzzy Confusion Matrix

This work presents a technique for fuzzy text categorization followed by extractive summarization of the categorized texts. At the outset, texts on different subjects are fuzzily categorized based on their relative matching with the index terms of the corresponding subjects. After the categorical groups are formed, extractive summarization is performed on each text in each category. The fuzzy categorization is evaluated with a fuzzy confusion matrix. Evaluated with the Holdout method, the fuzzy categorization achieves appreciably high accuracy, precision, recall, and F-score. The accuracy of summarization, evaluated against human-generated summaries, is fair. The categorization and summarization times are also acceptable.

Keywords: Fuzzy Text Categorization, Fuzzy Confusion Matrix, Extractive Summarization, Term Frequency, Inter Document Frequency, Sentence Weight, Clustering, OCA, Holdout Method, Accuracy, Precision, Recall, F-Score
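The abstract does not give the exact membership or confusion-matrix formulas, so the following is only a minimal sketch under stated assumptions: membership in a subject is taken to be the share of index-term matches that fall in that subject, and each text adds its membership degrees to the confusion-matrix row of its true subject. The function names, the toy index terms, and the sample texts are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def fuzzy_memberships(text, index_terms):
    """Assumed rule: membership = matches for a subject / matches for all subjects."""
    # Naive whitespace tokenization; punctuation is not stripped in this sketch.
    tokens = Counter(text.lower().split())
    raw = {subject: sum(tokens[term] for term in terms)
           for subject, terms in index_terms.items()}
    total = sum(raw.values())
    if total == 0:
        return {subject: 0.0 for subject in raw}
    return {subject: count / total for subject, count in raw.items()}


def fuzzy_confusion_matrix(true_labels, memberships, subjects):
    """matrix[actual][predicted] accumulates membership degrees instead of counts."""
    matrix = {a: {p: 0.0 for p in subjects} for a in subjects}
    for actual, degrees in zip(true_labels, memberships):
        for predicted in subjects:
            matrix[actual][predicted] += degrees[predicted]
    return matrix


def evaluate(matrix, subjects):
    """Accuracy plus per-subject (precision, recall, F-score) from the fuzzy matrix."""
    total = sum(matrix[a][p] for a in subjects for p in subjects)
    accuracy = sum(matrix[s][s] for s in subjects) / total if total else 0.0
    per_subject = {}
    for s in subjects:
        true_positive = matrix[s][s]
        predicted_mass = sum(matrix[a][s] for a in subjects)  # column sum
        actual_mass = sum(matrix[s][p] for p in subjects)     # row sum
        precision = true_positive / predicted_mass if predicted_mass else 0.0
        recall = true_positive / actual_mass if actual_mass else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        per_subject[s] = (precision, recall, f_score)
    return accuracy, per_subject


# Hypothetical held-out test set with two subjects.
index_terms = {"physics": {"energy", "force"}, "biology": {"cell", "gene"}}
subjects = list(index_terms)
texts = ["energy force energy cell", "gene cell gene gene"]
true_labels = ["physics", "biology"]

memberships = [fuzzy_memberships(t, index_terms) for t in texts]
matrix = fuzzy_confusion_matrix(true_labels, memberships, subjects)
accuracy, per_subject = evaluate(matrix, subjects)
print(memberships[0])  # membership degrees of the first text
print(accuracy, per_subject)
```

Accumulating membership degrees rather than hard counts lets a text that partially matches several subjects contribute fractionally to each cell, which is one common way to define a fuzzy confusion matrix; the paper's own definition may differ.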
