Automatic Text categorization and summarization using rule reduction

Text mining is an emerging field that seeks to extract meaningful information from natural language text. Automatic text categorization is the process of assigning predefined class labels to incoming, unclassified documents; the class labels are defined from a set of pre-classified example documents used as a training corpus. This work presents an automatic text categorization and summarization approach that analyzes the structure of the input text. A text analyzer is developed that derives this structure using a rule reduction technique in three stages: Token Creation, Feature Identification, and Categorization and Summarization. The analyzer was tested on sample input texts and gave noteworthy results. Extensive experimentation validates the selection of parameters and the efficacy of the approach for text classification. The work can be extended to many practical applications, including indexing for document retrieval, organizing and maintaining large catalogues of Web resources, automatic metadata extraction, and word sense disambiguation.
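The three stages named above can be illustrated with a minimal Python sketch. It is not the authors' analyzer: the abstract does not specify the rule reduction technique, so simple keyword rules and word-frequency scoring stand in for it here. The names `CATEGORY_RULES`, `STOPWORDS`, `create_tokens`, `identify_features`, `categorize_and_summarize`, and `analyze` are all hypothetical, as are the categories and keywords.

```python
import re
from collections import Counter

# Hypothetical keyword rules; the source does not list its actual rules.
CATEGORY_RULES = {
    "sports": {"match", "team", "score", "league"},
    "finance": {"market", "stock", "bank", "shares"},
}
# Small illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "on", "by", "for"}


def create_tokens(text):
    """Stage 1 (Token Creation): split into sentences, then lowercase word tokens."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(s, re.findall(r"[a-z]+", s.lower())) for s in sentences]


def identify_features(tokenized):
    """Stage 2 (Feature Identification): count non-stopword terms in the document."""
    return Counter(w for _, words in tokenized for w in words if w not in STOPWORDS)


def categorize_and_summarize(tokenized, features, n_sentences=1):
    """Stage 3: pick the best-matching category and the top-scoring sentences."""
    scores = {
        label: sum(features[w] for w in keywords)
        for label, keywords in CATEGORY_RULES.items()
    }
    best = max(scores, key=scores.get)
    label = best if scores[best] > 0 else "unknown"

    def sentence_score(pair):
        # A sentence scores the summed document frequency of its content words.
        return sum(features[w] for w in pair[1] if w not in STOPWORDS)

    ranked = sorted(tokenized, key=sentence_score, reverse=True)
    top = {s for s, _ in ranked[:n_sentences]}
    # Emit the chosen sentences in their original document order.
    summary = " ".join(s for s, _ in tokenized if s in top)
    return label, summary


def analyze(text):
    """Run the three stages in sequence and return (category, summary)."""
    tokenized = create_tokens(text)
    features = identify_features(tokenized)
    return categorize_and_summarize(tokenized, features)


if __name__ == "__main__":
    sample = ("Stock market fell. The bank sold shares as the stock market fell. "
              "Weather was fine.")
    print(analyze(sample))
```

Keeping each stage as a separate function mirrors the staged design described in the abstract, so any one stage (for example, the categorization rules) could be replaced without touching the others.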
