Pattern Taxonomy Mining for Text Categorization

Most text mining methods are term-based, and all of them suffer from well-known problems such as polysemy and synonymy. Pattern mining has advantages over these term-based methods, and Pattern Taxonomy Mining can be used to improve the effectiveness of discovering useful patterns. In addition to the common problems of term-based mining, this paper also addresses the low-frequency problem, in which highly specific patterns have low support. Pattern deploying and inner pattern evolving algorithms are used to improve the effectiveness of pattern discovery. Experiments are conducted on the RCV1 text collection, and the results show that the performance and execution of text categorization improve significantly without any loss in accuracy.
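
To make the pattern-deploying idea more concrete, here is a minimal Python sketch in the spirit of pattern taxonomy mining. It is not the paper's implementation, and every detail below is our own assumption: documents are already split into paragraphs of stemmed, stop-word-free terms; a sequential pattern is an ordered, possibly gapped subsequence of a paragraph; a pattern's support is the fraction of a document's paragraphs that contain it; patterns are grown Apriori-style up to a hypothetical `max_len`; and each positive document's closed patterns are deployed with equal document weight into one term-support vector. The function names (`mine_sequential_patterns`, `closed_patterns`, `deploy`, `score`) are hypothetical, and inner pattern evolving, which also uses negative documents, is not shown.

```python
# Hypothetical sketch: names, support definition and weighting are illustrative
# assumptions, not the method described in the paper.
from collections import Counter
from itertools import chain


def is_subsequence(pattern, sequence):
    """True if the terms of `pattern` appear in `sequence` in the same order (gaps allowed)."""
    it = iter(sequence)
    return all(term in it for term in pattern)  # `in` advances the iterator past each match


def mine_sequential_patterns(paragraphs, min_sup, max_len=3):
    """Return {pattern: paragraph count} for every sequential pattern of length <= max_len
    whose relative support (fraction of paragraphs containing it) is at least min_sup."""
    n = len(paragraphs)
    vocab = sorted(set(chain.from_iterable(paragraphs)))

    def count(p):
        return sum(is_subsequence(p, para) for para in paragraphs)

    # Support is anti-monotone (a prefix is at least as frequent as its extensions),
    # so only frequent patterns are extended by one more term at each level.
    frequent, level = {}, [()]
    for _ in range(max_len):
        next_level = []
        for prefix in level:
            for term in vocab:
                candidate = prefix + (term,)
                c = count(candidate)
                if c / n >= min_sup:
                    frequent[candidate] = c
                    next_level.append(candidate)
        level = next_level
    return frequent


def closed_patterns(frequent):
    """Keep patterns that have no proper super-sequence with the same support."""
    return {
        p: c
        for p, c in frequent.items()
        if not any(len(q) > len(p) and c_q == c and is_subsequence(p, q)
                   for q, c_q in frequent.items())
    }


def deploy(documents, min_sup=0.5, max_len=3):
    """Simplified deploying step (assumption): each document's closed patterns are merged
    into a term-count map (a term counts once per pattern), normalized to sum to 1,
    and accumulated into a single term-support vector."""
    term_support = Counter()
    for paragraphs in documents:
        closed = closed_patterns(mine_sequential_patterns(paragraphs, min_sup, max_len))
        d_pattern = Counter(chain.from_iterable(set(p) for p in closed))
        total = sum(d_pattern.values())
        for term, weight in d_pattern.items():
            term_support[term] += weight / total
    return term_support


def score(document_terms, term_support):
    """Rank a new document by summing the supports of the distinct terms it contains."""
    return sum(term_support.get(t, 0.0) for t in set(document_terms))


# Toy positive training set: documents -> paragraphs -> preprocessed terms.
positive_docs = [
    [["pattern", "taxonomy", "mining"], ["pattern", "mining", "text"], ["text", "categorization"]],
    [["pattern", "mining"], ["closed", "pattern", "mining"]],
]
support = deploy(positive_docs)
print(sorted(support.items()))
print(score(["pattern", "mining", "news"], support), score(["stock", "market"], support))
```

In this sketch a term's weight reflects how often it takes part in closed patterns that recur across a document's paragraphs, rather than its raw frequency. The brute-force enumeration only suits toy inputs; a practical system would more likely rely on a dedicated closed sequential pattern miner.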

[1] N. Jaisankar et al., Pattern Taxonomy Mining for Text Categorization, 2017.