Deploying Approaches for Pattern Refinement in Text Mining

Text mining is the discovery of useful information in large collections of digital text documents, whether on the Web or in databases. Instead of the keyword-based approach typically used in this field, a pattern-based model built from frequent sequential patterns can be employed to perform the same kinds of tasks. However, how to use the discovered patterns effectively remains a major challenge. In this study, we propose two approaches based on pattern deploying strategies. We investigate the performance of the pattern deploying algorithms for text mining on the Reuters dataset RCV1, and the results show that the proposed pattern refinement approaches improve effectiveness.
