Hierarchical rough decision theoretic framework for text classification

Hierarchical classification problems have been widely investigated in recent years. Existing hierarchical classification methods that use the top-down, level-based scheme often suffer from inter-level error transmission. In this paper, an instance-centric hierarchical classification framework based on the decision-theoretic rough set model is proposed. The classification procedure is divided into two phases. First, a hierarchical rough decision model is constructed to acquire all possible classification paths and to reduce error transmission. A general loss function for supervised learning is also defined, by which the cost and benefit of assigning an instance to a specific subcategory can be evaluated. Second, a novel classification routing method designed for support vector machines is put forward to select an optimal classification path. Comparative experiments on the Chinese text classification benchmark TanCorp illustrate the effectiveness of the proposed approach.
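As a rough illustration of the first phase, the sketch below uses the standard decision-theoretic rough set thresholds, which derive an acceptance threshold (alpha) and a rejection threshold (beta) from six loss values, and then keeps every branch of a category tree that is not rejected. It is not the loss function or the routing method of this paper: the function names, the toy hierarchy, the loss values, and the per-node membership probabilities are all assumptions made for the example.

```python
# Illustrative sketch only: standard decision-theoretic rough set (DTRS)
# thresholds plus a hypothetical path-collection step. None of these
# names or numbers come from the paper.

def dtrs_thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Return (alpha, beta) from the six DTRS loss values.

    l_pp, l_bp, l_np: loss of accepting, deferring, rejecting an
                      instance that belongs to the category.
    l_pn, l_bn, l_nn: loss of accepting, deferring, rejecting an
                      instance that does not belong to the category.
    """
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta


def three_way_decision(prob, alpha, beta):
    """Map a membership probability to accept, defer, or reject."""
    if prob >= alpha:
        return "accept"
    if prob <= beta:
        return "reject"
    return "defer"


def candidate_paths(tree, probs, alpha, beta, node="root"):
    """Collect every root-to-leaf path whose nodes are not rejected.

    Keeping deferred nodes (an assumption of this sketch) means an
    uncertain decision at an upper level does not discard its subtree.
    """
    children = tree.get(node, [])
    if not children:
        return [[node]]
    paths = []
    for child in children:
        if three_way_decision(probs[child], alpha, beta) != "reject":
            for tail in candidate_paths(tree, probs, alpha, beta, child):
                paths.append([node] + tail)
    return paths


# Hypothetical two-level hierarchy and membership probabilities.
tree = {
    "root": ["sports", "tech"],
    "sports": ["football", "tennis"],
    "tech": ["hardware"],
}
probs = {"sports": 0.8, "tech": 0.4,
         "football": 0.9, "tennis": 0.1, "hardware": 0.5}

alpha, beta = dtrs_thresholds(l_pp=0, l_bp=1, l_np=4,
                              l_pn=4, l_bn=1, l_nn=0)
paths = candidate_paths(tree, probs, alpha, beta)
```

With these assumed losses, alpha is 0.75 and beta is 0.25, so "tech" is deferred rather than rejected and its subtree stays among the candidate paths. A second phase would then choose one path among the candidates; how the paper does this with support vector machines is not described in the abstract and is not reproduced here.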
