Error-Correcting Output Codes for Multi-Label Text Categorization

When a sample can belong to more than one label from a set of available classes, the classification problem (known as multi-label classification) becomes more complicated. Text data, now widely available on the World Wide Web, is an obvious example of such a task. This paper presents a new method for multi-label text categorization, created by modifying the Error-Correcting Output Coding (ECOC) technique. Using a set of complementary binary classifiers, ECOC has proven to be efficient for multi-class problems. The proposed method, called ML-ECOC, is a first attempt to extend the ECOC algorithm to handle multi-label tasks. Experimental results on the Reuters benchmark (RCV1-v2) demonstrate the potential of the proposed method for multi-label text categorization.
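For readers unfamiliar with ECOC, the following is a minimal sketch of the standard multi-class decoding step only, not of ML-ECOC, whose details are not given in this abstract. It assumes a hypothetical 4-class, 7-bit code matrix and hypothetical class names "A" to "D"; each column stands for one binary classifier, and a sample is assigned to the class whose codeword is nearest in Hamming distance to the vector of classifier outputs.

```python
from typing import Dict, List, Sequence

# Hypothetical code matrix (an assumption for illustration): one codeword per
# class, one column per binary classifier. Any two codewords differ in 4 bits,
# so a single wrong classifier output can still be corrected.
CODE_MATRIX: Dict[str, List[int]] = {
    "A": [0, 0, 0, 0, 0, 0, 0],
    "B": [0, 0, 0, 1, 1, 1, 1],
    "C": [0, 1, 1, 0, 0, 1, 1],
    "D": [1, 0, 1, 0, 1, 0, 1],
}


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def decode(outputs: Sequence[int]) -> str:
    """Return the class whose codeword is closest to the classifier outputs."""
    return min(CODE_MATRIX, key=lambda label: hamming_distance(CODE_MATRIX[label], outputs))


# The codeword of class "C" with its first bit flipped, i.e. one binary
# classifier made a mistake; nearest-codeword decoding still recovers "C".
noisy_outputs = [1, 1, 1, 0, 0, 1, 1]
predicted = decode(noisy_outputs)
```

This decoder always returns exactly one class, which is why the standard scheme does not directly cover the multi-label setting the paper addresses.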
