Multi-label Text Categorization with Model Combination based on F1-score Maximization

Text categorization is a fundamental task in natural language processing and is generally formulated as a multi-label categorization problem, in which each text document is assigned to one or more categories. We focus on designing statistical classifiers that generalize well in multi-label categorization, and we present a classifier design method based on model combination and F1-score maximization. In our formulation, we first design multiple models for binary classification per category. We then combine these models so as to maximize the F1-score on a training dataset. Our experimental results confirm that the proposed method is especially useful for datasets with many combinations of category labels.
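
The combination step can be illustrated for a single category with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two models are represented only by hypothetical score lists (`scores_a`, `scores_b`) in [0, 1], the combination is a weighted average thresholded at 0.5, and the weight is chosen by a simple grid search, which stands in for whatever optimizer the method actually uses.

```python
def f1_score(y_true, y_pred):
    """F1 for one binary category; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def combine(scores_a, scores_b, weight, threshold=0.5):
    """Weighted average of two models' scores, thresholded to 0/1 labels."""
    return [
        1 if weight * a + (1 - weight) * b >= threshold else 0
        for a, b in zip(scores_a, scores_b)
    ]


def fit_weight(scores_a, scores_b, y_true, steps=10):
    """Grid-search the combination weight that maximizes training F1.

    Assumption: a grid search replaces the paper's optimizer for illustration.
    """
    best_weight, best_f1 = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        f1 = f1_score(y_true, combine(scores_a, scores_b, w))
        if f1 > best_f1:  # strict comparison keeps the first best weight
            best_weight, best_f1 = w, f1
    return best_weight, best_f1


# Toy training data for one category (hypothetical values).
y_true = [1, 1, 0, 0, 1]
scores_a = [0.9, 0.2, 0.6, 0.1, 0.7]  # hypothetical model A
scores_b = [0.3, 0.9, 0.2, 0.4, 0.4]  # hypothetical model B

f1_a = f1_score(y_true, combine(scores_a, scores_b, 1.0))  # model A alone
f1_b = f1_score(y_true, combine(scores_a, scores_b, 0.0))  # model B alone
best_weight, best_f1 = fit_weight(scores_a, scores_b, y_true)
print(f1_a, f1_b, best_weight, best_f1)
```

On this toy data neither model alone labels every document correctly, while an intermediate weight does, so the combined training F1-score exceeds that of either model. In a multi-label setting the same procedure would be repeated for each category.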