Exploiting Attribute-Wise Distribution of Keywords and Category Dependent Attributes for E-Catalog Classification

E-catalogs are semi-structured documents that consist of multiple attributes and their values. Although conventional text classification techniques can be applied to e-catalog classification, they cannot use the attribute information effectively to improve classification accuracy. In this paper, we propose an e-catalog classification algorithm that extends the Naive Bayesian classifier to use the attribute information. Specifically, we focus on exploiting two e-catalog-specific characteristics: the attribute-wise keyword distribution and the category dependent attributes. Experiments on real data validate the proposed method.
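
To make the idea of an attribute-wise keyword distribution concrete, the following is a minimal illustrative sketch, not the algorithm proposed in this paper. It assumes that each e-catalog is a mapping from attribute name to a text value, that keywords are obtained by whitespace tokenization, and that keyword likelihoods are estimated separately for each (category, attribute) pair with Laplace smoothing. The class name `AttributeWiseNB`, the attribute names, and the toy data are all hypothetical, and the sketch does not model category dependent attributes.

```python
import math
from collections import defaultdict


class AttributeWiseNB:
    """Naive Bayes sketch with one keyword distribution per (category, attribute)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant (assumed)
        self.cat_docs = defaultdict(int)        # category -> number of training catalogs
        self.counts = defaultdict(lambda: defaultdict(int))  # (cat, attr) -> word -> count
        self.totals = defaultdict(int)          # (cat, attr) -> total keyword count
        self.vocab = defaultdict(set)           # attr -> keywords seen in that attribute

    def fit(self, catalogs, labels):
        for doc, cat in zip(catalogs, labels):
            self.cat_docs[cat] += 1
            for attr, value in doc.items():
                for word in value.lower().split():
                    self.counts[(cat, attr)][word] += 1
                    self.totals[(cat, attr)] += 1
                    self.vocab[attr].add(word)
        return self

    def log_score(self, doc, cat):
        """Log prior plus the sum of attribute-wise smoothed log likelihoods."""
        n_docs = sum(self.cat_docs.values())
        score = math.log(self.cat_docs[cat] / n_docs)
        for attr, value in doc.items():
            vocab_size = len(self.vocab.get(attr, ()))
            if vocab_size == 0:
                continue  # attribute never seen in training: carries no evidence
            word_counts = self.counts.get((cat, attr), {})
            total = self.totals.get((cat, attr), 0)
            for word in value.lower().split():
                count = word_counts.get(word, 0)
                score += math.log(
                    (count + self.alpha) / (total + self.alpha * vocab_size)
                )
        return score

    def predict(self, doc):
        return max(self.cat_docs, key=lambda cat: self.log_score(doc, cat))


# Hypothetical toy data: the keyword "laser" appears in different attributes.
catalogs = [
    {"name": "laser printer", "maker": "acme"},
    {"name": "inkjet printer", "maker": "acme"},
    {"name": "office chair", "maker": "laser"},
    {"name": "desk chair", "maker": "laser"},
]
labels = ["printers", "printers", "furniture", "furniture"]

model = AttributeWiseNB().fit(catalogs, labels)
print(model.predict({"name": "laser"}))   # keyword in the product name
print(model.predict({"maker": "laser"}))  # same keyword in the maker attribute
```

In this toy data the same keyword provides evidence for different categories depending on the attribute in which it occurs, which a flat bag-of-words classifier that merges all attribute values into one text cannot distinguish.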