Hypertext Classification using Weighted Transductive Support Vector Machines

Hypertext documents are a special but important kind of text document for text classification. This paper introduces weighted transductive support vector machines (WTSVMs), which treat test samples differently according to their weight factors rather than treating every test sample equally, as transductive support vector machines (TSVMs) do. A hybrid similarity function that combines hyperlink and term components is defined to measure the similarity between an unlabeled sample and the labeled documents. As a result, the adjustment of the decision hyperplane is refined by reformulating the penalties on unlabeled samples during training. Experimental results on benchmark problems show the efficiency of the proposed method.
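
To make the idea of a hybrid similarity concrete, the following is a minimal sketch and not the formulation used in this paper, whose exact form the abstract does not state. It assumes the term component is a cosine similarity over term-weight vectors, the hyperlink component is a Jaccard overlap of link sets, the two are mixed by a hypothetical parameter `alpha`, and the weight factor of an unlabeled sample is its highest similarity to any labeled document. All function names and the document layout are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(x, y):
    """Jaccard overlap between two sets of hyperlink targets."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def hybrid_similarity(doc_a, doc_b, alpha=0.5):
    """Assumed form: convex combination of term and hyperlink similarity."""
    term_part = cosine(doc_a["terms"], doc_b["terms"])
    link_part = jaccard(doc_a["links"], doc_b["links"])
    return alpha * term_part + (1.0 - alpha) * link_part


def weight_factor(unlabeled, labeled_docs, alpha=0.5):
    """Assumed aggregation: highest similarity to any labeled document."""
    return max(
        (hybrid_similarity(unlabeled, d, alpha) for d in labeled_docs),
        default=0.0,
    )


# Toy documents (made-up data, for illustration only).
doc_a = {"terms": {"svm": 1.0, "text": 1.0}, "links": {"u1", "u2"}}
doc_b = {"terms": {"svm": 1.0}, "links": {"u2", "u3"}}
w = weight_factor(doc_a, [doc_b])
```

Under these assumptions, a weight such as `w` would scale the penalty applied to the corresponding unlabeled sample, so that samples resembling the labeled documents influence the decision hyperplane more than dissimilar ones.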