Document Classification with One-class Multiview Learning

Recently, automatic document classification has attracted considerable attention owing to the large quantity of web documents. A special case is deciding whether a document belongs to a target class (directory) when only documents of the target class are available, which is a standard one-class classification problem. Moreover, unlike other data, web pages have both intrinsic (text) and extrinsic (hyperlink) features, which makes them well suited to multiview learning. To tackle one-class document classification, we propose a multiview one-class classifier. It uses One-Cluster Clustering based Data Description (OCCDD) as the base one-class classifier, obtains a one-class classifier in each view by setting a membership threshold, and simultaneously achieves consensus among the views through a regularization term. As a result, the views boost each other, rather than having their results ensembled independently or performing document recognition in a single view. We conduct experiments on the standard WebKB dataset with OCCDD and the proposed multiview method. Experimental results show that the multiview method is effective and stable with respect to parameter choice.
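The abstract does not give the OCCDD objective or the exact form of the regularization term, so the following is only a minimal sketch under stated assumptions, not the authors' method. Each view's one-cluster description is modeled on the possibilistic c-means objective with a single cluster and fuzzifier m = 2; consensus is enforced by an assumed squared-difference penalty lam * sum_i (u_i^(1) - u_i^(2))^2; and a document is accepted when the average of its two view memberships reaches a threshold tau. The names `TwoViewOneClass`, `joint_memberships`, `lam`, `tau`, and the per-view scale heuristic for eta are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(x: Vector, c: Vector) -> float:
    """Squared Euclidean distance between a point and a centre."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def weighted_mean(points: Sequence[Vector], w: Sequence[float]) -> List[float]:
    """Weighted centroid; with equal weights this is the plain mean."""
    total = sum(w)
    return [sum(wi * p[j] for wi, p in zip(w, points)) / total
            for j in range(len(points[0]))]


def joint_memberships(d1: float, d2: float, eta1: float, eta2: float,
                      lam: float) -> Tuple[float, float]:
    """Closed-form minimiser (assumed m = 2) of
        u1^2 d1 + eta1 (1 - u1)^2 + u2^2 d2 + eta2 (1 - u2)^2 + lam (u1 - u2)^2.
    Zeroing both partial derivatives gives a 2x2 linear system, solved with
    Cramer's rule. With lam = 0 each view reduces to u = eta / (eta + d).
    """
    a1, a2 = d1 + eta1 + lam, d2 + eta2 + lam
    det = a1 * a2 - lam * lam
    return (eta1 * a2 + lam * eta2) / det, (eta2 * a1 + lam * eta1) / det


@dataclass
class TwoViewOneClass:
    lam: float = 1.0   # hypothetical consensus weight between the two views
    tau: float = 0.5   # hypothetical membership threshold
    n_iter: int = 50
    c1: List[float] = field(default_factory=list)
    c2: List[float] = field(default_factory=list)
    eta1: float = 1.0
    eta2: float = 1.0

    def memberships(self, x1: Vector, x2: Vector) -> Tuple[float, float]:
        return joint_memberships(sq_dist(x1, self.c1), sq_dist(x2, self.c2),
                                 self.eta1, self.eta2, self.lam)

    def fit(self, view1: Sequence[Vector],
            view2: Sequence[Vector]) -> "TwoViewOneClass":
        n = len(view1)
        self.c1 = weighted_mean(view1, [1.0] * n)
        self.c2 = weighted_mean(view2, [1.0] * n)
        # Heuristic scale per view: mean squared distance to the initial centre.
        self.eta1 = max(sum(sq_dist(x, self.c1) for x in view1) / n, 1e-12)
        self.eta2 = max(sum(sq_dist(x, self.c2) for x in view2) / n, 1e-12)
        for _ in range(self.n_iter):
            # Alternate: memberships at fixed centres, then centres at fixed
            # memberships (weights u^2, the exact minimiser for m = 2).
            us = [self.memberships(x1, x2) for x1, x2 in zip(view1, view2)]
            self.c1 = weighted_mean(view1, [u1 ** 2 for u1, _ in us])
            self.c2 = weighted_mean(view2, [u2 ** 2 for _, u2 in us])
        return self

    def score(self, x1: Vector, x2: Vector) -> float:
        u1, u2 = self.memberships(x1, x2)
        return 0.5 * (u1 + u2)  # consensus membership: average over the views

    def predict(self, x1: Vector, x2: Vector) -> bool:
        return self.score(x1, x2) >= self.tau


if __name__ == "__main__":
    # Toy target-class data: view 1 stands in for text features,
    # view 2 for hyperlink features (values are made up for illustration).
    view1 = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05]]
    view2 = [[1.0], [1.1], [0.9], [1.05], [0.95]]
    model = TwoViewOneClass(lam=1.0, tau=0.5).fit(view1, view2)
    print(model.predict([0.05, 0.05], [1.0]))  # True
    print(model.predict([5.0, 5.0], [10.0]))   # False
```

Setting `lam = 0` decouples the views into two independent one-cluster descriptions; increasing it pulls each document's two memberships toward each other, which is one simple way to let the views inform each other instead of combining their outputs only after training.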
