Semantic Features for Multi-view Semi-supervised and Active Learning of Text Classification

In multi-view learning, existing methods usually train classifiers on the originally provided features, which ignores the latent correlation between different views. In this paper, semantic features that integrate information from multiple views are extracted for pattern representation. Canonical correlation analysis is used to learn the semantic spaces, and the semantic features are the projections of the original features onto the basis vectors of these spaces. We investigate the feasibility of semantic features in two learning paradigms: semi-supervised learning and active learning. Experiments on text classification with two state-of-the-art multi-view learning algorithms, co-training and co-testing, indicate that using semantic features can lead to a significant improvement in performance.
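The projection step described above can be illustrated with a minimal sketch of regularized canonical correlation analysis in NumPy. This is not the authors' implementation: the function names `cca_fit` and `semantic_features`, the ridge term `reg`, and the synthetic two-view data are all assumptions made for illustration. The sketch whitens each view, takes the SVD of the whitened cross-covariance, and projects each view onto its top `k` basis vectors.

```python
import numpy as np


def cca_fit(X, Y, k, reg=1e-3):
    """Fit regularized CCA on two views (hypothetical helper, not from the paper).

    X: (n, dx) array for view 1, Y: (n, dy) array for view 2.
    Returns the view means, the projection matrices Wx and Wy,
    and the top-k (regularized) canonical correlations.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]

    # Covariances; the ridge term `reg` is an assumption that keeps them invertible.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)

    # Singular vectors of the whitened cross-covariance give the basis vectors.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :k]
    Wy = Ky @ Vt.T[:, :k]
    return mx, my, Wx, Wy, s[:k]


def semantic_features(X, mean, W):
    """Project original features onto the learned basis vectors."""
    return (X - mean) @ W


# Synthetic two-view data sharing a 2-dimensional latent signal (illustrative only).
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
X = z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))
Y = z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))

mx, my, Wx, Wy, corrs = cca_fit(X, Y, k=2)
Sx = semantic_features(X, mx, Wx)  # semantic features for view 1
Sy = semantic_features(Y, my, Wy)  # semantic features for view 2
```

In a co-training or co-testing setup, `Sx` and `Sy` would presumably replace the original per-view features as classifier inputs; how the paper combines them is not specified in this abstract.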
