Document classification with supervised latent feature selection
Classifying text documents into categories generally means dealing with a high-dimensional structured representation of the documents. To favor the generality of the classifier over its accuracy, some dimensionality reduction technique has to be applied. In this paper we present a classification algorithm that utilizes hidden structures of uncorrelated topics extracted from the training documents and their known categories, which are not necessarily independent. The classifier can incorporate various methods of hidden feature selection. Three latent feature selection procedures are proposed and tested.
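The general pipeline described in the abstract (term weighting, extraction of uncorrelated latent topics, supervised selection of latent features, classification) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the topic model, the three selection procedures, or the classifier. The sketch assumes a truncated SVD in the style of latent semantic analysis as the topic extractor, a Fisher-style between-class to within-class variance ratio as the selection score, and a nearest-centroid classifier. All function names and the toy data are hypothetical.

```python
import numpy as np

def fit_idf(counts):
    """Smoothed inverse document frequency from a documents-by-terms count matrix."""
    df = np.count_nonzero(counts, axis=0)
    return np.log((1.0 + counts.shape[0]) / (1.0 + df)) + 1.0

def tfidf(counts, idf):
    """Apply IDF weights and L2-normalize every document row."""
    weighted = counts * idf
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    return weighted / np.where(norms == 0.0, 1.0, norms)

def fit_latent_space(X, n_topics):
    """Truncated SVD (assumed topic extractor): terms-by-topics projection matrix."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_topics].T

def select_latent_features(Z, y, n_keep):
    """Assumed selection rule: keep the latent features with the highest
    between-class / within-class variance ratio."""
    overall = Z.mean(axis=0)
    between = np.zeros(Z.shape[1])
    within = np.zeros(Z.shape[1])
    for c in np.unique(y):
        Zc = Z[y == c]
        between += len(Zc) * (Zc.mean(axis=0) - overall) ** 2
        within += ((Zc - Zc.mean(axis=0)) ** 2).sum(axis=0)
    score = between / (within + 1e-12)
    return np.argsort(score)[::-1][:n_keep]

def fit_centroids(Z, y):
    """Assumed classifier: one centroid per category in the selected latent space."""
    classes = np.unique(y)
    return classes, np.vstack([Z[y == c].mean(axis=0) for c in classes])

def predict(Z, classes, centroids):
    """Assign each document to the category with the nearest centroid."""
    dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

# Hypothetical toy corpus: 6 terms, category 0 uses terms 0-2, category 1 uses terms 3-5.
train_counts = np.array([
    [2, 1, 0, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 1, 2, 0, 0, 0],
    [0, 0, 0, 3, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 2],
    [0, 0, 0, 2, 2, 1],
], dtype=float)
train_labels = np.array([0, 0, 0, 1, 1, 1, 1])
test_counts = np.array([
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
], dtype=float)

idf = fit_idf(train_counts)
X_train = tfidf(train_counts, idf)
X_test = tfidf(test_counts, idf)

projection = fit_latent_space(X_train, n_topics=3)   # extract 3 latent topics
Z_train = X_train @ projection
selected = select_latent_features(Z_train, train_labels, n_keep=2)  # keep 2 of 3

classes, centroids = fit_centroids(Z_train[:, selected], train_labels)
predictions = predict((X_test @ projection)[:, selected], classes, centroids)
print(predictions)
```

The selection step is the only place where the category labels influence the representation, which is what makes the feature selection supervised in this sketch. Swapping in a different scoring function inside `select_latent_features` leaves the rest of the pipeline unchanged.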