Term selection and weighting approach based on key words in text categorization
Text representation, for which the vector space model is widely used, is considered the main problem in text categorization. The weight of a term in each dimension is its TFIDF value (term frequency, inverse document frequency). But TFIDF cannot stress the significance of the key terms that contribute most to the content of a text. A novel term selection and weighting approach based on key words is presented. Structure information and mutual information are employed to extract key words, and word location, word dependence, word frequency, and document frequency are integrated in weighting a term. In an SVM classification experiment, the approach outperforms the traditional TFIDF approach, with a boost in average precision of about 5%.
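To make the baseline concrete, the following is a minimal sketch of plain TFIDF weighting over tokenized documents, followed by an illustrative key-word boost. The abstract does not give the paper's weighting formula, so `boost_key_terms` and its `factor` parameter are hypothetical stand-ins that only show where a key-word signal could enter the weight; they are not the authors' method.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: tf * idf} dict per tokenized document.

    tf is the term count divided by document length; idf is log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def boost_key_terms(doc_weights, key_terms, factor=2.0):
    """Hypothetical step: scale the weight of terms flagged as key words.

    The multiplicative form and the default factor are assumptions made
    for illustration only.
    """
    return {
        term: weight * factor if term in key_terms else weight
        for term, weight in doc_weights.items()
    }
```

A term that occurs in every document gets an idf of zero under plain TFIDF regardless of how central it is to a given text, which is one way to see the limitation the abstract describes. Note that a purely multiplicative boost like the one above cannot rescue such a term, since zero times any factor is still zero.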