A novel cluster-based feature selection and document classification model on high-dimensional TREC data