A novel improved random forest for text classification using feature ranking and optimal number of trees