Research and application of text classification based on incomplete information systems

Document vectors in text classification are high-dimensional, possibly with tens of thousands of dimensions, which leads to a massive amount of computation. It is therefore important to reduce the dimension. In this paper, the authors present a quantitative tolerance relation and a heuristic algorithm for attribute reduction, combining the theory of incomplete information systems with the characteristics of text classification. The experimental results demonstrate the effectiveness of the approach: it not only reduces the dimension effectively but also maintains high text classification accuracy.
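To make the idea concrete, the following is a minimal Python sketch of attribute reduction over an incomplete decision table. It is not the authors' algorithm: the abstract does not define their quantitative tolerance relation or their heuristic, so everything here is an assumption made for illustration. The sketch uses the classical tolerance relation for incomplete information systems (a missing value, written as None, is compatible with any value), adds a hypothetical `min_match` threshold as one possible quantitative variant, and uses simple backward elimination as the heuristic. The names `tolerant`, `consistent_count`, and `reduce_attributes` are hypothetical.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[int]]  # None marks a missing attribute value


def tolerant(x: Row, y: Row, attrs: Sequence[int], min_match: float = 0.0) -> bool:
    """Return True if x and y are tolerant on attrs.

    Known values must never conflict (classical tolerance relation).
    ASSUMPTION: the 'quantitative' part is modelled by requiring that the
    fraction of attrs on which both values are known and equal is at least
    min_match; min_match=0.0 gives the classical relation.
    """
    matches = 0
    for a in attrs:
        if x[a] is None or y[a] is None:
            continue  # a missing value is compatible with anything
        if x[a] != y[a]:
            return False  # conflicting known values
        matches += 1
    return not attrs or matches / len(attrs) >= min_match


def consistent_count(rows: Sequence[Row], labels: Sequence[str],
                     attrs: Sequence[int], min_match: float = 0.0) -> int:
    """Count objects whose tolerant objects all share the same class label."""
    return sum(
        1
        for i, x in enumerate(rows)
        if all(labels[i] == labels[j]
               for j, y in enumerate(rows)
               if tolerant(x, y, attrs, min_match))
    )


def reduce_attributes(rows: Sequence[Row], labels: Sequence[str],
                      min_match: float = 0.0) -> List[int]:
    """Backward elimination (an assumed heuristic): drop an attribute
    whenever doing so does not lower the number of consistent objects."""
    attrs = list(range(len(rows[0])))
    baseline = consistent_count(rows, labels, attrs, min_match)
    for a in list(attrs):
        trial = [b for b in attrs if b != a]
        if trial and consistent_count(rows, labels, trial, min_match) >= baseline:
            attrs = trial
    return attrs


# Toy decision table: three term features, two classes, some values missing.
rows = [
    (1, 0, 1),
    (1, 0, None),
    (0, 1, 1),
    (0, None, 0),
]
labels = ["a", "a", "b", "b"]
kept = reduce_attributes(rows, labels)
```

In this toy table, `kept` holds the indices of the attributes that survive the reduction. In a text classification setting the attributes would be term features, and the reduced attribute set would define the lower-dimensional document vectors passed to the classifier.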