Document Classification and Routing

This paper describes a document classification and routing system that uses a probabilistic approach to determine the “flavor” of a text. The required probabilities are estimated from the relevant training documents. The paper discusses the development, refinement, and testing of the system’s ability to route 120,000 documents into 50 topics, as well as the mathematical model on which the system is based.