THE AUTOMATION OF THE CLASSIFICATION OF ECONOMIC ACTIVITIES FROM FREE TEXT DESCRIPTIONS USING AN ARRAY ARCHITECTURE OF PROBABILISTIC NEURAL NETWORK

Automating the categorization of economic activities from free-text business descriptions is a major challenge for the Brazilian government administration. So far, this task has been carried out by human classifiers, not all of whom are well trained for the job. Human classification is also subjective: different classifiers can assign different categories to the same set of business descriptions. This inconsistency can seriously distort the information used for planning and taxation at the three levels of government: County, State and Federal. Furthermore, the number of possible categories is very large, more than 1000 in the Brazilian scenario, which makes the problem even harder to solve. We applied an array of Probabilistic Neural Network nodes, each specialized in a small number of classes. In a series of experiments with these networks, the automatic classification results were better than 70% in all classes tested.

Keywords: Text classification, business activities classification, clustering
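
To make the array idea concrete, the following is a minimal sketch, not the implementation evaluated here. It assumes a bag-of-words vectorization, a Gaussian (Parzen) kernel with an arbitrary width `sigma`, and a merge rule that picks the class with the highest activation across all nodes; none of these details are stated above. The class labels, training descriptions, and the names `vectorize`, `PNNNode`, and `PNNArray` are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def vectorize(text, vocabulary):
    """Map a description to a unit-length term-count vector over `vocabulary`."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocabulary]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


class PNNNode:
    """One probabilistic neural network restricted to a small set of classes."""

    def __init__(self, sigma=0.5):
        self.sigma = sigma      # kernel width (assumed value)
        self.patterns = {}      # class label -> stored training vectors

    def fit(self, vectors, labels):
        for vec, label in zip(vectors, labels):
            self.patterns.setdefault(label, []).append(vec)
        return self

    def scores(self, x):
        """Average Gaussian kernel activation per class (Parzen estimate)."""
        out = {}
        for label, vecs in self.patterns.items():
            total = 0.0
            for v in vecs:
                dist2 = sum((a - b) ** 2 for a, b in zip(x, v))
                total += math.exp(-dist2 / (2.0 * self.sigma ** 2))
            out[label] = total / len(vecs)
        return out


class PNNArray:
    """Array of PNN nodes, each trained only on its own group of classes."""

    def __init__(self, class_groups, sigma=0.5):
        self.class_groups = class_groups
        self.nodes = [PNNNode(sigma) for _ in class_groups]

    def fit(self, vectors, labels):
        for node, group in zip(self.nodes, self.class_groups):
            pairs = [(v, l) for v, l in zip(vectors, labels) if l in group]
            node.fit([v for v, _ in pairs], [l for _, l in pairs])
        return self

    def predict(self, x):
        # Assumed merge rule: highest class activation over all nodes wins.
        merged = {}
        for node in self.nodes:
            merged.update(node.scores(x))
        return max(merged, key=merged.get)


# Toy data: invented descriptions and labels, not real activity codes.
training = [
    ("bread and cake production bakery", "bakery"),
    ("car engine repair and maintenance", "auto_repair"),
    ("custom software development services", "software"),
    ("restaurant serving meals and drinks", "restaurant"),
]
vocabulary = sorted({w for text, _ in training for w in text.lower().split()})
vectors = [vectorize(text, vocabulary) for text, _ in training]
labels = [label for _, label in training]

# Two nodes, each responsible for two of the four classes.
model = PNNArray([{"bakery", "restaurant"}, {"auto_repair", "software"}])
model.fit(vectors, labels)

print(model.predict(vectorize("bakery selling bread", vocabulary)))
print(model.predict(vectorize("engine repair", vocabulary)))
```

Splitting the classes across nodes keeps each node's pattern layer small, which is the motivation for the array when the full category set exceeds 1000 classes. How the node outputs are actually combined is a design choice; the maximum-activation rule used in this sketch is only one possibility.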