Data Mining Solution for Assessing Brazilian Secondary School Quality Based on ENEM and Census Data

This paper presents a data mining solution for assessing the quality of Brazilian private secondary schools based on the official school survey and student tests. Following the CRISP-DM method, after problem interpretation and modeling, these two data sources, which are collected yearly, were transformed to the school level of granularity, embedding both data and expert knowledge, and were integrated into a single data set with the national school code as the primary key. Further transformations on the joint data set embedded additional knowledge and made its format compatible with the artificial intelligence techniques applied for knowledge extraction. Logistic regression was applied to produce a propensity score for good schools, a decision tree was applied to extract the sequential decision-making a human would follow, and rules were induced to support the explanation of a decision based on the score. AUC_ROC and Max_KS were used to assess the performance of the propensity score; coverage, confidence, and lift, together with the human knowledge available in the literature, were used to assess the quality of the rules induced by the Apriori algorithm. The results showed that this domain-driven data mining approach was successful in modeling the problem and validating educational public policies.
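The evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the code below assumes the common textbook definitions: AUC_ROC as the probability that a good school outranks a bad one, Max_KS as the largest gap between the cumulative score distributions of the two classes, coverage as the share of records matching a rule's antecedent, confidence as the share of those that also match the consequent, and lift as confidence divided by the consequent's base rate. The scores, labels, and the item names `has_lab` and `good` are hypothetical toy values, not data from the study.

```python
def auc_roc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def max_ks(scores, labels):
    """Largest absolute gap between the two classes' cumulative distributions."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= t) / n_pos
        cdf_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= t) / n_neg
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def rule_metrics(rows, antecedent, consequent):
    """Coverage, confidence and lift of the rule antecedent -> consequent."""
    n = len(rows)
    n_a = sum(1 for r in rows if antecedent <= r)
    n_c = sum(1 for r in rows if consequent <= r)
    n_ac = sum(1 for r in rows if (antecedent | consequent) <= r)
    coverage = n_a / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)
    return coverage, confidence, lift


# Hypothetical propensity scores and labels (1 = good school).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
auc = auc_roc(scores, labels)
ks = max_ks(scores, labels)

# Hypothetical school records as item sets; the rule is {has_lab} -> {good}.
rows = [
    {"has_lab", "good"}, {"has_lab", "good"}, {"has_lab", "good"},
    {"has_lab"}, {"good"}, set(), set(), set(),
]
coverage, confidence, lift = rule_metrics(rows, {"has_lab"}, {"good"})

print(f"AUC_ROC={auc:.3f} Max_KS={ks:.3f}")
print(f"coverage={coverage:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the antecedent is associated with a higher-than-average rate of good schools, which is the kind of evidence a rule would need before it is compared against expert knowledge.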
