A graphically based machine learning approach to predict secondary schools performance in Tunisia

Abstract: The main purpose of this paper is to identify the key factors that affect schools' academic performance and to explore their relationships through a two-stage analysis of a sample of Tunisian secondary schools. In the first stage, we use the Directional Distance Function (DDF) approach to deal with undesirable outputs. The DDF is estimated using Data Envelopment Analysis (DEA). In the second stage, we apply machine-learning approaches (regression trees and random forests) to identify and visualize the variables associated with higher school performance. The data are extracted from the Programme for International Student Assessment (PISA) 2012 survey. The first-stage analysis shows that almost 22% of Tunisian schools are efficient and that schools could improve their students' educational performance by 15.6% while using the same level of resources. The regression tree findings indicate that the most important factors associated with higher performance are school size, competition, class size, parental pressure and the proportion of girls. Only school location appears to have no impact on school efficiency. The random forest results show that the proportion of girls and school size have the strongest impact on the predictive accuracy of our model and hence may influence school efficiency the most. The findings also reveal that the relationships between these key factors and school performance are highly non-linear and highlight the importance of modeling their interactions when explaining efficiency scores.
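
To make the second stage more concrete, the following is a minimal sketch of how a regression tree could relate first-stage efficiency scores to school characteristics. It is not the authors' implementation: the tree-growing code is a simplified CART-style procedure written for illustration, and the records, the field names (`school_size`, `girls_share`, `efficiency`) and all numeric values are invented toy data, not PISA 2012 figures.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, target):
    """Return the (feature, threshold) that most reduces total SSE, or None."""
    best, best_cost = None, sse([r[target] for r in rows])
    for feature in (k for k in rows[0] if k != target):
        for threshold in sorted({r[feature] for r in rows}):
            left = [r[target] for r in rows if r[feature] <= threshold]
            right = [r[target] for r in rows if r[feature] > threshold]
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if cost < best_cost - 1e-12:  # require a strict improvement
                best, best_cost = (feature, threshold), cost
    return best

def grow(rows, target, depth=0, max_depth=2):
    """Recursively grow a regression tree stored as nested dicts."""
    split = best_split(rows, target) if depth < max_depth else None
    if split is None:
        return {"leaf": mean(r[target] for r in rows)}
    feature, threshold = split
    left_rows = [r for r in rows if r[feature] <= threshold]
    right_rows = [r for r in rows if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": grow(left_rows, target, depth + 1, max_depth),
        "right": grow(right_rows, target, depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk the tree down to a leaf and return its mean target value."""
    while "leaf" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]

# Hypothetical toy records: every value below is invented for illustration.
schools = [
    {"school_size": 300, "girls_share": 0.45, "efficiency": 0.60},
    {"school_size": 350, "girls_share": 0.55, "efficiency": 0.62},
    {"school_size": 400, "girls_share": 0.40, "efficiency": 0.58},
    {"school_size": 450, "girls_share": 0.60, "efficiency": 0.64},
    {"school_size": 800, "girls_share": 0.58, "efficiency": 0.95},
    {"school_size": 850, "girls_share": 0.42, "efficiency": 0.70},
    {"school_size": 900, "girls_share": 0.60, "efficiency": 0.97},
    {"school_size": 950, "girls_share": 0.44, "efficiency": 0.72},
]

tree = grow(schools, target="efficiency")
print(tree["feature"], tree["threshold"])
print(predict(tree, {"school_size": 900, "girls_share": 0.60}))
```

On this toy data the root split is on `school_size`, and the large-school branch then splits on `girls_share`. This is the kind of interaction between factors that a tree can expose and that a purely additive linear model would miss. A random forest would average many such trees grown on bootstrap samples with random subsets of features.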
