Student and school performance across countries: A machine learning approach

Abstract In this paper, we develop and apply novel machine learning and statistical methods to analyse the determinants of students’ PISA 2015 test scores in nine countries: Australia, Canada, France, Germany, Italy, Japan, Spain, the UK and the USA. The aim is to identify which student characteristics are associated with test scores and which school characteristics are associated with school value-added (measured at school level). A specific aim of our approach is to explore non-linearities in the associations between covariates and test scores, and to model interactions between school-level factors in affecting results. To address these issues, we apply a two-stage methodology based on flexible tree-based methods. In the first stage, we fit multilevel regression trees to estimate school value-added. In the second stage, we relate the estimated school value-added to school-level variables by means of regression trees and boosting. Results show that, while several student- and school-level characteristics are significantly associated with students’ achievements, there are marked differences across countries. The proposed approach allows an improved description of the structurally different educational production functions across countries.
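The two-stage idea can be illustrated with a minimal sketch. It is not the authors' implementation: a one-split regression stump stands in for the multilevel regression tree, unshrunken school mean residuals stand in for the random effects a mixed model would estimate, and the data are synthetic. All names (`fit_stump`, `stage_one`, `boost`, the student covariate `x`, the school covariate `z`) are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Least-squares regression stump on one feature (a one-split tree)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # max excluded so both sides are non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # constant feature: predict the overall mean
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def stage_one(x, y, school, iters=5):
    """Alternate between a tree on student covariates and school intercepts.

    Assumption: school effects are plain mean residuals (no shrinkage),
    a simplification of what a multilevel model would estimate.
    """
    effects = {s: 0.0 for s in set(school)}
    for _ in range(iters):
        adjusted = [yi - effects[s] for yi, s in zip(y, school)]
        tree = fit_stump(x, adjusted)
        for s in effects:
            resid = [yi - tree(xi) for xi, yi, si in zip(x, y, school) if si == s]
            effects[s] = mean(resid)
    return tree, effects

def boost(zs, targets, rounds=50, lr=0.1):
    """Gradient boosting for squared loss with stumps as base learners."""
    base = mean(targets)
    resid = [t - base for t in targets]
    stumps = []
    for _ in range(rounds):
        s = fit_stump(zs, resid)
        stumps.append(s)
        resid = [r - lr * s(z) for z, r in zip(zs, resid)]
    return lambda z: base + lr * sum(s(z) for s in stumps)

# Synthetic data (assumed): 20 schools, 20 students each.
random.seed(0)
n_schools, n_students = 20, 20
z = [j / (n_schools - 1) for j in range(n_schools)]      # school-level covariate
true_va = [10.0 if zj > 0.5 else -10.0 for zj in z]      # non-linear in z
x, y, school = [], [], []
for j in range(n_schools):
    for _ in range(n_students):
        xi = random.gauss(0, 1)                          # student-level covariate
        x.append(xi)
        y.append(500 + (30 if xi > 0 else 0) + true_va[j] + random.gauss(0, 5))
        school.append(j)

# Stage 1: estimated school value-added; Stage 2: relate it to z by boosting.
tree, effects = stage_one(x, y, school)
va_hat = [effects[j] for j in range(n_schools)]
g = boost(z, va_hat)
print(round(g(0.9) - g(0.1), 1))  # estimated value-added gap between high-z and low-z schools
```

In this toy setting, school value-added depends on `z` through a threshold. A tree-based second stage can recover that pattern without the functional form being specified in advance, which is the kind of non-linearity the approach is meant to explore.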
