Identifying flight delay patterns using diverse subgroup discovery

Flight delays are a common problem that affects around one fourth of flights and has been a major concern for airlines for decades. As a result, an increasing amount of research has been devoted to this topic in recent years. Notably, the fields of machine learning and data mining have proposed various solutions for predicting flight delays, typically some hours before departure. However, the most important airline decisions that could benefit from such predictions, i.e., those on scheduled block times and crew schedules, are made two to six months prior to departure. Consequently, predictions made only hours before departure are useless for these scheduling tasks. As accurately predicting delays for individual flights a long time in advance is practically infeasible, we instead propose to search for circumstances associated with large delays. For this we propose to use diverse Subgroup Discovery (SD), a data mining technique that discovers subsets of the data that 1) deviate from the overall data with regard to some target variable, and 2) can be described by a simple conjunctive query on the other variables. We apply diverse SD to historical flight data and mine subgroups of flights that, on average, have a large delay. We show that this approach yields subgroups that experts can easily understand, even though they can capture non-trivial relations between multiple variables. We also show that diverse SD gives less redundant results than standard top-k SD, and demonstrate that even in situations where inferring an accurate predictive model is infeasible, local deviations can be effectively captured and described by local patterns, potentially providing valuable insights for, e.g., airline scheduling problems.
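
To make the two defining properties of a subgroup concrete, the following minimal sketch may help. It is not the algorithm or the data used in this work: the flight records are invented toy values, the quality measure (square root of subgroup size times the shift in mean delay) is one common choice assumed here for illustration, and diversity is approximated by a simple greedy filter on the Jaccard overlap of subgroup covers. All names (`FLIGHTS`, `covers`, `quality`, `candidates`, `top_k`, `diverse_top_k`, `max_overlap`) are hypothetical.

```python
from itertools import combinations

# Toy flight records, invented purely for illustration (not real data).
FLIGHTS = [
    {"origin": "JFK", "season": "winter", "period": "evening", "delay": 60},
    {"origin": "JFK", "season": "winter", "period": "evening", "delay": 50},
    {"origin": "JFK", "season": "winter", "period": "morning", "delay": 45},
    {"origin": "JFK", "season": "summer", "period": "evening", "delay": 10},
    {"origin": "AMS", "season": "winter", "period": "morning", "delay": 5},
    {"origin": "AMS", "season": "summer", "period": "morning", "delay": 0},
    {"origin": "AMS", "season": "summer", "period": "evening", "delay": 15},
    {"origin": "AMS", "season": "winter", "period": "evening", "delay": 15},
]
ATTRIBUTES = ["origin", "season", "period"]


def covers(description, record):
    """A description is a conjunction of (attribute, value) conditions."""
    return all(record[attr] == value for attr, value in description)


def quality(description, records, target="delay"):
    """Assumed quality measure: sqrt(n) * (subgroup mean - overall mean)."""
    subgroup = [r[target] for r in records if covers(description, r)]
    if not subgroup:
        return 0.0
    overall_mean = sum(r[target] for r in records) / len(records)
    return len(subgroup) ** 0.5 * (sum(subgroup) / len(subgroup) - overall_mean)


def candidates(records, attributes, max_depth=2):
    """Enumerate conjunctions of up to max_depth attribute=value conditions."""
    conditions = sorted({(a, r[a]) for r in records for a in attributes})
    for depth in range(1, max_depth + 1):
        for combo in combinations(conditions, depth):
            # Skip conjunctions that constrain the same attribute twice.
            if len({attr for attr, _ in combo}) == depth:
                yield combo


def ranked(records, attributes, max_depth=2):
    """All candidate descriptions, best quality first."""
    return sorted(candidates(records, attributes, max_depth),
                  key=lambda d: quality(d, records), reverse=True)


def top_k(records, attributes, k=3, max_depth=2):
    """Standard top-k: the k best descriptions, regardless of redundancy."""
    return ranked(records, attributes, max_depth)[:k]


def diverse_top_k(records, attributes, k=3, max_depth=2, max_overlap=0.5):
    """Greedy stand-in for diverse selection: skip a subgroup whose cover
    has a Jaccard overlap above max_overlap with any already selected one."""
    selected, selected_covers = [], []
    for description in ranked(records, attributes, max_depth):
        cover = frozenset(i for i, r in enumerate(records)
                          if covers(description, r))
        if not cover:
            continue
        if all(len(cover & c) / len(cover | c) <= max_overlap
               for c in selected_covers):
            selected.append(description)
            selected_covers.append(cover)
        if len(selected) == k:
            break
    return selected


if __name__ == "__main__":
    print("top-k:  ", top_k(FLIGHTS, ATTRIBUTES))
    print("diverse:", diverse_top_k(FLIGHTS, ATTRIBUTES))
```

Each description is a readable conjunction such as `origin = JFK AND season = winter`, which is what makes subgroups easy for experts to interpret. Plain top-k ranking tends to return several near-identical variants of the same strong subgroup, whereas the overlap filter forces the selected subgroups to cover substantially different flights.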
