Empirical Study on Airline Delay Analysis and Prediction

Big Data analytics is the systematic analysis of very large datasets. Such analysis helps an organization improve its decision-making process. In this article, we present Airline Delay Analysis and Prediction, which analyzes airline datasets in combination with a weather dataset. We consider various attributes to analyze flight delay, for example, day-wise and airline-wise delay, cloud cover, and temperature. Moreover, we present rigorous experiments on several machine learning models for predicting flight delay, namely, logistic regression with L2 regularization, Gaussian Naive Bayes, K-Nearest Neighbors, a Decision Tree classifier, and a Random Forest model. The Random Forest model achieves an accuracy of 82% with a delay threshold of 15 minutes. The analysis is carried out on data from 1987 to 2008; the models are trained on data from 2000 to 2007, and the predictions are validated on 2008 data. Moreover, the Random Forest model achieves a recall of 99%.
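The evaluation protocol described above (a 15-minute delay threshold, training on 2000 to 2007, validation on 2008, and accuracy and recall as metrics) can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the record fields `year` and `arr_delay`, the helper names, the toy records, and the choice of treating a delay of exactly 15 minutes as "delayed" are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

DELAY_THRESHOLD_MIN = 15  # delay threshold stated in the abstract


def label_delayed(arr_delay_min: float, threshold: float = DELAY_THRESHOLD_MIN) -> int:
    """Return 1 if the flight counts as delayed, else 0.

    Assumption: a delay equal to the threshold counts as delayed.
    """
    return 1 if arr_delay_min >= threshold else 0


def split_by_year(records: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split records into training (2000-2007) and validation (2008) sets."""
    train = [r for r in records if 2000 <= r["year"] <= 2007]
    valid = [r for r in records if r["year"] == 2008]
    return train, valid


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of truly delayed flights that were predicted as delayed."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    positives = true_pos + false_neg
    return true_pos / positives if positives else 0.0


# Hypothetical toy records; real data would come from the airline dataset.
records = [
    {"year": 1999, "arr_delay": 60},  # outside both windows, ignored
    {"year": 2007, "arr_delay": 5},
    {"year": 2007, "arr_delay": 30},
    {"year": 2008, "arr_delay": 20},
    {"year": 2008, "arr_delay": 14},
    {"year": 2008, "arr_delay": 15},
]

train, valid = split_by_year(records)
y_true = [label_delayed(r["arr_delay"]) for r in valid]

# Hypothetical predictions standing in for a trained classifier's output.
y_pred = [1, 1, 1]

print("accuracy:", accuracy(y_true, y_pred))
print("recall:", recall(y_true, y_pred))
```

In this toy case every flight is predicted as delayed, so recall is perfect while accuracy is lower. Recall and accuracy measure different things, which is why the two are reported separately.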
