Classification, Analysis, and Prediction of the Daily Operations of Airports Using Machine Learning

The Federal Aviation Administration (FAA) is the regulatory body in the United States responsible for the advancement, safety, and regulation of civil aviation, and it also oversees the development of the U.S. air traffic control system. Over the years, the FAA has made substantial progress in modernizing the National Airspace System (NAS) through technological advancements and through procedures and policies that have maintained the safety of U.S. airspace. However, as with any complex system, evolving challenges to the sustainment and resiliency of the NAS must be addressed continuously.

One of these challenges is analyzing and assessing airport operations efficiently. In particular, there is a need to assess the impact and effectiveness of Traffic Management Initiatives (TMIs) and other procedures on daily airport operations, since doing so can reveal trends and patterns that inform better decision making. The FAA currently classifies the daily operations of airports manually into three categories, “Good Days”, “Average Days”, and “Bad Days”, as a means of assessing their efficiency. This exercise is time-consuming and can be improved: Big Data Analytics can be leveraged to develop a systematic approach for classifying or clustering the daily operations of airports.

This research presents a methodology for clustering the daily operations of Newark Liberty International Airport (EWR) using metrics such as the number of diversions, Ground Stops, and departure delays. Each resulting category (cluster) is then analyzed to identify its key characteristics, trends, and patterns, which airport operators, FAA analysts, and researchers can use to improve operations at the airport.
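The abstract does not state which clustering algorithm, distance measure, or preprocessing is used. As a minimal sketch only, assuming min-max normalization of each metric, Euclidean distance, and agglomerative hierarchical clustering with average linkage cut at three clusters, the pure-Python code below groups a handful of hypothetical daily records. The feature tuple, the toy values, and the rule that names clusters by their mean normalized disruption are illustrative assumptions, not the paper's method and not FAA or EWR data.

```python
import math

# Hypothetical daily records: (diversions, ground_stops, avg_departure_delay_min).
# Illustrative values only, not FAA or EWR data.
days = [
    (0, 0, 8), (1, 0, 10), (0, 0, 9),        # calm days
    (3, 1, 30), (4, 1, 32), (3, 1, 28),      # moderately disrupted days
    (10, 4, 80), (12, 4, 85), (11, 3, 78),   # heavily disrupted days
]


def min_max_normalize(rows):
    """Rescale each feature column to [0, 1] so no single metric dominates the distance."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi))
        for r in rows
    ]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, pts):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    return sum(euclidean(pts[i], pts[j]) for i in c1 for j in c2) / (len(c1) * len(c2))


def agglomerative(pts, k):
    """Repeatedly merge the two closest clusters until k remain; return one label per point."""
    clusters = [[i] for i in range(len(pts))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], pts)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(pts)
    for cid, members in enumerate(clusters):
        for i in members:
            labels[i] = cid
    return labels


def name_clusters(labels, pts, names=("Good Day", "Average Day", "Bad Day")):
    """Hypothetical naming rule: rank clusters by mean normalized disruption (higher is worse)."""
    ids = sorted(set(labels))
    score = {
        c: sum(sum(pts[i]) for i in range(len(pts)) if labels[i] == c) / labels.count(c)
        for c in ids
    }
    order = sorted(ids, key=lambda c: score[c])
    return {c: names[rank] for rank, c in enumerate(order)}


norm = min_max_normalize(days)
labels = agglomerative(norm, k=3)
mapping = name_clusters(labels, norm)
categories = [mapping[label] for label in labels]
print(categories)
```

Normalizing first matters here because delays are measured in minutes while diversions and Ground Stops are small counts; without rescaling, the delay column would dominate the Euclidean distance.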
Finally, a Boosting ensemble machine learning algorithm is used to predict the category of a day's operations at the airport, enabling airport operators, FAA analysts, and researchers to take appropriate action.
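The abstract names boosting but not the specific variant, weak learner, or input features. A minimal sketch, assuming SAMME (a multi-class extension of AdaBoost) with one-level decision stumps as weak learners, trained on hypothetical labeled days, looks like this. The features, labels, and number of rounds are illustrative assumptions; this is not presented as the paper's model.

```python
import math
from collections import defaultdict

# Hypothetical training days: (diversions, ground_stops, avg_departure_delay_min),
# each labeled with a category. Illustrative values only, not FAA or EWR data.
X = [
    (0, 0, 8), (1, 0, 10), (0, 0, 9),
    (3, 1, 30), (4, 1, 32), (3, 1, 28),
    (10, 4, 80), (12, 4, 85), (11, 3, 78),
]
y = ["Good Day"] * 3 + ["Average Day"] * 3 + ["Bad Day"] * 3


def fit_stump(X, y, w):
    """Return the one-feature threshold rule with the lowest weighted error.

    Each side of the threshold predicts its weighted-majority class.
    """
    classes = sorted(set(y))
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
        for t in thresholds:
            side_weight = {True: defaultdict(float), False: defaultdict(float)}
            for row, label, wi in zip(X, y, w):
                side_weight[row[f] <= t][label] += wi
            pred = {side: max(classes, key=lambda c: side_weight[side][c])
                    for side in (True, False)}
            err = sum(wi for row, label, wi in zip(X, y, w) if pred[row[f] <= t] != label)
            if best is None or err < best[0]:
                best = (err, f, t, pred)
    _, f, t, pred = best
    return lambda row: pred[row[f] <= t]


def samme_fit(X, y, n_rounds=10):
    """SAMME (multi-class AdaBoost) with decision stumps as weak learners."""
    n, k = len(X), len(set(y))
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        miss = [stump(row) != label for row, label in zip(X, y)]
        err = sum(wi for wi, m in zip(w, miss) if m) / sum(w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite if a stump is perfect
        if err >= 1 - 1 / k:
            break  # SAMME needs the weak learner to beat random guessing
        alpha = math.log((1 - err) / err) + math.log(k - 1)
        ensemble.append((alpha, stump))
        # Up-weight misclassified days, then renormalize to sum to 1.
        w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def samme_predict(ensemble, row):
    """Weighted vote: each stump adds its alpha to the class it predicts."""
    votes = defaultdict(float)
    for alpha, stump in ensemble:
        votes[stump(row)] += alpha
    return max(votes, key=votes.get)


model = samme_fit(X, y, n_rounds=10)
print(samme_predict(model, (5, 2, 40)))
```

A single stump can output at most two of the three categories, so no individual weak learner separates all three; the weighted vote across boosting rounds is what recovers the full three-way classification.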