Characteristics in Flight Data - Estimation with Logistic Regression and Support Vector Machines

General Background; Approach to Uncertainty; Hypothesis of the Paper; Binary Classification; Logistic Regression; Support Vector Machines (SVMs); Comparison between Logistic Regression and Support Vector Machi

We analyze data from flight sectors. We ask whether the data differ between weekends and weekdays and among sectors. We compare the expected prediction errors of linear logistic regression and of linear and nonlinear kernel classifiers. Linear decision boundaries yield an average prediction error of around 26 % for the weekend data and around 15 % for the sector name data. Nonlinear boundaries do not improve the predictive accuracy by more than 4 %. Thus, there is some characteristic in the data that is identified by both methods.

Airspace is divided into geographical regions, called sectors. For safety reasons, no more than a certain number of aircraft are allowed to enter certain sectors during one hour. Such numbers are called sector capacities. Before take-off, airlines register their demand to enter sectors by submitting a flight plan to a control center. A flight plan is essentially a time-stamped list of waypoints. When demand is higher than capacity, either take-off is delayed or aircraft are rerouted. We accordingly speak of the initial demand and the regulated demand of a sector. Although pilots have to follow their flight plans, there are differences between the number of aircraft planned to enter sectors and the number that really entered them (the real demand). As a consequence, safety is not always guaranteed and available capacity is not always optimally used. We call these differences planning differences. They are consequences of uncertain events such as weather conditions, delays, in-flight reroutings, and others. Such events are not taken into account by the current traffic planning. If there are regularities in planning differences, they can be used to improve current traffic planning. We focus on four sectors in the upper Berlin airspace where planning differences are reported to occur. The sectors are roughly equal in size. The average traversal time of a sector is ten minutes.
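To make the notion of a planning difference concrete, the following minimal Python sketch counts sector entries per clock hour and subtracts the real count from the planned count. It is an illustration under stated assumptions, not the procedure used in the paper: the function names, the sign convention (planned minus real), the example timestamps, and the capacity value of 2 are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(entry_times):
    """Count sector entries per clock hour; keys are the start of each hour."""
    return Counter(t.replace(minute=0, second=0, microsecond=0) for t in entry_times)


def planning_differences(planned_times, actual_times):
    """Return {hour: planned count - real count} for every hour seen in either list.

    The sign convention (planned minus real) is an assumption of this sketch.
    """
    planned = hourly_counts(planned_times)
    actual = hourly_counts(actual_times)
    hours = sorted(set(planned) | set(actual))
    # A Counter returns 0 for hours that are missing from one of the two lists.
    return {h: planned[h] - actual[h] for h in hours}


def over_capacity(counts, capacity):
    """Return the hours in which the entry count exceeds the given capacity."""
    return sorted(h for h, n in counts.items() if n > capacity)


# Made-up entry times for a single sector (illustrative only, not real data).
fmt = "%Y-%m-%d %H:%M"
planned = [datetime.strptime(s, fmt) for s in
           ["2003-06-02 10:05", "2003-06-02 10:40", "2003-06-02 11:15"]]
actual = [datetime.strptime(s, fmt) for s in
          ["2003-06-02 10:10", "2003-06-02 11:02", "2003-06-02 11:20", "2003-06-02 11:55"]]

diffs = planning_differences(planned, actual)
for hour, d in diffs.items():
    print(hour.strftime(fmt), d)

# Hours in which the real demand exceeds a hypothetical capacity of 2.
print(over_capacity(hourly_counts(actual), capacity=2))
```

In this toy input, one more aircraft was planned than actually entered in the first hour, and two more entered than were planned in the second hour, which is also the only hour above the hypothetical capacity.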
We use regulated demand data (the number of aircraft planned to enter a sector) and real demand data (the number of aircraft that really entered a sector), counted in intervals of 60 minutes, for a total of 141 weekdays and 68 weekend days in the period June 2003 to April 2004, for the four sectors EDBBUR1-4. We consider the data as a finite number of realizations of random variables. More precisely, we define $\mathrm{REAL}^{S}_{t_1; t_2}$ = 'number of aircraft entering sector $S$ between $t_1$ and $t_2$' for the real demand. Similarly, we define REG for …