Mining Risk Factors in RFID Baggage Tracking Data

Airport baggage management is a significant part of the aviation industry. However, for several reasons every year a vast number of bags are mishandled (e.g., Left behind, send to wrong flights, gets lost, etc.,) which costs a lot of money to the aviation industry as well as creates inconvenience and frustration to the passengers. To remedy these problems we propose a detailed methodology for mining risk factors from Radio Frequency Identification (RFID) baggage tracking data. The factors should identify potential issues in the baggage management. However, the baggage tracking data are low level and not directly accessible for finding such factors. Moreover, baggage tracking data are highly imbalanced, for example, our experimental data, which is a large real-world data set from the Scandinavian countries, contains only 0.8% mishandled bags. This imbalance presents difficulties to most data mining techniques. The paper presents detailed steps for pre-processing the unprocessed raw tracking data for higher-level analysis and handling the imbalance problem. We fragment the data set based on a number of relevant factors and find the best classifier for each of them. The paper reports on a comprehensive experimental study with real RFID baggage tracking data and it shows that the proposed methodology results in a strong classifier, and can find interesting concrete patterns and reveal useful insights of the data.

[1]  Andrew P. Bradley,et al.  The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..

[2]  Petra Perner,et al.  Data Mining - Concepts and Techniques , 2002, Künstliche Intell..

[3]  Charles X. Ling,et al.  Using AUC and accuracy in evaluating learning algorithms , 2005, IEEE Transactions on Knowledge and Data Engineering.

[4]  Zhi-Hua Zhou,et al.  Exploratory Undersampling for Class-Imbalance Learning , 2009, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[5]  Mark Goadrich,et al.  The relationship between Precision-Recall and ROC curves , 2006, ICML.

[6]  Francisco Herrera,et al.  An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics , 2013, Inf. Sci..

[7]  Tanvir Ahmed,et al.  Capturing hotspots for constrained indoor movement , 2013, SIGSPATIAL/GIS.

[8]  Tanvir Ahmed,et al.  A Data Warehouse Solution for Analyzing RFID-Based Baggage Tracking Data , 2013, 2013 IEEE 14th International Conference on Mobile Data Management.

[9]  Diego Klabjan,et al.  Warehousing and Mining Massive RFID Data Sets , 2006, ADMA.

[10]  Harry Zhang,et al.  Naive Bayesian Classifiers for Ranking , 2004, ECML.

[11]  Tanvir Ahmed,et al.  Finding Dense Locations in Indoor Tracking Data , 2014, 2014 IEEE 15th International Conference on Mobile Data Management.

[12]  Tom Fawcett,et al.  Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions , 1997, KDD.

[13]  Pedro M. Domingos MetaCost: a general method for making classifiers cost-sensitive , 1999, KDD '99.

[14]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..