Mining Massive Text Data and Developing Tracking Statistics

This paper outlines a systematic data mining procedure for exploring large free-form text datasets to discover useful features and develop tracking statistics, generally referred to as performance measures or risk indicators. The procedure includes text mining, risk analysis, classification for error measurements, and nonparametric multivariate analysis. Two aviation safety report repositories, PTRS from the FAA and AAS from the NTSB, will be used to illustrate applications of our research to aviation risk management and general decision-support systems. Specific text analysis methodologies and tracking statistics will be discussed, along with approaches to incorporating misclassified data or error measurements into those statistics.