A Normalization Method for Contextual Data: Experience from a Large-Scale Application

This paper describes a pre-processing technique for normalizing contextually dependent data before applying machine learning algorithms. Unlike many previous methods, our approach to normalization does not assume that the learning task is a classification task. We propose a data pre-processing algorithm that modifies the relevant attributes so that the effects of the contextual attributes on them are cancelled. These effects are modeled using a novel approach based on the analysis of variance of the contextual attributes. The method is applied to a massive data repository in the area of aircraft maintenance.
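
To make the idea of cancelling contextual effects concrete, the following minimal sketch removes the per-context mean from a relevant attribute. It is an illustration under stated assumptions, not the algorithm proposed in this paper: the variance-based model is not specified in this summary, so simple group-mean centering is swapped in. The function name `normalize_by_context`, the field names `phase` and `temp`, and the sample readings are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def normalize_by_context(records, context_key, value_key):
    """Return copies of the records with the context group's mean
    subtracted from the relevant attribute (illustrative stand-in only)."""
    # Collect the relevant attribute's values for each context value.
    groups = defaultdict(list)
    for r in records:
        groups[r[context_key]].append(r[value_key])

    # Estimate the contextual effect as the mean of each group.
    group_means = {c: mean(v) for c, v in groups.items()}

    # Cancel the estimated effect; other fields are left unchanged.
    return [
        {**r, value_key: r[value_key] - group_means[r[context_key]]}
        for r in records
    ]


# Hypothetical readings: a temperature recorded in two flight phases.
readings = [
    {"phase": "climb", "temp": 700.0},
    {"phase": "climb", "temp": 720.0},
    {"phase": "cruise", "temp": 600.0},
    {"phase": "cruise", "temp": 640.0},
]
normalized = normalize_by_context(readings, "phase", "temp")
```

After centering, readings from different contexts are expressed as deviations from what is typical for their own context, so they can be compared on a common scale whether the downstream task is classification or not.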