Non-Parametric Model Drift Detection

Abstract: The IARPA seedling effort explored an automated framework for model maintenance. The effort calculated, in an unsupervised fashion, the difference between the dataset used to train a model and the new dataset to which the model is to be applied. This difference was computed with CorEx, a new tool that automatically estimates structure in high-dimensional data through correlation. The experiments used datasets made up of text documents. The difference between the datasets was used to estimate the potential error (drop in accuracy) that the model would incur if applied to the new dataset. The tradeoff between the time cost of retraining the model and the potential error of applying the original model to the new dataset will be used to decide whether or not to retrain.
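The retrain-or-not decision described above can be illustrated with a small sketch. This is not the effort's method. It makes three assumptions. First, it replaces CorEx with a simpler stand-in: the Jensen-Shannon divergence between the unigram distributions of the two corpora. Second, `estimated_accuracy_drop` uses a hypothetical linear calibration, with an assumed `slope` parameter, to map that divergence to an expected accuracy drop. Third, `should_retrain`, `retrain_cost`, and `error_cost` are illustrative names, and the two costs are assumed to share the same units.

```python
import math
from collections import Counter


def term_distribution(docs):
    """Normalized unigram frequencies over a list of text documents."""
    counts = Counter(word for doc in docs for word in doc.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so bounded in [0, 1]).

    Used here as a simple stand-in for CorEx, not as CorEx itself.
    """
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_m(a):
        # KL(a || m); every word with a[w] > 0 also has m[w] > 0.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def estimated_accuracy_drop(divergence, slope=0.5):
    """HYPOTHETICAL calibration: assumes the drop grows linearly with divergence."""
    return min(1.0, slope * divergence)


def should_retrain(train_docs, new_docs, retrain_cost, error_cost, slope=0.5):
    """Decide whether to retrain by comparing expected error cost to retraining cost.

    error_cost is an assumed cost of a total (100%) accuracy loss, expressed in
    the same units as retrain_cost (e.g. analyst hours).
    """
    divergence = js_divergence(term_distribution(train_docs),
                               term_distribution(new_docs))
    expected_error_cost = estimated_accuracy_drop(divergence, slope) * error_cost
    return expected_error_cost > retrain_cost


if __name__ == "__main__":
    train = ["the model was trained on these news articles"]
    new = ["completely different forum posts about sports"]
    print(should_retrain(train, new, retrain_cost=10, error_cost=100))
```

The Jensen-Shannon divergence is symmetric and bounded. That makes it convenient to map onto an accuracy estimate. In practice, the mapping would presumably need to be calibrated on held-out data where labels are available, rather than fixed by hand as it is here.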