Feature Selection and Dimension Reduction Techniques in SAS

In predictive modeling, the choice of variable selection method can significantly influence the final model. While the primary goal of an analysis is usually to obtain the most accurate predictions, the analysis is incomplete without an understanding of the key drivers. These drivers could include demographics, geography, creditworthiness, payment history, usage, pricing, and potentially many other characteristics. Because the number of dimensions is large, many features within these broad categories are bound to remain untested. The million-dollar question is how to arrive at a subset of effects that must be tested. In this paper, we highlight what we have found to be the most effective methods of feature selection, along with illustrative applications and best practices for implementation in SAS®. These methods range from a simple correlation analysis (PROC CORR) to more complex techniques involving variable clustering (PROC VARCLUS), decision tree variable importance lists (PROC SPLIT), and EXL's proprietary process of random feature selection from models developed on bootstrapped samples. By applying these techniques, we have been able to deliver robust, high-quality statistical models with the right mix of dimensions.