Selection of Variables and Factor Derivation

This chapter deals with an aspect of commercial data analysis that can be especially difficult for newcomers: selecting and deriving key factors from a large number of variables. On the one hand, the novice analyst may think that this is a purely intuitive process carried out by business experts. On the other hand, the novice may consider it a purely statistical process. In reality, the answer lies somewhere between these two views. From a practical point of view, there are two different starting points: (i) "What data do I have, and what can I do with it?" and (ii) "I know the final goal or result I am interested in, and I am prepared to obtain the data needed to achieve it." The first two sections of the chapter take each of these starting points in turn and present some basic statistical techniques for selecting variables and deriving factors. The third section discusses how to use data mining techniques to select the most relevant variables. The final section considers the alternative of obtaining a packaged or proprietary solution with preselected variables and factors for a specific business area. In practice, one or more of these approaches can be combined to help ensure the best possible selection of data and variables for the business objective.