The Background for Data Mining Practice

Analytical methodology developed in the context of the prevailing statistical and analytical theory. The history of the statistical theory behind each technique bears strongly on how well that technique can serve the tasks of a data mining project. Bayesian analysis proceeds from the concept of conditional probability: the probability of an event occurring given that another event has already occurred. It begins by quantifying the investigator's existing state of knowledge, beliefs, and assumptions. As a result, Bayesian approaches to inference testing could lead different medical investigators to widely different conclusions, because they used different sets of subjective priors. Mathematical research continued predominantly along Fisherian statistical lines, developing nonlinear versions of parametric methods. Multiple curvilinear regression was one of the earliest approaches to accounting for nonlinearity in continuous data distributions. Many nonlinear problems, however, involve discrete rather than continuous distributions. This chapter presents a practical approach to building cost-effective data mining models aimed at increasing company profitability, using tutorials and demo versions of common data mining tools.
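
The point about subjective priors can be illustrated with a minimal sketch of Bayes' theorem applied to a diagnostic test. Everything in it is an assumption made for illustration: the function name `posterior`, the test's sensitivity and specificity, and the two investigators' prior probabilities are hypothetical and do not come from the chapter.

```python
def posterior(prior, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' theorem.

    prior       -- investigator's prior probability of disease (subjective)
    sensitivity -- P(positive test | disease)
    specificity -- P(negative test | no disease)
    """
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Hypothetical test characteristics, identical for both investigators.
SENSITIVITY = 0.9
SPECIFICITY = 0.9

# Two investigators see the same positive result but hold different priors.
skeptic = posterior(0.01, SENSITIVITY, SPECIFICITY)
believer = posterior(0.30, SENSITIVITY, SPECIFICITY)

print(f"Posterior with prior 0.01: {skeptic:.3f}")
print(f"Posterior with prior 0.30: {believer:.3f}")
```

Under these assumed numbers, the same evidence leaves the first investigator with a posterior below 10 percent and the second with one near 80 percent, which is the kind of divergence the text attributes to differing subjective priors.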