Advanced Algorithms for Data Mining

Data miners use many analysis techniques from statistics. Data mining also includes techniques that are not typically considered part of statistics, such as radial basis function networks and genetic algorithms. Operations research (OR) uses clustering, graph theory, neural networks, and time series, and also depends on simulation and optimization. Forecasting overlaps data mining, statistics, and OR, and adds a few algorithms such as Fourier transforms and wavelets. Building trees interactively has proven popular in applied research and data exploration; it draws on experts' knowledge of the domain or area under investigation and relies on interactive choices. The I-Trees module provides a large number of options that let users interactively determine all aspects of the tree-building process. Multivariate Adaptive Regression Splines (MARSplines) constructs a model from a set of coefficients and features, or “basis functions,” that are determined from the data. MARSplines is well suited for tasks involving categorical predictor variables: different basis functions are computed for each distinct value of each predictor, and the usual techniques for handling categorical variables are applied. STATISTICA Support Vector Machine (SVM) is a classification method that constructs hyperplanes in a multidimensional space to separate cases with different class labels. Image and object data mining is an area of active research, involving the development of new and modified algorithms that can better deal with the complexities of three-dimensional object identification.
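
To make the MARSplines description above more concrete, the following minimal Python sketch evaluates a model built from an intercept, weighted hinge-shaped basis functions for a continuous predictor, and an indicator basis function for one level of a categorical predictor. It is not the STATISTICA implementation: the function names (hinge, indicator, predict), the knot at 3.0, the coefficients, and the category labels are all hypothetical values chosen for illustration, and nothing here is fitted to data. Hinge functions and per-level indicators are assumed as the basis-function forms because they are the ones commonly associated with MARS-style models.

```python
# Illustrative MARS-style model: an intercept plus weighted basis functions.
# The knot, coefficients, and category labels are made-up values for
# demonstration; they are not fitted to any data set.

def hinge(x, knot, direction):
    """Return max(0, x - knot) if direction is +1, or max(0, knot - x) if -1."""
    return max(0.0, direction * (x - knot))


def indicator(value, level):
    """Basis function for a categorical predictor: 1.0 when value equals level."""
    return 1.0 if value == level else 0.0


def predict(x, group):
    """Evaluate the hypothetical model for one case (x continuous, group categorical)."""
    intercept = 2.0
    return (
        intercept
        + 1.5 * hinge(x, knot=3.0, direction=+1)   # contributes only when x > 3
        - 0.5 * hinge(x, knot=3.0, direction=-1)   # contributes only when x < 3
        + 4.0 * indicator(group, "B")              # shift applied to category "B"
    )


if __name__ == "__main__":
    for x, group in [(1.0, "A"), (3.0, "A"), (5.0, "B")]:
        print(x, group, predict(x, group))
```

In an actual MARSplines fit, the knots, the set of basis functions retained, and the coefficients would all be chosen from the data rather than written by hand as they are in this sketch.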