Detecting and identifying ambiguities in regression problems: An approach using a modified mountain method

Regression problems arise in many data analysis applications. The aim of regression is to approximate the function from which measurements were taken. When considering a regression problem, we have to take several aspects into account: how noisy the data are, whether they sufficiently cover the domain in which we want to find the regression function, and what kind of regression function we should choose. However, the underlying assumption is always that the data are (noisy) samples of a single function. In some cases, this might not be true. For instance, when we consider data from a technical process that is controlled by human operators, these operators might use different strategies to reach a particular goal. Even a single operator might not stick to the same strategy all the time. Thus, a dataset containing a mixture of samples from different strategies does not represent (noisy) samples from a single function. Consequently, selecting data from a large dataset to fit a single regression model is ambiguous. In this paper, we suggest an approach using a modified mountain method (MMM) to select data from a jumble of large data samples that come from different functions, in order to cope with the ambiguities in the underlying regression problem. The proposed method may also serve to identify the best local (approximation) function(s), which are determined using weighted regression analysis. The proposed methodology is explained with a one-dimensional problem, a single-input single-output system, and the performance of the proposed approach is then analysed with artificial data from a two-dimensional case study.
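The abstract does not spell out the details of the modified mountain method, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a Gaussian mountain (potential) function in the spirit of mountain clustering, a simple peak-subtraction step to find a second peak, and a Gaussian-weighted least-squares line fit around each peak. The function names, the parameters `alpha` and `beta`, and the synthetic two-strategy data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def mountain(points, candidates, alpha):
    """Assumed mountain function: sum of Gaussian kernels of the data at each candidate."""
    d2 = ((candidates[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-alpha * d2).sum(axis=1)


def find_peaks(points, candidates, n_peaks, alpha, beta):
    """Pick the highest candidate, subtract a scaled kernel around it, and repeat."""
    m = mountain(points, candidates, alpha)
    peaks = []
    for _ in range(n_peaks):
        k = int(np.argmax(m))
        peaks.append(candidates[k])
        d2 = ((candidates - candidates[k]) ** 2).sum(axis=1)
        m = m - m[k] * np.exp(-beta * d2)  # flatten the mountain just found
    return np.array(peaks)


def weighted_line_fit(points, centre, alpha):
    """Weighted least squares for y ~ a*x + b, with Gaussian weights around `centre`."""
    x, y = points[:, 0], points[:, 1]
    w = np.exp(-alpha * ((points - centre) ** 2).sum(axis=1))
    sw = np.sqrt(w)
    design = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef  # array([slope, intercept])


# Hypothetical single-input single-output data: two "strategies" mixed in one dataset.
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(0.0, 1.0, n)
first_strategy = rng.random(n) < 0.5
y = np.where(first_strategy, x, 3.0 - x) + rng.normal(0.0, 0.02, n)
data = np.column_stack([x, y])

# Evaluate the mountain function along the output axis at one query input x0.
x0 = 0.5
y_grid = np.linspace(-1.0, 4.0, 501)
candidates = np.column_stack([np.full_like(y_grid, x0), y_grid])

peaks = find_peaks(data, candidates, n_peaks=2, alpha=20.0, beta=5.0)
fits = [weighted_line_fit(data, p, alpha=20.0) for p in peaks]

for p, (a, b) in zip(peaks, fits):
    print(f"peak at y = {p[1]:.2f} for x0 = {x0}: local fit y = {a:.2f} * x + {b:.2f}")
```

In this sketch, two well-separated peaks at the same input value are what signal an ambiguity: the data near `x0` are better described by two local functions than by one. The weights depend on the output as well as the input, which keeps samples from the other strategy out of each local fit but can slightly bias the slope when the noise is large relative to the kernel width.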
