A greedy algorithm for dimensionality reduction in polynomial regression to forecast the performance of a power plant condenser

Engineers and physicists agree that the heat transfer rate and the cleanliness factor are good indicators of the thermal performance of a power plant condenser. Both quantities depend on other physical variables, such as mass flow rates, pressures and temperatures, measured by several sensors in the power plant. Beyond the expert intuition that the relationship is polynomial (following from the flow and heat balances derived from the mass and energy conservation principles), its exact form is unknown. Performing a full polynomial regression of a given degree, considering all possible monomials formed from the set of measured variables, leads to an NP-hard subset-selection problem. Moreover, the degree of the polynomial would have to be fixed beforehand. This paper overcomes these drawbacks by proposing a greedy algorithm based on polynomial stepwise regression: it first selects relevant monomials from the set of candidates and then performs an ordinary polynomial regression on the selected monomials. The method is tested on several artificial data sets and UCI repository data sets before being applied to the power plant condenser. The results show that the method outperforms other state-of-the-art learning methods in both effectiveness and efficiency.
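The greedy idea described above (select promising monomials one at a time instead of fitting all of them) can be sketched as a forward stepwise procedure. The sketch below is an illustration of this general strategy, not the authors' exact algorithm: the stopping rule, candidate enumeration, and tolerance are assumptions.

```python
import numpy as np
from itertools import combinations_with_replacement

def greedy_monomial_regression(X, y, max_degree=3, max_terms=10, tol=1e-4):
    """Forward stepwise selection of monomials for polynomial regression.

    Candidates are all monomials in the input variables up to max_degree
    (including the constant term). At each step, the monomial that most
    reduces the residual sum of squares is added; the loop stops when the
    improvement falls below tol times the initial residual.
    """
    n, p = X.shape

    # Enumerate exponent tuples for every monomial up to max_degree.
    candidates = []
    for d in range(max_degree + 1):
        for combo in combinations_with_replacement(range(p), d):
            exps = np.zeros(p, dtype=int)
            for j in combo:
                exps[j] += 1
            candidates.append(tuple(exps))

    def column(exps):
        # Evaluate one monomial, e.g. exps=(2,1) -> x0**2 * x1
        return np.prod(X ** np.asarray(exps), axis=1)

    rss0 = np.sum((y - y.mean()) ** 2)  # baseline residual (mean model)
    best_rss = rss0
    selected, Z = [], np.empty((n, 0))

    while len(selected) < max_terms:
        best = None
        for exps in candidates:
            if exps in selected:
                continue
            Zc = np.column_stack([Z, column(exps)])
            coef, *_ = np.linalg.lstsq(Zc, y, rcond=None)
            rss = np.sum((y - Zc @ coef) ** 2)
            if best is None or rss < best[0]:
                best = (rss, exps, Zc)
        # Stop when the best candidate no longer improves the fit enough.
        if best is None or best_rss - best[0] < tol * rss0:
            break
        best_rss, exps, Z = best
        selected.append(exps)

    # Final ordinary least-squares fit on the selected monomials only.
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return selected, coef
```

Each iteration costs one least-squares solve per remaining candidate, so the overall work is polynomial in the number of candidate monomials, in contrast to the exponential cost of searching over all monomial subsets.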
