Learning machines: Rationale and application in ground-level ozone prediction

This paper addresses the potential of learning machines to forecast ground-level ozone in urban areas, summarizes the existing learning machines used to predict ground-level ozone, compares the performance of the reviewed models through a practical case study in Hong Kong, and discusses the underlying philosophy of using learning machines in ozone-related prediction.

The multilayer perceptron (MLP) and the support vector machine (SVM), two popular learning machines, are increasingly used as alternatives to classical statistical models for ground-level ozone prediction. However, employing learning machines without sufficient awareness of their limitations can lead to unsatisfactory results when modeling how ozone evolves, especially during ozone formation episodes. In the spirit of a literature review and justification, this paper discusses, with respect to ozone prediction, recently developed algorithms and technologies for treating the most prominent limitations that degrade model performance. The MLP suffers from its "black-box" property (i.e., it provides little physical explanation for the trained model), from overfitting and from local minima; the SVM suffers from parameter identification and class imbalance problems. This commentary stresses that the underlying philosophy of using learning machines is by no means as trivial as simply fitting models to data, because their use raises difficulties, controversies and unresolved problems. It also aims to serve as a reference point for further technical reading by experts in relevant fields.
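
The SVM class imbalance problem can be made concrete with a short sketch. Ozone episode days are rare, so a classifier that always predicts "no episode" can score high accuracy while detecting no episodes at all. One common remedy, assumed here for illustration and not necessarily the scheme used in this paper, is a cost-sensitive soft-margin SVM objective in which each class's hinge losses are scaled by a class-specific weight; the weights below follow the "balanced" heuristic n / (k * n_c). The function names and the 95/5 split of days are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * n_samples_in_class).

    Rare classes (e.g. ozone episode days) receive proportionally larger weights.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return overall accuracy and recall (hit rate) on the positive class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    preds_on_pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    hits = sum(p == positive for p in preds_on_pos)
    recall = hits / len(preds_on_pos) if preds_on_pos else 0.0
    return correct / len(y_true), recall


def weighted_hinge_objective(w, b, X, y, class_weight, C=1.0):
    """Cost-sensitive soft-margin linear SVM objective, labels y in {-1, +1}:

        0.5 * ||w||^2 + C * sum_i class_weight[y_i] * max(0, 1 - y_i * (w . x_i + b))
    """
    regularizer = 0.5 * sum(wj * wj for wj in w)
    data_term = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        data_term += class_weight[yi] * max(0.0, 1.0 - margin)
    return regularizer + C * data_term


if __name__ == "__main__":
    # Hypothetical season: 95 non-episode days (-1) and 5 episode days (+1).
    y = [-1] * 95 + [1] * 5
    always_no_episode = [-1] * len(y)
    acc, rec = accuracy_and_recall(y, always_no_episode, positive=1)
    print(f"accuracy={acc:.2f}, episode recall={rec:.2f}")  # prints "accuracy=0.95, episode recall=0.00"
    weights = balanced_class_weights(y)
    print(weights)
    # With w = 0 and b = 0 every day lies on the decision boundary (hinge loss 1), and the
    # weights make the 5 episode days contribute as much as the 95 ordinary days.
    print(weighted_hinge_objective([0.0], 0.0, [[0.0]] * len(y), y, weights))
```

Scaling the hinge loss per class in this way is equivalent to using separate penalty constants C+ and C- for the two classes. How large the episode-class cost should be is itself a tuning choice, which links class imbalance back to the parameter identification problem.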
