Sparse, Interpretable and Transparent Predictive Model Identification for Healthcare Data Analysis

Data-driven modelling plays an indispensable role in analyzing and understanding complex processes. This study proposes a class of sparse, interpretable and transparent (SIT) machine learning models that can be used to understand how a response variable depends on a set of potential explanatory variables. An ideal candidate for such a SIT representation is the well-known NARMAX (nonlinear autoregressive moving average with exogenous inputs) model, which can be identified from measured input and output data of the system of interest; the final refined model is usually simple, parsimonious and easy to interpret. The performance of the proposed SIT models is evaluated on two real healthcare datasets.
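
To illustrate how a sparse model of this kind might be obtained from input and output data, the following is a minimal sketch rather than the authors' procedure. It assumes a polynomial NARX dictionary (no moving-average noise terms) and a forward orthogonal least squares search ranked by the error reduction ratio (ERR), a common choice in the NARMAX literature that the abstract itself does not specify. The function names, the lag and tolerance settings, and the synthetic system are all hypothetical; the data are simulated and are not the healthcare datasets.

```python
import numpy as np

def build_candidate_terms(y, u, max_lag=2):
    """Candidate regressors: constant, lagged y and u, and their pairwise products."""
    n = len(y)
    lagged = {}
    for k in range(1, max_lag + 1):
        # Align y(k-lag) and u(k-lag) with the target y(k), k = max_lag..n-1.
        lagged[f"y(k-{k})"] = y[max_lag - k:n - k]
        lagged[f"u(k-{k})"] = u[max_lag - k:n - k]
    terms = {"1": np.ones(n - max_lag)}
    terms.update(lagged)
    names = list(lagged)
    for i, a in enumerate(names):
        for b in names[i:]:
            terms[f"{a}*{b}"] = lagged[a] * lagged[b]
    return terms, y[max_lag:]

def forward_select(terms, target, max_terms=10, tol=1e-3):
    """Greedy forward orthogonal search; stops when the best ERR falls below tol."""
    names = list(terms)
    P = np.column_stack([terms[name] for name in names])
    sigma = target @ target
    chosen, Q, err = [], [], []
    for _ in range(max_terms):
        best = None
        for j in range(len(names)):
            if j in chosen:
                continue
            # Orthogonalise candidate j against the already selected terms.
            w = P[:, j].copy()
            for q in Q:
                w -= (q @ w) / (q @ q) * q
            ww = w @ w
            if ww < 1e-10:
                continue  # numerically dependent on the selected terms
            e = (w @ target) ** 2 / (ww * sigma)  # error reduction ratio
            if best is None or e > best[0]:
                best = (e, j, w)
        if best is None or best[0] < tol:
            break
        err.append(best[0])
        chosen.append(best[1])
        Q.append(best[2])
    # Refit the selected terms in the original (non-orthogonal) basis.
    theta, *_ = np.linalg.lstsq(P[:, chosen], target, rcond=None)
    return [names[j] for j in chosen], theta, err

# Synthetic example (assumed system, for illustration only).
rng = np.random.default_rng(0)
N = 1000
u = rng.uniform(-1.0, 1.0, N)
e = 0.01 * rng.standard_normal(N)
y = np.zeros(N)
for k in range(2, N):
    y[k] = 0.5 * y[k - 1] + 0.8 * u[k - 2] - 0.3 * u[k - 1] ** 2 + e[k]

terms, target = build_candidate_terms(y, u, max_lag=2)
selected, theta, err = forward_select(terms, target)
for name, coef, ratio in zip(selected, theta, err):
    print(f"{name:>16s}  coef={coef:+.3f}  ERR={ratio:.4f}")
```

Each selected term comes with a coefficient and an ERR value, so the fitted model can be read directly as a short equation in which the ERR indicates how much of the output energy each term accounts for. This is the sense in which such a model is sparse and interpretable.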
