Conditional Density Estimation with Neural Networks: Best Practices and Benchmarks

Given a set of empirical observations, conditional density estimation aims to capture the statistical relationship between a conditional variable $\mathbf{x}$ and a dependent variable $\mathbf{y}$ by modeling their conditional probability density $p(\mathbf{y}|\mathbf{x})$. This paper develops best practices for conditional density estimation with neural networks in finance applications, grounded in mathematical insights and empirical evaluations. In particular, we introduce a noise regularization and data normalization scheme that alleviates problems with over-fitting, initialization, and hyper-parameter sensitivity of such estimators. We compare the proposed methodology with popular semi- and non-parametric density estimators, demonstrate its effectiveness on benchmarks with simulated and Euro Stoxx 50 data, and show its superior performance. Our methodology makes it possible to obtain high-quality estimators for statistical expectations of higher moments, quantiles, and non-linear return transformations, with very few assumptions about the return dynamics.
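
To make the two ingredients named above more concrete, the following is a minimal sketch of what noise regularization and data normalization could look like around an arbitrary neural density estimator. It is an illustration under stated assumptions, not the scheme as specified in the paper: the use of Gaussian noise on both $\mathbf{x}$ and $\mathbf{y}$, the per-dimension standardization, and all function names (`fit_normalizer`, `normalize`, `perturb_batch`, `density_in_original_units`) are hypothetical choices made for this example.

```python
import numpy as np


def fit_normalizer(data):
    """Estimate per-dimension mean and standard deviation from training data."""
    mean = data.mean(axis=0)
    std = data.std(axis=0) + 1e-8  # small constant avoids division by zero
    return mean, std


def normalize(data, mean, std):
    """Standardize data to roughly zero mean and unit variance per dimension."""
    return (data - mean) / std


def perturb_batch(x, y, std_x, std_y, rng):
    """Add fresh zero-mean Gaussian noise to a training batch.

    Assumption for this sketch: noise is drawn anew for every batch and
    applied to both inputs and targets, which acts as a smoothness
    regularizer on the fitted density.
    """
    x_noisy = x + rng.normal(0.0, std_x, size=x.shape)
    y_noisy = y + rng.normal(0.0, std_y, size=y.shape)
    return x_noisy, y_noisy


def density_in_original_units(density_normalized, y_std):
    """Map a density evaluated on standardized y back to the original scale.

    Follows from the change-of-variables formula for z = (y - mean) / std:
    p_y(y | x) = p_z(z | x) / prod(std).
    """
    return density_normalized / np.prod(y_std)
```

In such a setup, the estimator would be trained on `normalize`d data with `perturb_batch` applied to each mini-batch, and its output would be rescaled with `density_in_original_units` at evaluation time. Working in standardized units is one plausible reason why a fixed initialization and fixed noise intensities could transfer across data sets with different scales.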
