Model evaluation is crucial for the financial industry, especially for risk management purposes. It provides the means of determining the accuracy of the implemented models and is a prerequisite for the internal models approach. Given that the actual return generating process is unknown, additional statistical assumptions are needed in order to proceed with model evaluation. These assumptions introduce distortions into the evaluation process, which affect the reliability of the results. This paper reviews the major Value at Risk and Expected Shortfall forecast evaluation methods and their performance. While focusing on the regulatory specification, we include various simulation setups to evaluate the effects of model risk and of the length of the out-of-sample period. Our findings highlight the inefficiencies of the forecast backtesting methods under the regulatory specifications, which may eventually lead to the selection of inaccurate and misspecified models.

Keywords: Value-at-Risk, Expected Shortfall, Model Accuracy, Backtesting, Forecast Evaluation

Acknowledgements: This work was supported by the Economic and Social Research Council [grant number 1494761].

Corresponding author: Ekaterini Panopoulou, Kent Business School, University of Kent, Canterbury CT2 7PE, United Kingdom, Tel.: 0044 1227 824469, Email: A.Panopoulou@kent.ac.uk.

1 Introduction

Evaluating (backtesting) a risk model consists of ex-post, out-of-sample comparisons of actual returns with the model's forecast risk measures. It provides the statistical means for determining the accuracy of the plethora of proposed models. Although backtesting is a crucial component of the internal models approach, there are no specific regulatory recommendations on the type of tests that should be used. Instead, the evaluation methodology is freely chosen by the implementing institution.

Since the Market Risk Amendment to the Basel I Accord in 1996, Value at Risk (VaR) has been the dominant risk measure for market risk quantification. VaR was formally introduced by JP Morgan in 1994 and can be defined as the maximum loss over a specific time horizon at a given significance level q, that is, the loss that is exceeded only with probability q. In other words, VaR is the q-quantile of the forecast return distribution. Although the quantile nature of VaR makes it easy to understand and implement, it is also the main source of criticism. Among others, Danielsson et al. (2001) and Dowd (2005) point out two deficiencies of VaR. First, it represents a single point in the tail of the distribution and therefore provides no information beyond that point. Second, VaR is not a coherent risk measure, as it is not subadditive for general distributions. These deficiencies, together with the aftermath of the recent financial crisis, led to the proposal of the coherent Expected Shortfall (ES) as a substitute for VaR (BIS (2012)). In theory, ES seems superior to, and more informative than, VaR, since it averages the tail losses beyond the q-quantile. However, Embrechts et al. (2014) conclude that both risk measures are estimated statistical quantities, a fact that makes their implementation a subject for future research.

With respect to the evaluation of candidate risk models, there is a large body of literature proposing different approaches. For instance, Diebold et al. (1998) and Berkowitz (2001) established the density evaluation approach, which evaluates the fit of a model's implied density. Although intuitive, density evaluation requires the disclosure of key information regarding an institution's returns, which makes it less appealing for the financial industry. Within the density evaluation approach, Amisano and Giacomini (2007) advocate comparing models' performance instead of evaluating their fit. The authors argue that, given the absence of a correct proxy for the actual return distribution, evaluating the fit of a model may be futile.

Unlike the aforementioned density evaluation approaches, forecast evaluation tests a model's accuracy by assessing the properties embedded in its forecasts. This approach has become the industry benchmark, since it offers an intuitive motivation, ease of implementation and little demand for sensitive information on portfolio returns. Despite its advantages and appeal, the methods of the forecast evaluation approach suffer from small-sample inefficiencies attributed to the regulatory specifications and to various methodological assumptions. In more detail, the scarcity of violations at the 1% VaR level over a 1-year out-of-sample period reduces the amount of information available to the tests, resulting in low power: with roughly 250 trading days, a correctly specified 1% VaR model is expected to produce only about 2.5 violations. In addition, the structural and distributional assumptions of the tests reinforce these negative effects on their power. Model risk emerges as another major source of unreliability, since it can distort the results in two ways. First, Escanciano and Olmo (2010) and Escanciano and Olmo (2011) suggest that the model risk inherited by the forecasts affects the asymptotic variance of the test statistics. Second, the assumptions underlying the test statistics may amount to evaluating a human-designed structure that accounts for only small fragments of the actual return dynamics. Finally, with respect to the choice of risk measure, there is a large debate regarding the underlying risk measure and whether it can be evaluated. Some academics suggest that ES forecasts cannot be evaluated, given the measure's lack of elicitability (see, for example, Ziegel (2014)). On the other hand, there is new evidence that elicitability is not necessary for backtesting forecasts, only for comparing their performance (see, for example, Emmer et al. (2015)).

In a recent paper, Nieto and Ruiz (2016) survey the VaR forecasting and backtesting literature without, however, evaluating the performance of the backtesting methods. Specifically, the authors evaluate the performance of various VaR forecasting methods at the 1% coverage level. Their empirical exercise includes different setups designed to examine the effects of various in-sample and out-of-sample periods. Their results suggest that there is significant variation in the accuracy of the forecasting methods. In addition, the authors conclude that simpler methods with asymmetric volatility dynamics and error distributions are the most competitive. Our work is, in principle, similar to Campbell (2007) and complements Nieto and Ruiz (2016) by focusing on the performance of the backtesting methods.

In this paper, we survey the methodological and empirical developments in the risk forecast evaluation approach. Specifically, we focus on the regulatory specifications of 1% VaR and 2.5% ES forecasts, implementing two different out-of-sample periods in order to evaluate the small-sample properties of the tests. In addition, we use various in-sample periods to evaluate the effect of forecast estimation risk. Finally, we use S&P 500 returns to evaluate the forecasts on real financial data.

Our findings suggest that the low power of the tests renders the evaluation results unreliable. Specifically, the scarcity of extreme events at the high coverage level distorts the size of the tests. The tests are oversized in almost all the cases under scrutiny, which calls into question the validity of their asymptotic distributions. Similarly, the power of the tests is low, which is again attributed to the scarcity of events and to the structure of the forecasting methods. With respect to the latter, the overall results suggest that the reliability of each backtesting method varies with the assumptions of the forecasting method. Finally, model risk reduces the power of the methods even further, deepening the mistrust in the backtesting methods.

The rest of the paper is structured as follows. In Section 2 we introduce the definitions of VaR and ES and the main forecasting approaches. Section 3 describes the evaluation methods for VaR and ES, while in Section 4 we perform the small-sample evaluation of the tests. Finally, in Section 5 we conduct an empirical implementation on S&P 500 returns, and Section 6 concludes.

2 Risk Forecasting

In this section, we describe the risk forecasting procedure and the main forecasting approaches. First, we give a brief description of the financial industry's most common risk measures and
[1] S. Laurent, et al. Modelling Daily Value-at-Risk Using Realized Volatility and ARCH Type Models, 2001.
[2] Jeremy Berkowitz. Testing Density Forecasts, With Applications to Risk Management, 2001.
[3] Gianni Amisano, et al. Comparing Density Forecasts via Weighted Likelihood Ratio Tests, 2007.
[4] Peter Christoffersen, et al. Backtesting Value-at-Risk: A Duration-Based Approach, Série Scientifique / Scientific Series 2003s-05, 2003.
[5] Paul H. Kupiec, et al. Techniques for Verifying the Accuracy of Risk Measurement Models, 1995.
[6] Todd E. Clark, et al. Forecast Combination Across Estimation Windows, 2011.
[7] T. Gneiting. Making and Evaluating Point Forecasts, 2009, arXiv:0912.0902.
[8] Jeremy Berkowitz, et al. Evaluating Value-at-Risk Models with Desk-Level Data, 2007, Manag. Sci.
[9] Christophe Hurlin, et al. The Risk Map: A New Tool for Validating Risk Models, 2012.
[10] M. Rocco. Extreme Value Theory for Finance: A Survey, 2012.
[11] Peter Christoffersen, et al. Elements of Financial Risk Management, 2003.
[12] Peter F. Christoffersen. Evaluating Interval Forecasts, 1998.
[13] S. Laurent, et al. Value-at-Risk for long and short trading positions, 2003.
[14] Zaichao Du, et al. Backtesting Expected Shortfall: Accounting for Tail Risk, 2015, Manag. Sci.
[15] Model risk of risk models, 2016.
[16] S. Nadarajah, et al. Estimation methods for expected shortfall, 2014.
[17] K. Dowd. Measuring Market Risk, 2002.
[18] D. Tasche, et al. On the coherence of expected shortfall, 2001, arXiv:cond-mat/0104295.
[19] Masaaki Kijima, et al. On the significance of expected shortfall as a coherent risk measure, 2005.
[20] Jón Daníelsson, et al. Fat tails, VaR and subadditivity, 2013.
[21] W. Pohlmeier, et al. Improving the Value at Risk Forecasts: Theory and Evidence from the Financial Crisis, 2011.
[22] Esther Ruiz, et al. Frontiers in VaR forecasting and backtesting, 2016.
[23] M. Righi, et al. A Comparison of Expected Shortfall Estimation Models, 2014.
[24] Christophe Hurlin, et al. Backtesting value-at-risk: a GMM duration-based test, 2008.
[25] D. Hunter. Improved duration-based backtesting of value-at-risk, 2006.
[26] Christophe Pérignon, et al. A New Approach to Comparing VaR Estimation Methods, 2008, The Journal of Derivatives.
[27] C. Goodhart, et al. An academic response to Basel II, 2001.
[28] M. Isabel Fraga Alves, et al. A new class of independence tests for interval forecasts evaluation, 2012, Comput. Stat. Data Anal.
[29] R. Tunaru, et al. On risk management problems related to a coherence property, 2006.
[30] Sean D. Campbell. A review of backtesting and backtesting procedures, 2005.
[31] Jean-Marie Dufour, et al. Monte Carlo tests with nuisance parameters: a general approach to finite-sample inference and nonstandard, 2006.
[32] Giovanni Barone-Adesi, et al. VaR without correlations for portfolios of derivative securities, 1999.
[33] J. Carlos Escanciano, et al. Robust Backtesting Tests for Value-at-Risk Models, 2008.
[34] Bertrand Melenberg, et al. Backtesting for Risk-Based Regulatory Capital, 2002.
[35] Matthew Pritsker, et al. The Hidden Dangers of Historical Simulation, 2001.
[36] Giorgio Szegö, et al. Measures of risk, 2002, Eur. J. Oper. Res.
[37] James M. O'Brien, et al. An Evaluation of Bank VaR Measures for Market Risk During and Before the Financial Crisis, 2014.
[38] J. Carlos Escanciano, et al. Backtesting Parametric Value-at-Risk With Estimation Risk, 2008.
[39] Ana-Maria Fuertes, et al. Optimally Harnessing Inter-Day and Intra-Day Information for Daily Value-at-Risk Prediction, 2012.
[40] P. Embrechts, et al. An Academic Response to Basel 3.5, 2014.
[41] Pablo Koch-Medina, et al. Unexpected Shortfalls of Expected Shortfall: Extreme Default Profiles and Regulatory Arbitrage, 2015.
[42] Pedro Gurrola-Perez, et al. Filtered Historical Simulation Value-at-Risk Models and Their Competitors, 2015.
[43] Jacob Boudoukh, et al. The Best of Both Worlds: A Hybrid Approach to Calculating Value at Risk, 1997.
[44] Wagner Piazza. Evaluating Value-at-Risk models via Quantile Regression, 2009.
[45] M. Kratz, et al. What is the Best Risk Measure in Practice? A Comparison of Standard Measures, 2013, arXiv:1312.1645.
[46] Philippe Jorion. Value at Risk: The New Benchmark for Managing Financial Risk, 2000.
[47] Wei Wei, et al. The geometric-VaR backtesting method, 2016.
[48] A. McNeil, et al. Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach, 2000.
[49] Giovanni Urga, et al. Evaluating the accuracy of value-at-risk forecasts: New multilevel tests, 2014.
[50] G. Barone-Adesi. VaR Without Correlations for Nonlinear Portfolios, 1998.
[51] R. Tunaru, et al. Coherent risk measures under filtered historical simulation, 2005.
[52] Christophe Hurlin, et al. Backtesting Value-at-Risk: From Dynamic Quantile to Dynamic Binary Tests, 2012.
[53] Michael McAleer, et al. International Evidence on GFC-Robust Forecasts for Risk Management Under the Basel Accord, 2011.
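The introduction treats VaR as the q-quantile of the forecast return distribution and ES as the average of the tail losses beyond it. It defines backtesting as an ex-post comparison of realised returns with forecasts, and argues that the few violations expected at the 1% level over one year leave the tests with little power. The Python sketch below illustrates these points under simplifying assumptions that are ours, not the paper's. Historical simulation stands in for the forecasting methods, and Kupiec's (1995) proportion-of-failures test [5] stands in for the backtests. Violations are assumed i.i.d., so their count is binomial, and the test is run at the 5% level. All function names (`historical_var_es`, `count_violations`, `kupiec_pof`, `binom_pmf`, `rejection_probability`) are hypothetical helpers written for this illustration.

```python
import math


def historical_var_es(returns, q):
    """Historical-simulation VaR and ES at level q (e.g. q = 0.01).

    Assumed convention: both are reported on the return scale, so for a
    long position they are usually negative. VaR is the k-th smallest
    return with k = floor(q * n); ES is the mean of those k smallest
    returns. Other quantile conventions exist.
    """
    ordered = sorted(returns)
    n = len(ordered)
    # small tolerance guards against q * n landing just below an integer
    k = max(1, math.floor(q * n + 1e-9))
    var = ordered[k - 1]
    es = sum(ordered[:k]) / k
    return var, es


def count_violations(realised, var_forecasts):
    """Number of days on which the realised return fell below the VaR forecast."""
    return sum(1 for r, v in zip(realised, var_forecasts) if r < v)


def kupiec_pof(x, T, q):
    """Kupiec (1995) proportion-of-failures LR statistic, chi-squared(1) under H0.

    x: observed violations, T: out-of-sample days, q: nominal VaR level.
    """
    phat = x / T
    ll0 = x * math.log(q) + (T - x) * math.log(1 - q)
    # 0 * log(0) is taken as 0 when x = 0 or x = T
    ll1 = (x * math.log(phat) if x > 0 else 0.0) + (
        (T - x) * math.log(1 - phat) if x < T else 0.0
    )
    return -2.0 * (ll0 - ll1)


def binom_pmf(x, n, p):
    """Binomial pmf computed in log space to avoid overflow; assumes 0 < p < 1."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
        + x * math.log(p) + (n - x) * math.log(1 - p)
    )
    return math.exp(log_pmf)


CHI2_1_CRIT_5PCT = 3.841458820694124  # 95th percentile of chi-squared(1)


def rejection_probability(T, q, true_rate, crit=CHI2_1_CRIT_5PCT):
    """Exact probability that the POF test rejects when the number of
    violations is Binomial(T, true_rate), i.e. violations are i.i.d."""
    return sum(
        binom_pmf(x, T, true_rate)
        for x in range(T + 1)
        if kupiec_pof(x, T, q) > crit
    )


if __name__ == "__main__":
    for T in (250, 1000):
        size = rejection_probability(T, 0.01, 0.01)   # correct model
        power = rejection_probability(T, 0.01, 0.02)  # true violation rate 2%
        print(f"T={T}: expected violations {0.01 * T:.1f}, "
              f"size {size:.3f}, power vs 2% rate {power:.3f}")
```

Because the violation count is binomial under these assumptions, the rejection probabilities are computed exactly rather than by simulation. At T = 250 and q = 0.01, the outcome of zero violations already lies in the rejection region and has probability 0.99^250, roughly 8%. Even a correctly specified model is therefore rejected more often than the nominal 5%. Against a model whose true violation rate is 2%, the test rejects well under half of the time, and quadrupling the window to T = 1000 raises that power markedly. This toy calculation only illustrates the mechanism behind the size and power problems discussed in the introduction; it does not reproduce the paper's simulation design.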