Chapter 3 – Research Design
This chapter represents how to apply proposed VaR models in predicting equity market risk. Basically, the thesis first outlines the collected empirical data. We next focus on verifying assumptions usually engaged in the VaR models and then identifying whether the data characteristics are in line with these assumptions through examining the observed data. Various VaR models are subsequently discussed, beginning with the non-parametric approach (the historical simulation model) and followed by the parametric approaches under different distributional assumptions of returns and intentionally with the combination of the Cornish-Fisher Expansion technique. Finally, backtesting techniques are employed to value the performance of the suggested VaR models.
The data used in the study are financial time series that reflect the daily historical price changes for two single equity index assets, including the FTSE 100 index of the UK market and the S&P 500 of the US market. Mathematically, instead of using the arithmetic return, the paper employs the daily log-returns. The full period, which the calculations are based on, stretches from 05/06/2002 to 22/06/2009 for each single index. More precisely, to implement the empirical test, the period will be divided separately into two sub-periods: the first series of empirical data, which are used to make the parameter estimation, spans from 05/06/2002 to 31/07/2007.
The rest of the data, which is between 01/08/2007 and 22/06/2009, is used for predicting VaR figures and backtesting. Do note here is that the latter stage is exactly the current global financial crisis period which began from the August of 2007, dramatically peaked in the ending months of 2008 and signally reduced significantly in the middle of 2009. Consequently, the study will purposely examine the accuracy of the VaR models within the volatile time.
3.1.1. FTSE 100 index
The FTSE 100 Index is a share index of the 100 most highly capitalised UK companies listed on the London Stock Exchange, began on 3rd January 1984. FTSE 100 companies represent about 81% of the market capitalisation of the whole London Stock Exchange and become the most widely used UK stock market indicator.
In the dissertation, the full data used for the empirical analysis consists of 1782 observations (1782 working days) of the UK FTSE 100 index covering the period from 05/06/2002 to 22/06/2009.
3.1.2. S&P 500 index
The S&P 500 is a value weighted index published since 1957 of the prices of 500 large-cap common stocks actively traded in the United States. The stocks listed on the S&P 500 are those of large publicly held companies that trade on either of the two largest American stock market companies, the NYSE Euronext and NASDAQ OMX. After the Dow Jones Industrial Average, the S&P 500 is the most widely followed index of large-cap American stocks. The S&P 500 refers not only to the index, but also to the 500 companies that have their common stock included in the index and consequently considered as a bellwether for the US economy.
Similar to the FTSE 100, the data for the S&P 500 is also observed during the same period with 1775 observations (1775 working days).
3.2. Data Analysis
For the VaR models, one of the most important aspects is assumptions relating to measuring VaR. This section first discusses several VaR assumptions and then examines the collected empirical data characteristics.
188.8.131.52. Normality assumption
As mentioned in the chapter 2, most VaR models assume that return distribution is normally distributed with mean of 0 and standard deviation of 1 (see figure 3.1). Nonetheless, the chapter 2 also shows that the actual return in most of previous empirical investigations does not completely follow the standard distribution.
Figure 3.1: Standard Normal Distribution
The skewness is a measure of asymmetry of the distribution of the financial time series around its mean. Normally data is assumed to be symmetrically distributed with skewness of 0. A dataset with either a positive or negative skew deviates from the normal distribution assumptions (see figure 3.2). This can cause parametric approaches, such as the Riskmetrics and the symmetric normal-GARCH(1,1) model under the assumption of standard distributed returns, to be less effective if asset returns are heavily skewed. The result can be an overestimation or underestimation of the VaR value depending on the skew of the underlying asset returns.
Figure 3.2: Plot of a positive or negative skew
The kurtosis measures the peakedness or flatness of the distribution of a data sample and describes how concentrated the returns are around their mean. A high value of kurtosis means that more of data’s variance comes from extreme deviations. In other words, a high kurtosis means that the assets returns consist of more extreme values than modeled by the normal distribution. This positive excess kurtosis is, according to Lee and Lee (2000) called leptokurtic and a negative excess kurtosis is called platykurtic. The data which is normally distributed has kurtosis of 3.
Figure 3.3: General forms of Kurtosis
In statistics, Jarque-Bera (JB) is a test statistic for testing whether the series is normally distributed. In other words, the Jarque-Bera test is a goodness-of-fit measure of departure from normality, based on the sample kurtosis and skewness. The test statistic JB is defined as:
where n is the number of observations, S is the sample skewness, K is the sample kurtosis. For large sample sizes, the test statistic has a Chi-square distribution with two degrees of freedom.
Augmented Dickey–Fuller Statistic
Augmented Dickey–Fuller test (ADF) is a test for a unit root in a time series sample. It is an augmented version of the Dickey–Fuller test for a larger and more complicated set of time series models. The ADF statistic used in the test is a negative number. The more negative it is, the stronger the rejection of the hypothesis that there is a unit root at some level of confidence. ADF critical values: (1%) –3.4334, (5%) –2.8627, (10%) –2.5674.
184.108.40.206. Homoscedasticity assumption
Homoscedasticity refers to the assumption that the dependent variable exhibits similar amounts of variance across the range of values for an independent variable.
Figure 3.4: Plot of Homoscedasticity
Unfortunately, the chapter 2, based on the previous empirical studies confirmed that the financial markets usually experience unexpected events, uncertainties in prices (and returns) and exhibit non-constant variance (Heteroskedasticity). Indeed, the volatility of financial asset returns changes over time, with periods when volatility is exceptionally high interspersed with periods when volatility is unusually low, namely volatility clustering. It is one of the widely stylised facts (stylised statistical properties of asset returns) which are common to a common set of financial assets. The volatility clustering reflects that high-volatility events tend to cluster in time.
220.127.116.11. Stationarity assumption
According to Cont (2001), the most essential prerequisite of any statistical analysis of market data is the existence of some statistical properties of the data under study which remain constant over time, if not it is meaningless to try to recognize them.
One of the hypotheses relating to the invariance of statistical properties of the return process in time is the stationarity. This hypothesis assumes that for any set of time instants ,…, and any time interval the joint distribution of the returns ,…, is the same as the joint distribution of returns ,…,. The Augmented Dickey-Fuller test, in turn, will also be used to test whether time-series models are accurately to examine the stationary of statistical properties of the return.
18.104.22.168. Serial independence assumption
There are a large number of tests of randomness of the sample data. Autocorrelation plots are one common method test for randomness. Autocorrelation is the correlation between the returns at the different points in time. It is the same as calculating the correlation between two different time series, except that the same time series is used twice – once in its original form and once lagged one or more time periods.
The results can range from +1 to -1. An autocorrelation of +1 represents perfect positive correlation (i.e. an increase seen in one time series will lead to a proportionate increase in the other time series), while a value of -1 represents perfect negative correlation (i.e. an increase seen in one time series results in a proportionate decrease in the other time series).
In terms of econometrics, the autocorrelation plot will be examined based on the Ljung-Box Q statistic test. However, instead of testing randomness at each distinct lag, it tests the “overall” randomness based on a number of lags.
The Ljung-Box test can be defined as:
where n is the sample size,is the sample autocorrelation at lag j, and h is the number of lags being tested. The hypothesis of randomness is rejected if whereis the percent point function of the Chi-square distribution and the α is the quantile of the Chi-square distribution with h degrees of freedom.
3.2.2. Data Characteristics
Table 3.1 gives the descriptive statistics for the FTSE 100 and the S&P 500 daily stock market prices and returns. Daily returns are computed as logarithmic price relatives: Rt = ln(Pt/pt-1), where Pt is the closing daily price at time t. Figures 3.5a and 3.5b, 3.6a and 3.6b present the plots of returns and price index over time. Besides, Figures 3.7a and 3.7b, 3.8a and 3.8b illustrate the combination between the frequency distribution of the FTSE 100 and the S&P 500 daily return data and a normal distribution curve imposed, spanning from 05/06/2002 through 22/06/2009.
Table 3.1: Diagnostics table of statistical characteristics on the returns of the FTSE 100 Index and S&P 500 index between 05/06/2002 and 22/6/2009.
Number of observations
Augmented Dickey-Fuller (ADF) 2
The ratio of SD/mean
Note: 1. *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.
2. 95% critical value for the augmented Dickey-Fuller statistic = -3.4158
Figure 3.5a: The FTSE 100 daily returns from 05/06/2002 to 22/06/2009
Figure 3.5b: The S&P 500 daily returns from 05/06/2002 to 22/06/2009
Figure 3.6a: The FTSE 100 daily closing prices from 05/06/2002 to 22/06/2009
Figure 3.6b: The S&P 500 daily closing prices from 05/06/2002 to 22/06/2009
Figure 3.7a: Histogram showing the FTSE 100 daily returns combined with a normal distribution curve, spanning from 05/06/2002 through 22/06/2009
Figure 3.7b: Histogram showing the S&P 500 daily returns combined with a normal distribution curve, spanning from 05/06/2002 through 22/06/2009
Figure 3.8a: Diagram showing the FTSE 100’ frequency distribution combined with a normal distribution curve, spanning from 05/06/2002 through 22/06/2009
Figure 3.8b: Diagram showing the S&P 500’ frequency distribution combined with a normal distribution curve, spanning from 05/06/2002 through 22/06/2009
The Table 3.1 shows that the FTSE 100 and the S&P 500 average daily return are approximately 0 percent, or at least very small compared to the sample standard deviation (the standard deviation is 141 and 144 times more than the size of the average return for the FTSE 100 and S&P 500, respectively). This is why the mean is often set at zero when modelling daily portfolio returns, which reduces the uncertainty and imprecision of the estimates. In addition, large standard deviation compared to the mean supports the evidence that daily changes are dominated by randomness and small mean can be disregarded in risk measure estimates.
Moreover, the paper also employes five statistics which often used in analysing data, including Skewness, Kurtosis, Jarque-Bera, Augmented Dickey-Fuller (ADF) and Ljung-Box test to examining the empirical full period, crossing from 05/06/2002 through 22/06/2009. Figure 3.7a and 3.7b demonstrate the histogram of the FTSE 100 and the S&P 500 daily return data with the normal distribution imposed. The distribution of both the indexes has longer, fatter tails and higher probabilities for extreme events than for the normal distribution, in particular on the negative side (negative skewness implying that the distribution has a long left tail).
Fatter negative tails mean a higher probability of large losses than the normal distribution would suggest. It is more peaked around its mean than the normal distribution, Indeed, the value for kurtosis is very high (10 and 12 for the FTSE 100 and the S&P 500, respectively compared to 3 of the normal distribution) (also see Figures 3.8a and 3.8b for more details). In other words, the most prominent deviation from the normal distributional assumption is the kurtosis, which can be seen from the middle bars of the histogram rising above the normal distribution. Moreover, it is obvious that outliers still exist, which indicates that excess kurtosis is still present.
The Jarque-Bera test rejects normality of returns at the 1% level of significance for both the indexes. So, the samples have all financial characteristics: volatility clustering and leptokurtosis. Besides that, the daily returns for both the indexes (presented in Figure 3.5a and 3.5b) reveal that volatility occurs in bursts; particularly the returns were very volatile at the beginning of examined period from June 2002 to the middle of June 2003. After remaining stable for about 4 years, the returns of the two well-known stock indexes in the world were highly volatile from July 2007 (when the credit crunch was about to begin) and even dramatically peaked since July 2008 to the end of June 2009.
Generally, there are two recognised characteristics of the collected daily data. First, extreme outcomes occur more often and are larger than that predicted by the normal distribution (fat tails). Second, the size of market movements is not constant over time (conditional volatility).
In terms of stationary, the Augmented Dickey-Fuller is adopted for the unit root test. The null hypothesis of this test is that there is a unit root (the time series is non-stationary). The alternative hypothesis is that the time series is stationary. If the null hypothesis is rejected, it means that the series is a stationary time series. In this thesis, the paper employs the ADF unit root test including an intercept and a trend term on return. The results from the ADF tests indicate that the test statistis for the FTSE 100 and the S&P 500 is -45.5849 and -37.6418, respectively. Such values are significantly less than the 95% critical value for the augmented Dickey-Fuller statistic (-3.4158). Therefore, we can reject the unit root null hypothesis and sum up that the daily return series is robustly stationary.
Finally, Table 3.1 shows the Ljung-Box test statistics for serial correlation of the return and squared return series for k = 12 lags, denoted by Q(k) and Q2(k), respectively. The Q(12) statistic is statistically significant implying the present of serial correlation in the FTSE 100 and the S&P 500 daily return series (first moment dependencies). In other words, the return series exhibit linear dependence.
Figure 3.9a: Autocorrelations of the FTSE 100 daily returns for Lags 1 through 100, covering 05/06/2002 to 22/06/2009.
Figure 3.9b: Autocorrelations of the S&P 500 daily returns for Lags 1 through 100, covering 05/06/2002 to 22/06/2009.
Figures 3.9a and 3.9b and the autocorrelation coefficient (presented in Table 3.1) tell that the FTSE 100 and the S&P 500 daily return did not display any systematic pattern and the returns have very little autocorrelations. According to Christoffersen (2003), in this situation we can write:
Corr(Rt+1,Rt+1-λ) ≈ 0, for λ = 1,2,3…, 100
Therefore, returns are almost impossible to predict from their own past.
One note is that since the mean of daily returns for both the indexes (-0.0001) is not significantly different from zero, and therefore, the variances of the return series are measured by squared returns. The Ljung-Box Q2 test statistic for the squared returns is much higher, indicating the presence of serial correlation in the squared return series. Figures 3.10a and 3.10b) and the autocorrelation coefficient (presented in Table 3.1) also confirm the autocorrelations in squared returns (variances) for the FTSE 100 and the S&P 500 data, and more importantly, variance displays positive correlation with its own past, especially with short lags.
Corr(R2t+1,R2t+1-λ) > 0, for λ = 1,2,3…, 100
Figure 3.10a: Autocorrelations of the FTSE 100 squared daily returns
Figure 3.10b: Autocorrelations of the S&P 500 squared daily returns
3.3. Calculation of Value At Risk
The section puts much emphasis on how to calculate VaR figures for both single return indexes from proposed models, including the Historical Simulation, the Riskmetrics, the Normal-GARCH(1,1) (or N-GARCH(1,1)) and the Student-t GARCH(1,1) (or t-GARCH(1,1)) model. Except the historical simulation model which does not make any assumptions about the shape of the distribution of the assets returns, the other ones commonly have been studied under the assumption that the returns are normally distributed. Based on the previous section relating to the examining data, this assumption is rejected because observed extreme outcomes of the both single index returns occur more often and are larger than predicted by the normal distribution.
Also, the volatility tends to change through time and periods of high and low volatility tend to cluster together. Consequently, the four proposed VaR models under the normal distribution either have particular limitations or unrealistic. Specifically, the historical simulation significantly assumes that the historically simulated returns are independently and identically distributed through time. Unfortunately, this assumption is impractical due to the volatility clustering of the empirical data. Similarly, although the Riskmetrics tries to avoid relying on sample observations and make use of additional information contained in the assumed distribution function, its normally distributional assumption is also unrealistic from the results of examining the collected data.
The normal-GARCH(1,1) model and the student-t GARCH(1,1) model, on the other hand, can capture the fat tails and volatility clustering which occur in the observed financial time series data, but their returns standard distributional assumption is also impossible comparing to the empirical data. Despite all these, the thesis still uses the four models under the standard distributional assumption of returns to comparing and evaluating their estimated results with the predicted results based on the student distributional assumption of returns.
Besides, since the empirical data experiences fatter tails more than that of the normal distribution, the essay intentionally employs the Cornish-Fisher Expansion technique to correct the z-value from the normal distribution to account for fatter tails, and then compare these results with the two results above. Therefore, in this chapter, we purposely calculate VaR by separating these three procedures into three different sections and final results will be discussed in length in chapter 4.
3.3.1. Components of VaR measures
Throughout the analysis, a holding period of one-trading day will be used. For the significance level, various values for the left tail probability level will be considered, ranging from the very conservative level of 1 percent to the mid of 2.5 percent and to the less cautious 5 percent.
The various VaR models will be estimated using the historical data of the two single return index samples, stretches from 05/06/2002 through 31/07/2007 (consisting of 1305 and 1298 prices observations for the FTSE 100 and the S&P 500, respectively) for making the parameter estimation, and from 01/08/2007 to 22/06/2009 for predicting VaRs and backtesting. One interesting point here is that since there are few previous empirical studies examining the performance of VaR models during periods of financial crisis, the paper deliberately backtest the validity of VaR models within the current global financial crisis from the beginning in August 2007.
3.3.2. Calculation of VaR
22.214.171.124. Non-parametric approach – Historical Simulation
As mentioned above, the historical simulation model pretends that the change in market factors from today to tomorrow will be the same as it was some time ago, and therefore, it is computed based on the historical returns distribution. Consequently, we separate this non-parametric approach into a section.
The chapter 2 has proved that calculating VaR using the historical simulation model is not mathematically complex since the measure only requires a rational period of historical data. Thus, the first task is to obtain an adequate historical time series for simulating. There are many previous studies presenting that predicted results of the model are relatively reliable once the window length of data used for simulating daily VaRs is not shorter than 1000 observed days.
In this sense, the study will be based on a sliding window of the previous 1305 and 1298 prices observations (1304 and 1297 returns observations) for the FTSE 100 and the S&P 500, respectively, spanning from 05/06/2002 through 31/07/2007. We have selected this rather than larger windows is since adding more historical data means adding older historical data which could be irrelevant to the future development of the returns indexes.
After sorting in ascending order the past returns attributed to equally spaced classes, the predicted VaRs are determined as that log-return lies on the target percentile, say, in the thesis is on three widely percentiles of 1%, 2.5% and 5% lower tail of the return distribution. The result is a frequency distribution of returns, which is displayed as a histogram, and shown in Figure 3.11a and 3.11b below. The vertical axis shows the number of days on which returns are attributed to the various classes. The red vertical lines in the histogram separate the lowest 1%, 2.5% and 5% returns from the remaining (99%, 97.5% and 95%) returns.
For FTSE 100, since the histogram is drawn from 1304 daily returns, the 99%, 97.5% and 95% daily VaRs are approximately the 13th, 33rd and 65th lowest return in this dataset which are -3.2%, -2.28% and -1.67%, respectively and are roughly marked in the histogram by the red vertical lines. The interpretation is that the VaR gives a number such that there is, say, a 1% chance of losing more than 3.2% of the single asset value tomorrow (on 01st August 2007). The S&P 500 VaR figures, on the other hand, are little bit smaller than that of the UK stock index with -2.74%, -2.03% and -1.53% corresponding to 99%, 97.5% and 95% confidence levels, respectively.
Figure 3.11a: Histogram of daily returns of FTSE 100 between 05/06/2002 and 31/07/2007
Figure 3.11b: Histogram of daily returns of S&P 500 between 05/06/2002 and 31/07/2007
Following predicted VaRs on the first day of the predicted period, we continuously calculate VaRs for the estimated period, covering from 01/08/2007 to 22/06/2009. The question is whether the proposed non-parametric model is accurately performed in the turbulent period will be discussed in length in the chapter 4.
126.96.36.199. Parametric approaches under the normal distributional assumption of returns
This section presents how to calculate the daily VaRs using the parametric approaches, including the RiskMetrics, the normal-GARCH(1,1) and the student-t GARCH(1,1) under the standard distributional assumption of returns. The results and the validity of each model during the turbulent period will deeply be considered in the chapter 4.
188.8.131.52.1. The RiskMetrics
Comparing to the historical simulation model, the RiskMetrics as discussed in the chapter 2 does not solely rely on sample observations; instead, they make use of additional information contained in the normal distribution function. All that needs is the current estimate of volatility. In this sense, we first calculate daily RiskMetrics variance for both the indexes, crossing the parameter estimated period from 05/06/2002 to 31/07/2007 based on the well-known RiskMetrics variance formula (2.9). Specifically, we had the fixed decay factor λ=0.94 (the RiskMetrics system suggested using λ=0.94 to forecast one-day volatility). Besides, the other parameters are easily calculated, for instance, and are the squared log-return and variance of the previous day, correspondingly.
After calculating the daily variance, we continuously measure VaRs for the forecasting period from 01/08/2007 to 22/06/2009 under different confidence levels of 99%, 97.5% and 95% based on the normal VaR formula (2.6), where the critical z-value of the normal distribution at each significance level is simply computed using the Excel function NORMSINV.
184.108.40.206.2. The Normal-GARCH(1,1) model
For GARCH models, the chapter 2 confirms that the most important point is to estimate the model parameters ,,. These parameters has to be calculated for numerically, using the method of maximum likelihood estimation (MLE). In fact, in order to do the MLE function, many previous studies efficiently use professional econometric softwares rather than handling the mathematical calculations. In the light of evidence, the normal-GARCH(1,1) is executed by using a well-known econometric tool, STATA, to estimate the model parameters (see Table 3.2 below).
Table 3.2. The parameters statistics of the Normal-GARCH(1,1) model for the FTSE 100 and the S&P 500
Number of Observations
* Note: In this section, we report the results from the Normal-GARCH(1,1) model using the method of maximum likelihood, under the assumption that the errors conditionally follow the normal distribution with significance level of 5%.
According to Table 3.2, the coefficients of the lagged squared returns () for both the indexes are positive, concluding that strong ARCH effects are apparent for both the financial markets. Also, the coefficients of lagged conditional variance () are significantly positive and less than one, indicating that the impact of ‘old’ news on volatility is significant. The magnitude of the coefficient, is especially high (around 0.89 – 0.93), indicating a long memory in the variance.
The estimate of was 1.2E-06 for the FTSE 100 and 1.1E-06 for the S&P 500 implying a long run standard deviation of daily market return of about 0.94% and 0.84%, respectively. The log-likehood for this model for both the indexes was 4401.63 and 4386.964 for the FTSE 100 and the S&P 500, correspondingly. The Log likehood ratios rejected the hypothesis of normality very strongly.
After calculating the model parameters, we begin measuring conditional variance (volatility) for the parameter estimated period, covering from 05/06/2002 to 31/07/2007 based on the conditional variance formula (2.11), where and are the squared log-return and conditional variance of the previous day, respectively. We then measure predicted daily VaRs for the forecasting period from 01/08/2007 to 22/06/2009 under confidence levels of 99%, 97.5% and 95% using the normal VaR formula (2.6). Again, the critical z-value of the normal distribution under significance levels of 1%, 2.5% and 5% is purely computed using the Excel function NORMSINV.
220.127.116.11.3. The Student-t GARCH(1,1) model
Different from the Normal-GARCH(1,1) approach, the model assumes that the volatility (or the errors of the returns) follows the Student-t distribution. In fact, many previous studies suggested that using the symmetric GARCH(1,1) model with the volatility following the Student-t distribution is more accurate than with that of the Normal distribution when examining financial time series. Accordingly, the paper additionally employs the Student-t GARCH(1,1) approach to measure VaRs. In this section, we use this model under the normal distributional assumption of returns. First is to estimate the model parameters using the method of maximum likelihood estimation and obtained by the STATA (see Table 3.3).
Table 3.3. The parameters statistics of the Student-t GARCH(1,1) model for the FTSE 100 and the S&P 500
Number of Observations
* Note: In this section, we report the results from the Student-t GARCH(1,1) model using the method of maximum likelihood, under the assumption that the errors conditionally follow the student distribution with significance level of 5%.
The Table 3.3 also identifies the same characteristics of the student-t GARCH(1,1) model parameters comparing to the normal-GARCH(1,1) approach. Specifically, the results of , expose that there were evidently strong ARCH effects occurred on the UK and US financial markets during the parameter estimated period, crossing from 05/06/2002 to 31/07/2007. Moreover, as Floros (2008) mentioned, there was also the considerable impact of ‘old’ news on volatility as well as a long memory in the variance. We at that time follow the similar steps as calculating VaRs using the normal-GARCH(1,1) model.
18.104.22.168. Parametric approaches under the normal distributional assumption of returns modified by the Cornish-Fisher Expansion technique
The section 22.214.171.124 measured the VaRs using the parametric approaches under the assumption that the returns are normally distributed. Regardless of their results and performance, it is clearly that this assumption is impractical since the fact that the collected empirical data experiences fatter tails more than that of the normal distribution. Consequently, in this section the study intentionally employs the Cornish-Fisher Expansion (CFE) technique to correct the z-value from the assumption of the normal distribution to significantly account for fatter tails. Again, the question of whether the proposed models achieved powerfully within the recent damage time will be assessed in length in the chapter 4.
126.96.36.199.1. The CFE-modified RiskMetrics