Do search engine data improve financial time series volatility predictions in different market periods? An empirical analysis on major world financial indices.

In this paper, we investigate how the influence of search engine data on the prediction of financial time series volatility differs across market periods. We use the EGARCH and the EGARCH-SVI models. We analyze weekly data from the Dow Jones, FTSE 100 and Nikkei 225 market indices and the weekly search volume index (SVI) from Google Trends for keywords related to these indices. The main contribution of this paper is to identify the limitations of the EGARCH-SVI model for forecasting the weekly volatility of a market index. Our results show that i) search engine data improve the volatility predictions of the EGARCH-SVI model in market crisis periods with higher price volatility; and ii) search engine data do not improve the volatility predictions of the EGARCH-SVI model in non-crisis periods with low price volatility in the market. These results also confirm the predictive power of the EGARCH-SVI model in crisis periods for different financial markets.


INTRODUCTION
When we use web applications on the internet, we create additional data and leave a trace of our interests and opinions. When we use search engines such as Google, we create additional information about our searches. From this information Google creates the search volume index (SVI) for the search queries or keywords that are used. These data are publicly available through the Google Trends product [18]. The output of this product is a time series of the relative popularity of keywords or search queries over time. Google's Chief Economist Hal Varian suggested that search data have the potential to capture interest in different economic activities in real time [1]. Evidence that search data can predict home sales, automotive sales, and tourism is provided by Choi and Varian [2]. Da and Engelberg [3] propose a new and straightforward measure of investors' attention using the search frequency SVI from Google Trends. Many other researchers use Google Trends for different purposes [4, 5, 6, 7, 8, 9].
Internet data can also be used to make better financial predictions. The idea of using such data in finance is not new. Antweiler and Frank [10] investigate the mood of traders in forums, Gerow and Keane [11] use newspapers as additional information, Bollen [12] uses tweets, and Gilbert and Karahalios [13] use blogs. Determining the mood of traders in this way also requires parsing the content of the posts and classifying it as a positive or negative signal.
In the financial industry, volatility analysis of financial time series is very important. Volatility forecasts are important for risk management and option pricing. The most popular model is the Autoregressive Conditional Heteroskedasticity (ARCH) model by Engle [14]. Bollerslev [15] extended this model to the Generalized ARCH (GARCH) model. The exponential GARCH (EGARCH) model was introduced by Nelson [16]. This model performs well for equity returns.
The EGARCH-SVI model was introduced in our previous paper [17]. We extend the conventional EGARCH model to the EGARCH-SVI model by adding the weekly SVI from Google Trends as an exogenous variable. In this study we analyze the predictive power of the EGARCH-SVI model in different market periods. For the analysis we use the Dow Jones, FTSE 100 and NIKKEI 225 indices. We divide the dataset into three parts: the first set covers the period before the financial crisis, the second set covers the period of the financial crisis, and the third set covers the period after the financial crisis. The empirical analysis shows that the EGARCH-SVI model provides good results in the period of financial crisis, when price volatility in the market is higher. The remainder of this paper is organized as follows: Section 2 gives a detailed description of the data, Section 3 describes the EGARCH-SVI model, Section 4 presents the empirical results, and the last section concludes.
From Google Trends, we use the search volume index (SVI) for the market index topic "DJI" for the USA, the market index topic "FTSE" for the UK, and the keyword "日経" (Nikkei) for Japan. The blue series in Fig. 2 shows the weekly SVI data for the DJI market index topic from Google Trends from January 2004 until December 2014. The volume measure is based on the number of searches submitted within the USA for all keywords connected with the Dow Jones Index (DJI). The red series in Fig. 2 shows the weekly SVI data for the FTSE market index topic from Google Trends from January 2004 until December 2014. The volume measure is based on the number of searches submitted within the UK for all keywords connected with the FTSE index. The green series in Fig. 2 shows the weekly SVI data for the search term "日経" (Nikkei) from Google Trends from January 2004 until December 2014. The volume measure is based on the number of searches submitted within Japan for the keyword "日経" (Nikkei).
The data from Google Trends are relative in nature: they do not provide the actual total number of searches, only the search volume index. The data are scaled so that the maximum of the time series is 100.
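The rescaling described above can be illustrated with a minimal sketch. It assumes hypothetical raw weekly counts and integer rounding, and it covers only the final step of mapping the largest value to 100; any other normalization that Google Trends applies before this step is not modeled here. The function name `scale_to_index` is illustrative.

```python
def scale_to_index(raw_counts):
    """Rescale non-negative counts so that the largest value maps to 100."""
    peak = max(raw_counts)
    if peak <= 0:
        raise ValueError("series needs at least one positive value")
    # Rounding to integers is an assumption made for this illustration.
    return [round(100 * value / peak) for value in raw_counts]


# Hypothetical raw weekly search counts (not taken from the paper).
weekly_counts = [1200, 3000, 1500, 600]
svi = scale_to_index(weekly_counts)
print(svi)
```

Because only the ratio to the peak survives, two series with very different absolute search volumes can produce the same index values.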

EGARCH-SVI MODEL
Let p_t denote the closing price of the index at the end of the trading day and r_t the corresponding return of the index, given by Equation (1). The GARCH model was first developed by Bollerslev [15] and extended to the Exponential GARCH (EGARCH) model by Nelson [16] to capture the "leverage effect" of equity returns. In this paper we consider the straightforward EGARCH(1, 1) model, which is adequate for modeling the volatility of asset returns.
Equation (2) represents the conditional mean model: each return r_t consists of a conditional mean plus an uncorrelated white noise term ε_t. In Equation (3), Z_t is a sequence of independent and identically distributed (i.i.d.) random variables with zero mean and unit variance. Equation (4) is the asymmetric conditional variance model, in which the conditional variance depends on both the size and the sign of lagged residuals. The EGARCH(p, q) model may be defined as the combination of Equation (3) and Equation (4); setting p = 1 and q = 1 gives the simple EGARCH(1, 1) model.
March 26, 2015
We add one exogenous variable, the weekly SVI from Google Trends, as additional information in the conditional variance; this defines the conditional variance of the EGARCH-SVI model [17]. Google Trends provides weekly data, and the SVI enters the model with lag 1: it represents weekly changes in Google query volumes for search terms related to the asset that we want to model. The parameters of the model are estimated by maximum likelihood.
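For orientation, the equations referred to in this section can be written in a standard textbook form. This is a sketch under stated assumptions, not a reproduction of the original displays: the log-return definition in (1), the symbols \mu_t, \omega, \alpha, \gamma, \beta and \delta, the ordering of p and q, and the equation numbers (5) and (6) are all assumptions, and the parameterization actually estimated may differ.

```latex
\begin{align}
r_t &= \ln\!\left(\frac{p_t}{p_{t-1}}\right) \tag{1} \\
r_t &= \mu_t + \varepsilon_t \tag{2} \\
\varepsilon_t &= \sigma_t Z_t, \qquad Z_t \sim \text{i.i.d.}(0, 1) \tag{3} \\
\ln \sigma_t^2 &= \omega
  + \sum_{i=1}^{q} \Bigl[ \alpha_i Z_{t-i}
  + \gamma_i \bigl( |Z_{t-i}| - \mathrm{E}|Z_{t-i}| \bigr) \Bigr]
  + \sum_{j=1}^{p} \beta_j \ln \sigma_{t-j}^2 \tag{4} \\
\ln \sigma_t^2 &= \omega + \alpha Z_{t-1}
  + \gamma \bigl( |Z_{t-1}| - \mathrm{E}|Z_{t-1}| \bigr)
  + \beta \ln \sigma_{t-1}^2 \tag{5} \\
\ln \sigma_t^2 &= \omega + \alpha Z_{t-1}
  + \gamma \bigl( |Z_{t-1}| - \mathrm{E}|Z_{t-1}| \bigr)
  + \beta \ln \sigma_{t-1}^2 + \delta\, \mathrm{SVI}_{t-1} \tag{6}
\end{align}
```

In this form the sign term \alpha Z_{t-1} carries the asymmetric ("leverage") response, the magnitude term carries the size effect, and \delta measures how much the lagged SVI shifts the log-variance. Modeling \ln \sigma_t^2 rather than \sigma_t^2 keeps the variance positive without parameter constraints.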

EMPIRICAL RESULTS
We test the weekly EGARCH-SVI model on data sets for the Dow Jones Industrial Average (DJI) for the USA, the FTSE index for the UK, and the Nikkei index for Japan. The model was implemented in R (version 3.1.0). We test both the EGARCH and the EGARCH-SVI models so that we can see the influence of search engine data on the prediction of volatility. Because we are interested in predicting the weekly volatility of each index, we calculate the weekly returns of the DJI, FTSE and Nikkei indices from January 2004 until December 2014. From Google Trends we take the weekly SVI for the "Dow Jones Industrial Average" market index topic for the USA, the "FTSE" market index topic for the UK, and the search term "日経" (Nikkei) for Japan. We divide the dataset into three sets: the period before the crisis (January 2004 until July 2007), the crisis period (August 2007 until July 2012), and the period after the crisis (August 2012 until December 2014). We also performed a robustness check, varying the periods within a range of +/- six months, and obtained the same results.
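The data preparation described above can be sketched as follows. The period boundaries come from the text; everything else is an assumption made for illustration: the use of log returns, the hypothetical weekly closing prices, and the helper names `log_returns`, `period_of` and `split_by_period`. The original analysis was carried out in R, so this Python sketch is not the authors' code.

```python
import math
from datetime import date

# Period boundaries as stated in the text; the labels are illustrative.
PERIODS = [
    ("before", date(2004, 1, 1), date(2007, 7, 31)),
    ("crisis", date(2007, 8, 1), date(2012, 7, 31)),
    ("after", date(2012, 8, 1), date(2014, 12, 31)),
]


def log_returns(prices):
    """Log returns between consecutive closing prices (assumed definition)."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def period_of(day):
    """Return the label of the period containing `day`, or None."""
    for label, start, end in PERIODS:
        if start <= day <= end:
            return label
    return None


def split_by_period(dates, values):
    """Group values into the three market periods by their dates."""
    buckets = {label: [] for label, _, _ in PERIODS}
    for day, value in zip(dates, values):
        label = period_of(day)
        if label is not None:
            buckets[label].append(value)
    return buckets


# Hypothetical weekly closes (not taken from the paper).
weeks = [date(2007, 7, 13), date(2007, 7, 20), date(2007, 7, 27),
         date(2007, 8, 3), date(2012, 8, 3)]
closes = [100.0, 102.0, 101.0, 97.0, 110.0]

returns = log_returns(closes)                   # one fewer value than closes
buckets = split_by_period(weeks[1:], returns)   # each return dated by its end week
print({label: len(values) for label, values in buckets.items()})
```

Dating each return by the week in which it ends is a design choice of this sketch; shifting the boundaries in `PERIODS` by a few months is one way to repeat the robustness check mentioned above.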

Before the financial crisis period
For the period before the crisis we fit the EGARCH-SVI model to the DJI index. Table 1 shows the optimal parameters of the EGARCH-SVI model. In the EGARCH-SVI model for the DJI, the variable svi, with parameter ext_reg1, is not statistically significant, which means that it does not improve the prediction of the DJI market index volatility. For the same period we fit the EGARCH-SVI model to the FTSE index. Table 2 shows the optimal parameters of the EGARCH-SVI model. In the EGARCH-SVI model for the FTSE, the variable svi, with parameter ext_reg1, is not statistically significant, which means that it does not improve the prediction of the market index volatility. We also fit the EGARCH-SVI model to the NIKKEI index for the same period. Table 3 shows the optimal parameters of the EGARCH-SVI model. In the EGARCH-SVI model for the NIKKEI index, the variable svi, with parameter ext_reg1, is not statistically significant, which means that it does not improve the prediction of the NIKKEI market index volatility.
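Whether a coefficient such as ext_reg1 counts as statistically significant is usually judged from its t-value. The sketch below is one common way to do this, assuming a two-sided test, a standard normal approximation and a 5% level; the text does not state which test or level was used, and the function names and the example t-values are hypothetical.

```python
import math


def two_sided_p_value(t_value):
    """Two-sided p-value of a t-value under a standard normal approximation."""
    return math.erfc(abs(t_value) / math.sqrt(2))


def is_significant(t_value, level=0.05):
    """True if the coefficient differs from zero at the given level."""
    return two_sided_p_value(t_value) < level


# Hypothetical t-values, chosen only to show both outcomes.
for t in (0.8, 2.5):
    print(t, round(two_sided_p_value(t), 4), is_significant(t))
```

Under these assumptions, a t-value below roughly 1.96 in absolute value leads to the conclusion reported above: the SVI term cannot be distinguished from zero and so adds nothing to the volatility prediction.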

Financial crisis period
For the crisis period we fit the weekly EGARCH-SVI and EGARCH models to the DJI index. From Google Trends we use the SVI for the DJI market index topic for the USA. Table 4 shows the optimal parameters of the EGARCH-SVI model. In our new model, the coefficient of the new variable svi, with parameter ext_reg1, is statistically significant with a t-value of 6.88040. Table 5 shows the evaluation of the in-sample forecasting of the model using the Akaike, Bayes, Shibata and Hannan-Quinn information criteria. The Bayes and Hannan-Quinn information criteria are also better (0.94%, 0.82%) for the EGARCH-SVI model. Table 9 demonstrates that the out of sample 5 the EGARCH-SVI generates better 0.17% (0. The EGARCH-SVI model for FTSE has a DM value of 6.2399, which is larger than the DM value of 6.2385 for the EGARCH model, and both are statistically significant, i.e. the EGARCH-SVI model predicts volatility better than the EGARCH model.
We are also interested in analyzing the weekly volatility of the NIKKEI index from Japan. We calculate the weekly return of the NIKKEI index from January 2004 until December 2014. From Google Trends we take the weekly SVI for "日経" (Nikkei) for the same period. Table 10 shows the optimal parameters of the EGARCH-SVI model. In the EGARCH-SVI model the variable SVI, with parameter ext_reg1, is statistically significant with a t-value of 2.98830. Tables 11 and 12 show the evaluation of the in-sample and out-of-sample forecasting of the EGARCH-SVI model and the EGARCH model.
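The in-sample comparison by information criteria can be sketched as follows. The sketch uses the textbook definitions of the Akaike, Bayes and Hannan-Quinn criteria, computed from a fitted model's log-likelihood; the Shibata criterion is omitted. The log-likelihoods, parameter counts and sample size are hypothetical, and both the scaling of the criteria and the way a percentage improvement is computed are assumptions, since the text does not specify them.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Textbook Akaike, Bayes and Hannan-Quinn criteria (lower is better)."""
    deviance = -2.0 * log_likelihood
    return {
        "akaike": deviance + 2.0 * n_params,
        "bayes": deviance + n_params * math.log(n_obs),
        "hannan_quinn": deviance + 2.0 * n_params * math.log(math.log(n_obs)),
    }


def relative_improvement(baseline, candidate):
    """Percent by which `candidate` lies below `baseline` (assumed definition)."""
    return 100.0 * (baseline - candidate) / abs(baseline)


# Hypothetical fits: the SVI model has one extra parameter.
egarch = information_criteria(log_likelihood=-500.0, n_params=5, n_obs=260)
egarch_svi = information_criteria(log_likelihood=-495.0, n_params=6, n_obs=260)

for name in egarch:
    gain = relative_improvement(egarch[name], egarch_svi[name])
    print(name, round(gain, 2))
```

Each criterion penalizes the extra SVI parameter, so the extended model only scores better when the gain in log-likelihood outweighs that penalty.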

After the financial crisis period
For the period after the crisis we fit the EGARCH-SVI model to the DJI index. Table 13 shows the optimal parameters of the EGARCH-SVI model. Table 14 shows the optimal parameters of the EGARCH-SVI model. In the EGARCH-SVI model for FTSE, the variable svi, with parameter ext_reg1, is not statistically significant, which means that it does not improve the prediction of the market index volatility.
For the NIKKEI index we also fit the EGARCH and the EGARCH-SVI models for the same period. Table 15 shows the optimal parameters of the EGARCH-SVI model. In the EGARCH-SVI model for the NIKKEI index, the variable svi, with parameter ext_reg1, is not statistically significant, which means that it does not improve the prediction of the volatility of the NIKKEI market index.

CONCLUSION
In this paper, we investigate whether search engine data improve financial time series volatility predictions in different market periods. For the empirical analysis we use the EGARCH model and the EGARCH-SVI model, applied to weekly data from the Dow Jones, FTSE and Nikkei market indices. For the EGARCH-SVI model we use the weekly search volume index from Google Trends for the market index keywords. The empirical results show that the SVI has a different influence on the prediction of financial time series volatility in different market periods. In the period of market crisis, with higher price volatility, the SVI improves the predictive power of the EGARCH-SVI model. In the periods before and after the crisis, the SVI does not improve the prediction of financial time series volatility. With this empirical study we also confirm the predictive power of the EGARCH-SVI model in crisis periods for different financial markets.