Analysing the impact of information releases’ sentiment upon the stock market
Table of Contents
1.2.1 News within the Financial Sector
1.4 Thesis outline and contributions
2.1 Prediction models and News
4. Modelling Architecture and Choice of Assets
5.3.1 The measures used in the model
Abstract
This thesis examines the relationship between the verbal information reported in media articles and press releases and the numerical information gathered on financial markets.
The verbal data are classified by sentiment, distinguishing whether each source conveys positive or negative news.
We are interested in examining whether this sentiment analysis can be used to predict market reactions.
EXECUTIVE SUMMARY
Stock markets worldwide operate in a highly volatile environment, in which small disturbances or fluctuations can have a large impact on a company’s economic health. The scale of this impact is illustrated by the fact that there are more than 65 stock exchanges around the world with a total value of $69 trillion (Jeff Desjardins, 2016), of which the 16 largest alone account for 87% of global market capitalization; even minor volatility can therefore result in large gains or losses for investors. Stock market prediction has consequently been a highly attractive subject for both business and academia. Numerous attempts have been made to learn the patterns of stock market trading in order to secure financial gains, although none has been able to predict the market with close to 100 percent accuracy. This uncertainty arises because numerous factors directly or indirectly affect stock market trading across the world. In this report, we therefore attempt to determine the impact of social media, i.e. global news and Twitter data, on stock markets, and touch upon an important part of economics, behavioural economics, which emphasises the profound impact of emotions and sentiment on individual behaviour and decision making when dealing with stocks. The study carried out in this context is based on the US stock index, the DJIA.
Figures
Figure 1: Number of News releases per week over time period: 2000 to 2012
Tables
Table 1: Sentiment Variables derived from wordlist z, about article i, on company j on day t.
Table 2: Further Sentiment Variables derived from wordlist z, about article i, on company j on day t
Table 3: Different Tone Variables derived from each wordlist z, about article i, on company j on day t
Table 4: Summary of information available in the data set
Table 5: Summary of information available in the sentiment data set on GlaxoSmithKline plc.
Table 6: The number of Media articles in each different category based on the average tone value
Table 9: The quarterly reports release dates which occurred during 2011
Abbreviations and Acronyms
Although most of the abbreviations are explained when they are used for the first time in the text, we also list them here:
ADF Augmented Dickey-Fuller test
AR Autoregressive
BVAR Bayesian vector autoregressive model
CAPM Capital asset pricing model
DJIA Dow Jones Industrial Average
EMH Efficient market hypothesis
ETF Exchange-traded fund
GARCH Generalised autoregressive conditional heteroskedasticity
GI General Inquirer of the Harvard Dictionary
HFT High Frequency Trading
MAE Mean absolute error
MEC Market-efficient coefficient
PIN Probability of informed trading
RMSE Root mean squared error
VAR Vector autoregressive
VIX Volatility implied index
######################################################
1. Overview
1.1 Focus of the thesis
The research presented in this study focuses on examining financial news as an event, how stocks react to such events and, potentially, how news can be incorporated into forecasting models to improve their predictive power. The sentiment of each news source is expressed as a numerical value so that it can be incorporated into the regression and prediction models. The wordlists used to generate the numerical sentiment values will be compared against each other to gain a greater understanding of their results. Overall, the main focus is to study how news sentiment can be applied in a financial context.
1.2 Introduction
1.2.1 News within the Financial Sector
News is shared information that is widely available and accessible to the public. In the financial sector, news covers a wide set of areas, such as industries, companies, regulations and more. News generally reports on an event and is circulated as facts of importance, sharing information and enhancing one’s understanding of the markets. Recipients interpret the news they receive, which enables investors to make assessments that may lead to actions being taken in their portfolios of assets. This research is interested in examining the subsequent reaction of market participants to the distribution of news and the sentiment formed from its interpretation.
The actions preceding public circulation are essential in sustaining a news story’s legitimacy and credibility. The process involves collecting information directly from credible sources, which allows journalists to write an article that is subsequently subjected to editorial control. The detailed analysis provided by the editorial process ensures that only validated sources of information are included, preserving the reader’s trust and confidence in the news. This also indicates that a filtering process potentially occurs to determine whether a source of information is fit to broadcast.
The rationale behind this is as follows. Knowledge of the information is initially restricted to a small group of individuals, who subsequently release it to journalists and media sources, which determine whether the information needs to be publicised. Thus, in order to differentiate between news that is acceptable for distribution and information that should remain out of the public eye, selection criteria are imposed. Namely, news is information that has been graded as significant for an audience of market participants, such as investors and traders. This raises the question of which party is in charge of the selection process. The intuitive answer would be journalists. On further reflection, however, it is the source from whom the journalist received the information. There must therefore be some underlying motive that drives the source to release the information and have it distributed to the public. If false or unsound information were distributed, it would be the result of the person releasing the information attempting to manipulate the markets for financial gain, which is an illegal act. The objective of journalists is to attract the public’s attention and capture their interest, which may result in a prevailing opinion being formed amongst market participants. This prevailing opinion is what we loosely refer to as sentiment.
1.2.2 The Influence of News
Since the pioneering work of Bachelier et al. (2006), it has become widely accepted that future stock prices cannot be predicted by analysing historical stock prices. However, it is also widely accepted that information flow is a major contributor to the process by which the prices of financial assets adjust. This is particularly relevant to a stock’s trading volume and volatility, as news plays an influential role in the theoretical model of realized volatility (Andersen et al. 2003). Nonetheless, under the Efficient Market Hypothesis (EMH), it is uncertain whether new information also helps to explain or forecast the stock price in the next time period. Fama (1970) formulated the EMH, which suggests that the financial market reflects any new information efficiently by providing fair market prices.
Under the EMH, no market participant can earn excess returns, as all information should be precisely reflected in share prices. This should hold all the more because modern electronic markets are becoming continuously faster at interpreting and analysing new information. Since all stock prices theoretically incorporate all prior public information, investors and traders should not be able to devise a trading strategy that earns excess returns from analysing historical information. Hence, if the EMH holds true, there should be no correlation between present-day stock returns and the sentiment of financial articles concentrating on a particular stock.
Through a survey of the topic, Malkiel (2003) presents empirical evidence that, in reality, the EMH does not hold true. Brown et al. (1988) developed the Uncertain Information Hypothesis (UIH) in response to evidence of this kind. The UIH is based on the overreaction hypothesis of De Bondt and Thaler (1985) and differs from the EMH in that new prices are set by market participants before the full extent of the news content is determined. For both favourable and unfavourable news, investors set stock prices considerably below their conditional expected values and therefore act in a risk-averse manner. De Bondt and Thaler (1985) provide evidence for the overreaction hypothesis by showing that stocks which were prior losers earn approximately 25% more than stocks which were prior winners. De Bondt and Thaler (1985) do not analyse or rely on the sentiment of news; instead, their work defines a significant movement in the price of a stock as a “news event”.
Zarowin (1990) argues that the overreaction phenomenon is caused by a size effect, namely the smaller market capitalization of the stocks labelled losers. Banz (1981) reports that firms with smaller market capitalization have consistently outperformed stocks with larger market capitalization. Further evidence on the UIH is presented in other works, such as Brown et al. (1988) and Yu et al. (2009).
However, the aforementioned work derives its news measures directly from stock reactions, specifically returns, whilst work that has concentrated on news sentiment does not re-examine the UIH. In the present-day big data era, not only has the volume of published news articles increased tremendously, but so has the body of literature that recognises the importance of companies’ news releases and their impact upon the companies’ stocks (Braun et al. 1995; Edmans et al. 2014; Alfano et al. 2015). Given the large amount of publicly available information and the existence of computers capable of processing large volumes of data, it is only logical to attempt to use news and other sources to enhance our understanding of how stocks react. The stock reactions which we will specifically examine are volatility, trading volume and returns.
One possible benefit of analysing this data computationally is that it may provide insightful forecasts of a company’s stock, allowing investors to filter out noise and make more calculated decisions. Two major sources that investors use to extract and analyse market information are financial news articles and press releases. The data set on which this thesis concentrates contains both press releases and media articles on a range of companies across various industries. Both the press releases and the media articles concentrating on a particular company relate to the company’s performance and news regarding it. The sentiment of these press releases and media articles has been analysed and converted into numerical values, which makes it possible to employ it in mathematical models.
To summarise, a company will ordinarily release news relating to its performance to the general public. The media then interpret these releases and construct articles reporting their analysis of the information. This research is also interested in investigating whether there are considerable discrepancies in sentiment between press releases and the media articles which follow them. Any large variation between the sentiments of the two sources could cause uncertainty to rise within the market, resulting in inconsistent behaviour of the realized volatility. Since earnings press releases are voluntary, companies are more inclined to release information to the media and the public if it contains positive news, in an attempt to portray a prosperous quarter. The writing style of each earnings press release is significant, as the wording and delivery of the content could maximise the reaction to a successful quarter, whilst it could also be used to minimise the damage of a quarter in which the company underperformed.
1.3 Thesis goal
As an ever-expanding amount of information is available to the public through the internet, media and news, this study aims to utilise this wealth of information sources to improve our understanding of how stocks react to information releases. Specifically, these reactions are the stocks’ returns, volatility and trading volume.
This thesis will also analyse the sources further to determine whether the type of source releasing the information carries weight on the sentiment of the release. The sentiment gathered from the various sources has been converted to numerical values through various wordlists, making it possible to incorporate it into mathematical models such as panel regression. Using the derived sentiment, the thesis attempts to answer the following research questions:
- (i) Do information releases cause reactions within the market?
- (ii) Does the disclosure of news impact the noise trader?
- (iii) Is there an asymmetric response depending on whether the sentiment is classified as positive, negative or neutral?
- (iv) Does the type of source releasing the information have an effect on the sentiment of the release?
- (v) Is there evidence which validates the UIH?
- (vi) Can forecasts be made based on the potential evidence gathered?
Question (i) investigates whether, and how, the market reacts to information releases. Question (ii) investigates whether the percentage of institutional owners changes under the influence of news disclosures. Noise traders are more likely to base their trading decisions on the news, and we assume that they trade with institutional traders. Hence, the percentage of a stock owned by institutional owners should change when non-institutional owners implement trades based on news releases. While Question (i) asks whether the market reacts to news releases, Question (iii) goes further by incorporating the sentiment projected within the release; it addresses whether there is a stronger reaction when the sentiment of the release is negative. Question (iv) examines whether the type of source, for example media articles or press releases, plays a role in the sentiment of the information being positive or negative; it studies whether press releases are more positive in nature when releasing information. Question (v) aims to improve on previous research. Numerous studies provide evidence validating the UIH; however, to the author’s understanding, they have not incorporated textual news sentiment but have instead derived the “news events” from stock prices. Question (vi) investigates whether accurate forecasts can be made and, if so, whether traders can benefit from this information.
1.4 Thesis outline
In this opening chapter, the focus of the thesis has been summarised and the approach of treating news as an event has been introduced. The remainder of the thesis is structured as follows. Chapter 2 discusses the previous literature on news sentiment and how it has grown over time, looking in particular at the importance of text analysis and text classification. Chapter 3 presents the methodology. Chapter 4 summarises the data set on which the thesis focuses, together with considerations to be accounted for. Chapter 5 outlines the results obtained from our methodology, accompanied by a detailed explanation. Finally, Chapter 6 discusses the results of Chapter 5 and concludes with the main points of the thesis.
######################################################
2. Literature review
Throughout the research on sentiment and news, the traditional sources of information have been newspapers, such as articles from the Wall Street Journal. The aforementioned studies have investigated the relationship between the flow of news and stock reactions; however, many of them do not incorporate a sentiment classification into their research. Tetlock (2007) provides evidence that negative sentiment derived from a Wall Street Journal column has explanatory power for downward movements of the Dow Jones. Groß-Klußmann and Hautsch (2011) analysed how the market reacts to intra-day, stock-specific news. Their data came from the “Reuters News Scope Sentiment” engine, which derives numerical sentiment values from news items. Their results support the hypothesis that volatility and trading volume are influenced by news; however, owing to the high-frequency context, the study is limited to a select number of assets. In their study of the role of the media during the credit crunch, Wisniewski and Lambe (2013) used the Lexis-Nexis database to collect news. From the collected information, they filtered for phrases that were notably used during the credit crisis, for example “Financial Crisis”. The evidence gathered suggests that news may influence the future movement of the market; however, there is no substantial evidence that journalists repeat prior news.
Early research on the application of social media sentiment to stock markets tends to use message boards, for example Yahoo! Finance, as its source of text. The benefits of these data sources are the high frequency of new messages and the fact that the stocks discussed in the text can be identified quickly. Das and Chen (2007) use the Yahoo! Finance message board to provide evidence of a positive correlation between the sentiment aggregated from the message board and the return of the stock index on the following day. However, no evidence is provided for this correlation at the individual firm level. Antweiler and Frank (2004) also use Yahoo! Finance as a text source, together with another message board, Raging Bull. Their study finds that the volume of messages, as well as their bullishness, has predictive value with regard to volatility. Zhang and Swanson (2009) analysed the self-disclosed sentiment of message board posts written by users holding a stock position, and concluded that these posts are not bias-free, as they are significantly more optimistic than neutral.
Message boards were also used by Sabherwal et al. (2011), who examined whether the stock market was manipulated by focusing on message boards about small-cap firms whose text contained “pump and dump” strategies. They discovered a pattern which implies that online discussion has the potential to manipulate the stock prices of a small firm. Park et al. (2013) investigate the effect that stock message boards have upon investors. Their results suggest that a confirmation bias exists, implying that traders prefer messages that echo their prior thinking. One notable disadvantage of message board data is that the history of many collected samples is rather limited. A possible reason is that, instead of displaying the whole history, Yahoo! Finance displays only a fixed quantity of messages. This is evident in the work of Antweiler and Frank (2004) and Sabherwal et al. (2011), who are able to cover at most one year of message postings.
Bettman et al. (2010) analyse a wider sample of text, spanning six years. Their sample was filtered by Naïve Bayes classification for rumours involving potential takeovers, and they found that abnormal returns and trading volumes at times followed the posting of these rumours. Kim and Kim (2014) also evaluate six years of message postings. Unlike Sabherwal et al. (2011), their study concludes that the prior performance of stock prices influences future message board postings, rather than postings affecting stock price performance. Further work concentrating on message board postings is that of Li et al. (2014), who derive news from Chinese web pages and indicate that the CSI index is outperformed by their “electronic-media-aware quantitative trader”.
More recently, researchers have begun to incorporate sentiment from the social media platform Twitter into their studies. The message postings (tweets) published on this platform are restricted to 140 characters per post and tend to be written on mobile devices. As a result of this restriction, the usual hindrances of multiple stocks being mentioned in one text, as well as grammatical mistakes, tend not to be evident in the analysis.
Bollen et al. (2011) classify tweets into several mood states, and their results indicate that the mood of the public can enhance the power to forecast daily changes in the Dow Jones index. This classification process is refined by Zhang et al. (2012), who focus on isolating keywords that tend to indicate a financial context. Zhang et al. (2012) also expand the market study by including currencies and commodities. The classification process was extended by Si et al. (2013), who filtered Twitter posts to collect tweets at a firm-specific level. Their study finds that the accuracy of day-to-day stock predictions is enhanced by topic-specific extracted sentiment. Sprenger et al. (2014) also used tweets at stock level; their work implied that the number of followers of a tweet’s poster and the volume of retweets the post receives can be used to assess the quality of the advice within the tweet. The sentiment trading model developed by Nann et al. (2013) outperforms the S&P 500 by aggregating data from sources such as traditional newspapers, stock message boards and Twitter.
In more recent studies, investment communities on social media, such as Seeking Alpha, have been analysed. This includes not only the articles published in the community but also the comments posted on each article. Chen et al. (2014) collected data from Seeking Alpha and showed that negative sentiment has forecasting value for stock returns, enhancing prediction models. Seeking Alpha was also used as a source by Wang et al. (2014), who claim that the correlation between Seeking Alpha sentiment and returns is greater than the correlation between the sentiment of another finance blogging platform, StockTwits, and returns. NASDAQ Community is a platform which aggregates investment articles from various other social media platforms and news sources, and was used by Zhang et al. (2015) in their research. Their study compared different sentiment lexica and found an incremental influence of the derived sentiment on a stock’s volatility and returns.
Ahern and Sosyura (2014) studied the manner in which companies portray their earnings releases by examining the tone of the releases. The main component of their study is the abnormal positive tone, and they concluded that there is evidence of firms making strategic disclosures in their releases. A discrepancy in tone between press releases and media articles may arise because media articles published on an earnings press release potentially over- or under-represent the information disclosed, or alternatively provide a more accurate interpretation of the news released. Nonetheless, this discrepancy between the tones of the sources may result in confusion in the market, which is why they should be examined separately. This is important with regard to noise traders, whose investment decisions could be influenced by the sentiment of these sources.
########################################################
3. Sentiment Analysis
The numbers of positive and negative words in each article, as well as the tone of each article, are calculated from five different wordlists:
- LM wordlist
- Henry 2006 wordlist
- Henry 2008 wordlist
- Diction 7 wordlist
- LIWC wordlist
In this study, sentiment is considered at word and tone level. The corresponding variables are derived from the press releases and media articles using the above wordlists and are displayed in Table 1, where $i$, $j$, $t$ and $z$ refer to article, company, day and wordlist, respectively.
Variable | Description |
$W_{i,j,t,z}$ | Total number of words in article $i$ about company $j$ on day $t$ |
$W^{+}_{i,j,t,z}$ | Number of positive words in article $i$ about company $j$ on day $t$ |
$W^{-}_{i,j,t,z}$ | Number of negative words in article $i$ about company $j$ on day $t$ |
$\mathrm{Tone}_{i,j,t,z}$ | Tone of article $i$ about company $j$ on day $t$ |
Table 1: Sentiment Variables derived from wordlist z, about article i, on company j on day t.
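As a minimal sketch of how the word counts in Table 1 could be obtained for a single article, the Python snippet below counts total, positive and negative words against a wordlist. The tiny `positive_words` and `negative_words` sets and the simple tokeniser are illustrative assumptions; they are not the actual LM, Henry, Diction 7 or LIWC wordlists, and the procedure used to build the data set may differ.

```python
import re


def count_sentiment_words(text, positive_words, negative_words):
    """Return (total, positive, negative) word counts for one article."""
    # Lower-case the text and keep alphabetic tokens only (a simplifying assumption).
    tokens = re.findall(r"[a-z']+", text.lower())
    n_pos = sum(1 for tok in tokens if tok in positive_words)
    n_neg = sum(1 for tok in tokens if tok in negative_words)
    return len(tokens), n_pos, n_neg


# Hypothetical toy wordlist, for illustration only.
positive_words = {"growth", "strong", "improved"}
negative_words = {"loss", "decline", "weak"}

article = "Strong growth in sales offset a decline in margins."
total, n_pos, n_neg = count_sentiment_words(article, positive_words, negative_words)
print(total, n_pos, n_neg)
```

The three returned values play the roles of the total, positive and negative word counts for one article, one company, one day and one wordlist.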
Furthermore, the variables are aggregated to obtain a signal at day level rather than at article level. This is similar to the work of Chen et al. (2014), who calculate the proportions of negative and positive words relative to the total number of words published each day. Equation 1 below gives the fraction of positive words about company $j$ on day $t$ from wordlist $z$.
$$P^{W}_{j,t,z} = \frac{\sum_{i=1}^{I_t} W^{+}_{i,j,t,z}}{\sum_{i=1}^{I_t} \mathbf{1}\left(W^{Pol}_{i,j,t,z} > 0\right) W_{i,j,t,z}}$$
Equation 1
Where,
$$W^{Pol}_{i,j,t,z} = W^{+}_{i,j,t,z} + W^{-}_{i,j,t,z}$$
Equation 2
$W^{Pol}_{i,j,t,z}$ is the number of polarised words, i.e. words with positive or negative polarity, in each article. It is calculated from the number of positive and negative words and is displayed, amongst other variables, in Table 2. Furthermore, Equation 3 calculates the fraction of negative words.
$$N^{W}_{j,t,z} = \frac{\sum_{i=1}^{I_t} W^{-}_{i,j,t,z}}{\sum_{i=1}^{I_t} \mathbf{1}\left(W^{Pol}_{i,j,t,z} > 0\right) W_{i,j,t,z}}$$
Equation 3
Variable | Description |
$W^{Pol}_{i,j,t,z}$ | Number of polarised (positive or negative) words |
$P^{W}_{j,t,z}$ | Fraction of positive words about company $j$ on day $t$ from wordlist $z$ |
$N^{W}_{j,t,z}$ | Fraction of negative words about company $j$ on day $t$ from wordlist $z$ |
$I_t$ | Total number of articles on day $t$ |
Table 2: Further Sentiment Variables derived from wordlist z, about article i, on company j on day t
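The aggregation in Equations 1 to 3 can be sketched in Python as follows. The function name `daily_fractions`, the tuple layout and the example counts are hypothetical, and returning zero when no article contains a polarised word is an assumption made here to avoid division by zero.

```python
def daily_fractions(articles):
    """Daily fractions of positive and negative words.

    articles: list of (total_words, positive_words, negative_words) tuples
    for one company, one day and one wordlist.
    """
    # Only articles with at least one polarised word enter the denominator,
    # mirroring the indicator on the polarised word count in Equations 1 and 3.
    denom = sum(w for w, pos, neg in articles if pos + neg > 0)
    if denom == 0:
        return 0.0, 0.0  # assumption: no polarised text means no signal
    frac_pos = sum(pos for _, pos, _ in articles) / denom
    frac_neg = sum(neg for _, _, neg in articles) / denom
    return frac_pos, frac_neg


# Hypothetical counts for three articles published on the same day.
articles = [(200, 10, 5), (100, 0, 0), (300, 5, 20)]
frac_pos, frac_neg = daily_fractions(articles)
print(frac_pos, frac_neg)
```

In this example the second article contains no polarised words, so its 100 words are excluded from the denominator and the fractions are computed over 500 words.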
The indicator $\mathrm{Ind}_{j,t}$ is established to signal that there is at least one article published about company $j$ on day $t$.
$$\mathrm{Ind}_{j,t} = \begin{cases} 1, & \text{if } I_t > 0 \\ 0, & \text{otherwise} \end{cases}$$
Equation 4
The tone of each article is calculated by subtracting the number of negative words from the number of positive words and dividing the result by the total number of positive and negative words.
$$\mathrm{Tone}_{i,j,t,z} = \frac{W^{+}_{i,j,t,z} - W^{-}_{i,j,t,z}}{W^{+}_{i,j,t,z} + W^{-}_{i,j,t,z}}$$
Equation 5
The daily tone is then calculated from the following equation:
$$\mathrm{Tone}_{j,t,z} = \sum_{i=1}^{I_t} \frac{W^{+}_{i,j,t,z} - W^{-}_{i,j,t,z}}{W^{+}_{i,j,t,z} + W^{-}_{i,j,t,z}}$$
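A short Python sketch of the tone calculation is given below. It follows Equation 5 for a single article and reads the daily tone as the sum of the article tones, as the formula above is written. The function names and example counts are hypothetical, and treating an article without polarised words as neutral (tone 0) is an assumption.

```python
def article_tone(n_pos, n_neg):
    """Tone of one article as in Equation 5; lies between -1 and 1."""
    polar = n_pos + n_neg
    if polar == 0:
        return 0.0  # assumption: no polarised words is treated as neutral
    return (n_pos - n_neg) / polar


def daily_tone(counts):
    """Sum the article tones over all articles published on one day."""
    return sum(article_tone(n_pos, n_neg) for n_pos, n_neg in counts)


# Hypothetical (positive, negative) word counts for three articles.
counts = [(10, 5), (5, 20), (3, 3)]
print([article_tone(p, n) for p, n in counts])
print(daily_tone(counts))
```

A tone of 1 means that every polarised word is positive, and a tone of -1 means that every polarised word is negative.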
The study is also interested in investigating whether the tones produced by the different wordlists are correlated. To this end, an overall average tone is calculated by adding together the tones from the individual wordlists and dividing by the number of wordlists.
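$$\mathrm{Tone}_{i,j,t,\mathrm{AVG}} = \frac{\mathrm{Tone}_{i,j,t,\mathrm{LM}} + \mathrm{Tone}_{i,j,t,\mathrm{H6}} + \mathrm{Tone}_{i,j,t,\mathrm{H8}} + \mathrm{Tone}_{i,j,t,\mathrm{D7}} + \mathrm{Tone}_{i,j,t,\mathrm{LIWC}}}{5}$$
Equation 6
Variable | Description |
$\mathrm{Tone}_{i,j,t,\mathrm{LM}}$ | Tone of LM wordlist |
$\mathrm{Tone}_{i,j,t,\mathrm{H6}}$ | Tone of Henry 2006 wordlist |
$\mathrm{Tone}_{i,j,t,\mathrm{H8}}$ | Tone of Henry 2008 wordlist |
$\mathrm{Tone}_{i,j,t,\mathrm{D7}}$ | Tone of Diction 7 wordlist |
$\mathrm{Tone}_{i,j,t,\mathrm{LIWC}}$ | Tone of LIWC wordlist |
$\mathrm{Tone}_{i,j,t,\mathrm{AVG}}$ | Average Tone |
Table 3: Different Tone Variables derived from each wordlist z, about article i, on company j on day t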
The articles are then allocated into three groups based on their average tone. The groups are defined as follows:
- Positive: When the tone is highly positive i.e. greater than 0.5.
- Neutral: When the tone is moderately positive/neutral i.e. between 0.1 and 0.5.
- Negative: When the tone is negative i.e. less than 0.1.
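The averaging step and the three-way grouping above can be sketched in Python as follows. The tone values are hypothetical, and the treatment of the boundary values (a tone of exactly 0.1 or 0.5 is placed in the Neutral group) is an assumption, since the thresholds above do not state which side the boundaries fall on.

```python
from statistics import mean


def classify_tone(avg_tone):
    """Map an average tone to one of the three groups listed above."""
    if avg_tone > 0.5:
        return "Positive"
    if avg_tone >= 0.1:  # assumption: boundary values count as Neutral
        return "Neutral"
    return "Negative"


# Hypothetical tones of one article under the five wordlists.
tones = {"LM": -0.20, "H6": 0.60, "H8": 0.50, "D7": 0.30, "LIWC": 0.30}
avg = mean(tones.values())
group = classify_tone(avg)
print(avg, group)
```

Note that under these thresholds an article whose average tone is only mildly positive, for example 0.05, falls into the Negative group.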
The correlation between the tones of the various wordlists is studied to investigate whether they are consistent in classifying a source’s sentiment.
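One way such a consistency check could be carried out is with the Pearson correlation coefficient between the tone series of two wordlists. The sketch below is illustrative only: the choice of Pearson correlation is an assumption, and the two tone series are made-up values rather than results from the data set.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long tone series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant series has zero spread, so the ratio below is undefined
    # and raises ZeroDivisionError.
    return cov / (sd_x * sd_y)


# Hypothetical daily tones from two wordlists over five days.
lm_tone = [0.10, -0.20, 0.30, 0.00, -0.40]
henry_tone = [0.50, 0.10, 0.70, 0.45, 0.20]
r = pearson(lm_tone, henry_tone)
print(r)
```

A value close to 1 would indicate that the two wordlists rank the sources similarly, whereas a value close to -1 would indicate that they disagree systematically.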
########################################################
POSSIBLE SENTIMENT MEASURES WHICH COULD BE INCORPORATED
EQUATION
$$\mathrm{Bull}^{W}_{j,t} = (\log 2)^{-1} \log\left(\frac{1 + P^{W}_{j,t}}{1 + N^{W}_{j,t}}\right)$$
$\mathrm{Bull}^{W}_{j,t}$ = word-based measure of bullishness for company $j$ on day $t$
$P^{W}_{j,t}$ = fraction of positive words about company $j$ on day $t$
$N^{W}_{j,t}$ = fraction of negative words about company $j$ on day $t$
EQUATION
$$\mathrm{Bull}^{W**}_{j,t} = \log\left(W_{j,t}\right) \mathrm{Bull}^{W}_{j,t}$$
$\mathrm{Bull}^{W**}_{j,t}$ = modified bullish measure for company $j$ on day $t$
$\mathrm{Bull}^{W}_{j,t}$ = word-based measure of bullishness for company $j$ on day $t$
EQUATION
$$\mathrm{Bear}^{W}_{j,t} = (\log 2)^{-1} \log\left(\frac{1 + N^{W}_{j,t}}{1 + P^{W}_{j,t}}\right)$$
$\mathrm{Bear}^{W}_{j,t}$ = word-based measure of bearishness for company $j$ on day $t$
$P^{W}_{j,t}$ = fraction of positive words about company $j$ on day $t$
$N^{W}_{j,t}$ = fraction of negative words about company $j$ on day $t$
EQUATION
$$\mathrm{Bear}^{W**}_{j,t} = \log\left(W_{j,t}\right) \mathrm{Bear}^{W}_{j,t}$$
$\mathrm{Bear}^{W**}_{j,t}$ = modified bearish measure for company $j$ on day $t$
$\mathrm{Bear}^{W}_{j,t}$ = word-based measure of bearishness for company $j$ on day $t$
EQUATION
$$\mathrm{Neg}(x) = \begin{cases} x, & \text{if } x > 0 \\ 0, & \text{otherwise} \end{cases}$$
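The basic bullishness and bearishness measures listed above can be sketched in Python as follows. This sketch assumes that the leading factor in the two formulas is a division by log 2, so that the measure equals 1 when all polarised words are positive (fraction of positive words 1, fraction of negative words 0). The function names and input values are hypothetical, and the modified measures are not covered.

```python
from math import log


def bullishness(frac_pos, frac_neg):
    """Word-based bullishness, scaled by 1 / log(2) (assumed reading)."""
    return log((1 + frac_pos) / (1 + frac_neg)) / log(2)


def bearishness(frac_pos, frac_neg):
    """Word-based bearishness: the same ratio with the roles reversed."""
    return log((1 + frac_neg) / (1 + frac_pos)) / log(2)


# Hypothetical daily fractions of positive and negative words.
frac_pos, frac_neg = 0.03, 0.05
bull = bullishness(frac_pos, frac_neg)
bear = bearishness(frac_pos, frac_neg)
print(bull, bear)
```

Under this reading the two measures are mirror images of each other, so a day with more negative than positive words has negative bullishness and positive bearishness of the same magnitude.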
########################################################
4. Data
This chapter introduces the primary sources of data used in the empirical analysis.
4.1 Financial Press Releases and Media Articles
The data set covers a wide range of information over a given time period and is initially separated into two categories, news releases and stock data. The overall sentiment data set consists of 36,922 press releases and media articles relating to 44 different companies. This information is displayed in the table below:
Table 4: Summary of information available in the data set
Figure 1: Number of News releases per week over time period: 2000 to 2012
Figure 1 shows the number of news releases per week over time. Two notable characteristics can be observed: the number of releases published per week has increased over time, and some weeks saw only a small number of releases. The rise in the number of releases could be explained either by there being more contributors or by existing contributors publishing releases more frequently.
Further analysis could examine whether the days on which the stock market was closed have an effect on the volume of releases published.
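As an illustration of how weekly counts such as those in Figure 1 could be produced, the Python sketch below groups release dates by ISO year and week. The function name and the example dates are hypothetical, and grouping by ISO week is an assumption about how a week is defined.

```python
from collections import Counter
from datetime import date


def releases_per_week(release_dates):
    """Count releases per ISO (year, week) pair."""
    # isocalendar() yields (ISO year, ISO week, ISO weekday); keep the first two.
    return Counter(tuple(d.isocalendar())[:2] for d in release_dates)


# Hypothetical release dates, for illustration only.
release_dates = [date(2011, 1, 3), date(2011, 1, 5), date(2011, 1, 10)]
weekly_counts = releases_per_week(release_dates)
print(sorted(weekly_counts.items()))
```

The same grouping could be applied to the weekday of each date to study whether fewer releases are published on days when the stock market is closed.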
4.1.1 Preliminary Study of a Sentiment Source
To gain a greater understanding of the information available, one company was chosen and the information available on it within the data set was analysed. The chosen company was GlaxoSmithKline plc, a British pharmaceutical company headquartered in Brentford, London, whose stock is listed on the London Stock Exchange. The data set contains a total of 1,568 media articles and press releases concentrating on GlaxoSmithKline plc. A breakdown of the sources is displayed in the table below. The press releases were also split into two groups, voluntary press releases and regulatory press releases, which can be seen in Table 5, in order to better understand whether they differ in terms of sentiment.
Table 5: Summary of information available in the sentiment data set on GlaxoSmithKline plc.
Figure 2: Daily correlation of negative sentiment between the different wordlists, over the time period 01/01/2000 to 31/12/2011
The above figure shows that at times a negative correlation exists between the LM wordlist tone and the Henry 2008 wordlist tone, whilst the tones of the Diction7, LM and LIWC wordlists were positively correlated with each other at various points in time. Such discrepancies can arise because the wordlists allocate different values to different words. Because of this discrepancy, the regression models covered later in this study model the sentiment derived from each wordlist separately. In this preliminary analysis, however, the average tone was used as the sentiment value, so that the tone calculated from every wordlist was taken into account. The press releases and media articles were separated into different groups, as mentioned in Chapter 3, based on their average tone. This is displayed in the tables below:
Table 6: The number of Media articles in each different category based on the average tone value
Table 7: The number of Voluntary Press Releases in each different category based on the average tone value
Table 8: The number of Regulatory Press Releases in each different category based on the average tone value
From the data gathered, a positive earnings press release generally has a highly positive writing style, meaning that it contains far more positive words than negative words. On average, a positive voluntary press release has a tone of 0.67, whilst a negative earnings regulatory press release has an average tone of -0.1013. This suggests that companies word their releases so as to over-represent a successful quarter and under-represent a less successful one.
The GlaxoSmithKline example above demonstrates that the tone of voluntary press releases differs from the tone of regulatory press releases. They should therefore be studied separately in this report. This echoes the work of Ahern and Sosyura (2014), who studied the manner in which companies portray their earnings releases and noted that the tone of press releases tends to differ from that of media articles.
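To make the tone values discussed above more concrete, the sketch below scores one sentence with two small word lists and then averages the results. It rests on several assumptions: tone is taken to be (positive - negative) / (positive + negative), which may differ from the measure defined in Chapter 3; the two word lists are invented for illustration and are not the LM, Henry, Diction7 or LIWC lists; and the function name `tone` is hypothetical.

```python
def tone(text, positive_words, negative_words):
    """Assumed tone measure: (pos - neg) / (pos + neg), in [-1, 1]."""
    tokens = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    pos = sum(1 for w in tokens if w in positive_words)
    neg = sum(1 for w in tokens if w in negative_words)
    if pos + neg == 0:
        return 0.0  # no sentiment-bearing words found
    return (pos - neg) / (pos + neg)


# Invented sentence and word lists, for illustration only
text = "Profit growth was strong despite a decline in legacy sales and a loss in one unit."
list_a = ({"profit", "growth", "strong"}, {"decline", "loss"})
list_b = ({"strong"}, {"decline", "loss", "despite"})

tone_a = tone(text, *list_a)
tone_b = tone(text, *list_b)
average_tone = (tone_a + tone_b) / 2
```

Here the same sentence receives a positive tone under the first list and a negative tone under the second, which is the kind of discrepancy that motivates modelling each wordlist separately.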
Figure 3: The Tone of Press Releases and Media Articles Versus Time, over the time period 01/01/2000 to 31/12/2012
Figure 3 above displays the tone and the number of press releases and media articles about GlaxoSmithKline plc over the time period 01/01/2000 to 31/12/2012. As shown in Tables 4 – 8, a combined total of 1,568 sources were published. The number of press releases and media articles published is not constant over the time period. The number of media articles containing negative sentiment is also greater than the number of press releases containing negative sentiment. Furthermore, the number of press releases containing positive sentiment has increased over time, which may merit further investigation.
Figure 4: The Tone of Press Releases and Media Articles Versus Time, over the time period 01/01/2011 to 31/12/2011
The data set was narrowed to the time period 01/01/2011 to 31/12/2011 to create the above figure. Analysis of Figure 3 indicates that the publication dates of the sources cluster together, which could coincide with significant events such as quarterly earnings reports. The dates of the quarterly announcements are provided in the table below:
Table 9: The quarterly reports release dates which occurred during 2011
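One simple way to check whether releases cluster around the quarterly announcements is to measure the share of releases published within a few days of any announcement date. The sketch below illustrates the idea; the function name, the three-day window and all dates are hypothetical and do not come from Table 9.

```python
from datetime import date


def share_near_events(release_dates, event_dates, window_days=3):
    """Fraction of releases published within window_days of any event date."""
    if not release_dates:
        return 0.0
    near = sum(
        1
        for d in release_dates
        if any(abs((d - e).days) <= window_days for e in event_dates)
    )
    return near / len(release_dates)


# Hypothetical dates, for illustration only
events = [date(2011, 2, 3)]
releases = [date(2011, 2, 1), date(2011, 2, 3), date(2011, 2, 10), date(2011, 3, 1)]
share = share_near_events(releases, events)
```

A share well above what evenly spread publication dates would produce would support the clustering interpretation.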
As mentioned earlier, it is important that regulatory and voluntary press releases are studied separately. Companies tend to disclose information strategically, and as a result the writing style and tone of the two types of release tend to differ. Figure 5 shows that the range of positive tones of voluntary press releases is much wider than that of regulatory ones.
Figure 5: The Tone of Voluntary Press Releases and Regulatory Press Releases Versus Time, over the time period 01/01/2011 to 31/12/2011
Table 10: Summary of the Average Tone from the wordlists of Regulatory Press Releases during the time period 01/01/2011 to 31/12/2011
Table 11: Summary of the Average Tone from the wordlists of Voluntary Press Releases during the time period 01/01/2011 to 31/12/2011
The tone of voluntary press releases had a mean value of 0.2394, whilst the tone of regulatory press releases had a mean value of 0.09771. This provides a further indication that the tone of regulatory and voluntary press releases should be examined separately.
As news releases are most likely to influence the ordinary noise trader (Green 2004), it is of interest to this study to examine whether a correlation exists between the percentage of institutional owners and the timing of news releases, and whether this has an impact on the stock's share price. If a news release containing negative sentiment is published, it is most likely to prompt noise traders to sell their stocks (Fostel and Geanakoplos 2012), which could then be bought by institutional owners.
4.2 Financial Data
For the empirical study, we consider 44 traded stocks. These have been taken from the London Stock Exchange (FTSE) and belong to 8 industry sectors. The objective is to determine whether combining news flow and news sentiment with market data results in a more accurate forecast of asset behaviour. We study the behaviour of each stock's share price, volatility, percentage of institutional owners and trading volume, with the ultimate aim of constructing models which can predict asset behaviour at high frequencies. This framework is highly relevant to the trading sector, as it supports decision-making models, in the form of strategies, which choose trading portfolios containing numerous assets.
Table 12: Summary of the chosen assets and their respective industry sectors
Table 13: Summary of the volume of news releases to the respective chosen assets and the time period the releases range from
Stock-specific data is collected through the financial software Bloomberg, where daily prices, trading volume and the percentage of institutional owners are collected for each company. The daily trading volume of a stock is defined as the number of its shares traded on that day. The following table and equations list the variables which will be examined and considered as stock reactions.
Variable | Description |
$\sigma_{j,t}$ | Range-based measure of volatility for company j on day t |
$P^{H}_{j,t}$ | Daily highest stock price |
$P^{L}_{j,t}$ | Daily lowest stock price |
$P^{O}_{j,t}$ | Daily opening stock price |
$P^{C}_{j,t}$ | Daily closing stock price |
$V^{*}_{j,t}$ | Raw daily log trading volume |
$V_{j,t}$ | Detrended log trading volume |
$R_{j,t}$ | Daily return |
$\%IO_{j,t}$ | Percentage of institutional owners |
Table 14: Variables collected or calculated from Bloomberg on company j, on day t.
This study uses volatility to capture the variability of a company's stock price over one day, which is the selected observation period. One such measure, the realized volatility, can be obtained from high-frequency intra-day returns. Garman and Klass (1980) show that the estimator can be improved by using high-low data and define the range-based measure of volatility for company j on day t as
$$\sigma_{j,t} = 0.511\,(u - d)^2 - 0.019\,\left[c\,(u + d) - 2ud\right] - 0.383\,c^2$$
Equation 7
where
$$u = \log P^{H}_{j,t} - \log P^{O}_{j,t}$$
$$d = \log P^{L}_{j,t} - \log P^{O}_{j,t}$$
$$c = \log P^{C}_{j,t} - \log P^{O}_{j,t}$$
Chen et al. (2006) and Shu and Zhang (2006) show that the Garman and Klass range-based measure of volatility provides results equivalent to the realized volatility at the daily level. The Garman and Klass range-based measure of volatility is therefore used in the further analysis.
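As a worked illustration of Equation 7, the sketch below computes the Garman and Klass (1980) range-based measure for a single day, using the coefficients as published by those authors (0.511, 0.019 and 0.383) with the high, low and close each normalised by the opening price. The function name and the example prices are hypothetical.

```python
import math


def garman_klass(p_open, p_high, p_low, p_close):
    """Garman-Klass range-based measure for one company on one day."""
    u = math.log(p_high) - math.log(p_open)    # normalised high
    d = math.log(p_low) - math.log(p_open)     # normalised low
    c = math.log(p_close) - math.log(p_open)   # normalised close
    return (
        0.511 * (u - d) ** 2
        - 0.019 * (c * (u + d) - 2 * u * d)
        - 0.383 * c ** 2
    )


# Hypothetical daily prices, for illustration only
sigma = garman_klass(p_open=100.0, p_high=102.0, p_low=99.0, p_close=101.0)
flat_day = garman_klass(100.0, 100.0, 100.0, 100.0)
```

A day on which the price never moves yields a measure of zero, while a wider high-low range increases the measure.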
Following Girard and Biswas (2007), the detrended log trading volume for each stock is estimated using a quadratic time trend equation:
$$V^{*}_{j,t} = \alpha + \beta_1 t + \beta_2 t^2 + V_{j,t}$$
Equation 8
where $V^{*}_{j,t}$ corresponds to the raw daily log trading volume and the detrended log trading volume $V_{j,t}$ is given by the residuals. A look-ahead bias is avoided by using a rolling window of 120 observations and estimating a one-step-ahead pseudo out-of-sample forecast. Furthermore, the returns are calculated as
$$R_{j,t} = \log P^{C}_{j,t} - \log P^{C}_{j,t-1}$$
Equation 9
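The detrending and return calculations in Equations 8 and 9 can be sketched as follows. This is one possible reading of the rolling procedure, not necessarily the exact implementation used in this study: for each day, a quadratic trend is fitted to the previous 120 log volumes, the trend is projected one step ahead, and the detrended value is the actual log volume minus that forecast. All function names are hypothetical, and the time index is rescaled within each window purely for numerical stability.

```python
import math


def solve_linear(a, b):
    """Solve a small system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def quadratic_forecast(window_values):
    """Fit v = a + b1*x + b2*x^2 on the window; forecast the next point."""
    n = len(window_values)
    xs = [k / n for k in range(n)]  # rescaled time index within the window
    s = [sum(x ** p for x in xs) for p in range(5)]
    sy = [sum(v * x ** p for v, x in zip(window_values, xs)) for p in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    a, b1, b2 = solve_linear(normal, sy)
    x_next = 1.0  # one step past the last point in the window
    return a + b1 * x_next + b2 * x_next ** 2


def detrended_log_volume(raw_volume, window=120):
    """Actual log volume minus the one-step-ahead trend forecast."""
    log_v = [math.log(v) for v in raw_volume]
    out = [None] * len(log_v)  # undefined until a full window is available
    for t in range(window, len(log_v)):
        out[t] = log_v[t] - quadratic_forecast(log_v[t - window:t])
    return out


def log_returns(close):
    """Daily log returns from closing prices; the first day is undefined."""
    return [None] + [
        math.log(close[t]) - math.log(close[t - 1]) for t in range(1, len(close))
    ]
```

Because each forecast uses only the 120 observations before day t, no information from day t or later enters the trend estimate.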
Equation 12
$$\sigma_{j,t} = \alpha_j + \beta_1^{\top} Sent_{j,t} + \beta_2^{\top} X_{j,t} + \gamma_j + \varepsilon_{j,t}$$
where
$\sigma_{j,t}$ = the range-based measure of volatility for company j on day t
$Sent_{j,t}$ = the sentiment measures; different versions of sentiment are considered, depending on the set of sentiment measures. Model 1 uses $Ind_{j,t}$, $P^{W}_{j,t}$ and $N^{W}_{j,t}$:
- $Ind_{j,t}$ = indicator of whether there is at least one article about company j on day t
- $P^{W}_{j,t}$ = fraction of positive words about company j on day t
- $N^{W}_{j,t}$ = fraction of negative words about company j on day t
$\gamma_j$ = the fixed effect for firm j, with $\sum_j \gamma_j = 0$
$\varepsilon_{j,t}$ = error term of company j on day t
$X_{j,t}$ = control variables:
- $R_{M,t}$ = S&P 500 index return
- $VIX_t$ = CBOE VIX index on date t, used to measure generalised risk aversion
The same specification is applied to the detrended log trading volume and to the returns:
$$V_{j,t} = \alpha_j + \beta_1^{\top} Sent_{j,t} + \beta_2^{\top} X_{j,t} + \gamma_j + \varepsilon_{j,t}$$
$$R_{j,t} = \alpha_j + \beta_1^{\top} Sent_{j,t} + \beta_2^{\top} X_{j,t} + \gamma_j + \varepsilon_{j,t}$$
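A fixed-effects regression of this form could be estimated with the within transformation, as sketched below. The estimation method is an assumption, since the text does not state how the model is fitted. The dependent variable and every regressor are demeaned firm by firm, which absorbs the firm-specific terms, and the slope coefficients are then obtained by ordinary least squares. The sentiment measures and control variables are stacked into one regressor row per observation. The function names and the toy data are hypothetical, and standard errors are omitted.

```python
def solve_linear(a, b):
    """Solve a small system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def within_ols(firm_ids, y, x_rows):
    """Slope estimates from the fixed-effects (within) estimator."""
    k = len(x_rows[0])
    groups = {}
    for i, firm in enumerate(firm_ids):
        groups.setdefault(firm, []).append(i)
    y_dm = [0.0] * len(y)
    x_dm = [[0.0] * k for _ in y]
    for idx in groups.values():
        y_bar = sum(y[i] for i in idx) / len(idx)
        x_bar = [sum(x_rows[i][c] for i in idx) / len(idx) for c in range(k)]
        for i in idx:
            y_dm[i] = y[i] - y_bar  # remove the firm mean
            x_dm[i] = [x_rows[i][c] - x_bar[c] for c in range(k)]
    xtx = [[sum(r[a] * r[b] for r in x_dm) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * v for r, v in zip(x_dm, y_dm)) for a in range(k)]
    return solve_linear(xtx, xty)


# Toy panel, for illustration only: two firms, two regressors,
# y = firm effect + 2 * x1 - 3 * x2 with no noise
firms = ["A", "A", "A", "A", "B", "B", "B", "B"]
x = [[1, 0], [0, 1], [1, 1], [2, 0], [0, 0], [1, 2], [3, 1], [2, 2]]
effect = {"A": 5.0, "B": -1.0}
y = [effect[f] + 2 * r[0] - 3 * r[1] for f, r in zip(firms, x)]
beta = within_ols(firms, y, x)
```

In the toy panel the slopes are recovered regardless of the firm effects, which is the property that motivates including $\gamma_j$ in the specification.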
4.3 The Chosen Data set
A selection of 44 assets across 8 different sectors has been chosen from the FTSE for this empirical study.
The sectors are listed in TABLE XXXXXX, together with the number of companies chosen from each sector.
Table XXXXX provides summary statistics for the selected assets.
To guarantee a sufficient number of data points in the news time series, the companies chosen have a large market capitalisation and are therefore widely covered in the news.
In addition, the total market capitalisation of all selected assets is $4.36 trillion (US stocks: $3.34 trillion, UK stocks: $1.01 trillion).
As a large proportion of the market has been taken into account, this is a good representation of the entire stock market in the US and the UK.
5. Results of
5.1 XXXXXXXXXX
5.1.1 XXXXXXXXXX
5.2 XXXXXXXXXX
5.3 XXXXXXXXXX
5.4 XXXXXXXXXX
5.5 XXXXXXXXXX
6. Discussion
7. Conclusion
Appendix
Bibliography
Ahern, K.R., Sosyura, D. (2014) ‘Who Writes the News? Corporate Press Releases during Merger Negotiations’, Journal of Finance, 69(1), 241–291, available: http://doi.wiley.com/10.1111/jofi.12109 [accessed 24 Jul 2017].
Alfano, S.J., Feuerriegel, S., Neumann, D. (2015) ‘Is News Sentiment More Than Just Noise?’, Ecis, (2015), 0–16, available: http://aisel.aisnet.org/ecis2015_cr/5.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P. (2003) ‘Modeling and Forecasting Realized Volatility’, Econometrica, 71(2), 579–625, available: http://links.jstor.org/sici?sici=0012-9682%28200303%2971%3A2%3C579%3AMAFRV%3E2.0.CO%3B2-V [accessed 23 Jul 2017].
Antweiler, W., Frank, M.Z. (2004) ‘Is All That Talk Just Noise? The Information Content of Internet Stock Message Boards’, The Journal of Finance, 59(3), 1259–1294, available: http://doi.wiley.com/10.1111/j.1540-6261.2004.00662.x [accessed 4 Aug 2017].
Bachelier, L., Davis, M., Etheridge, A. (2006) Louis Bachelier’s Theory of Speculation : The Origins of Modern Finance [online], Princeton University Press, available: https://www.jstor.org/stable/j.ctt7scn4 [accessed 23 Jul 2017].
Banz, R.W. (1981) ‘The relationship between return and market value of common stocks’, Journal of Financial Economics, 9(1), 3–18, available: http://linkinghub.elsevier.com/retrieve/pii/0304405X81900180 [accessed 23 Jul 2017].
Bettman, J.L., Hallett, A.G., Sault, S. (2010) ‘Rumortrage: Can Investors Profit on Takeover Rumors on Internet Stock Message Boards?’, SSRN Electronic Journal, available: http://www.ssrn.com/abstract=1654142 [accessed 4 Aug 2017].
Bollen, J., Mao, H., Zeng, X.-J. (2011) ‘Twitter mood predicts the stock market’, Journal of Computational Science, 2(1), 1–8, available: https://arxiv.org/pdf/1010.3003.pdf [accessed 5 Aug 2017].
De Bondt, W., Thaler, R. (1985) ‘Does the stock market overreact?’, The Journal of Finance, 40(3), 793–805, available: http://www.jstor.org/stable/2327804 [accessed 23 Jul 2017].
Braun, P.A., Nelson, D.B., Sunier, A.M. (1995) ‘Good News, Bad News, Volatility, and Betas’, The Journal of Finance, 50(5), 1575–1603, available: http://www.jstor.org/stable/2329327 [accessed 23 Jul 2017].
Brown, K.C., Harlow, W.V., Tinic, S.M. (1988) ‘Risk aversion, uncertain information, and market efficiency’, Journal of Financial Economics, 22(2), 355–385, available: http://linkinghub.elsevier.com/retrieve/pii/0304405X8890075X [accessed 23 Jul 2017].
Chen, H., De, P., Hu, Y., Hwang, B.-H. (2014) ‘Wisdom of Crowds: The Value of Stock Opinions Transmitted Through Social Media’, Review of Financial Studies, 27(5), 1367–1403, available: http://www.bhwang.com/pdf/5_wisdom-of-crowds.pdf [accessed 5 Aug 2017].
Das, S.R., Chen, M.Y. (2007) ‘Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web’, Management Science, 53(9), available: https://doi.org/10.1287/mnsc.1070.0704 [accessed 4 Aug 2017].
Edmans, A., Goncalves-Pinto, L., Wang, Y., Xu, M. (2014) ‘Strategic news releases in equity vesting months’, 53(9), 1689–1699.
Fama, E.F. (1970) ‘Efficient Capital Markets: A Review of Theory and Empirical Work’, The Journal of Finance, 25(2), 383–417, available: http://www.jstor.org/stable/2325486 [accessed 23 Jul 2017].
Fostel, A., Geanakoplos, J. (2012) ‘Why does bad news increase volatility and decrease leverage?’, Journal of Economic Theory, 147(2), 501–525, available: http://ac.els-cdn.com/S002205311100113X/1-s2.0-S002205311100113X-main.pdf?_tid=3af934de-6f33-11e7-93e6-00000aab0f6c&acdnat=1500765328_f1cdb98a7f449026e0b39ad51fcaed9f [accessed 23 Jul 2017].
Green, T.C. (2004) ‘Economic News and the Impact of Trading on Bond Prices’, The Journal of Finance, 59(3), 1201–1233, available: http://doi.wiley.com/10.1111/j.1540-6261.2004.00660.x [accessed 6 Aug 2017].
Groß-Klußmann, A., Hautsch, N. (2011) ‘When machines read the news: Using automated text analytics to quantify high frequency news-implied market reactions’, available: http://ac.els-cdn.com/S0927539810000873/1-s2.0-S0927539810000873-main.pdf?_tid=590bc5c8-7900-11e7-b13e-00000aacb361&acdnat=1501842986_e85e591558c941442f384c52244911fa [accessed 4 Aug 2017].
Kim, S.-H., Kim, D. (2014) ‘Investor sentiment from internet message postings and the predictability of stock returns’, Journal of Economic Behavior & Organization, 107, 708–729, available: http://linkinghub.elsevier.com/retrieve/pii/S0167268114001206 [accessed 4 Aug 2017].
Li, Q., Wang, T., Li, P., Liu, L., Gong, Q., Chen, Y. (2014) ‘The effect of news and public mood on stock movements’, Information Sciences, 278, 826–840, available: http://ac.els-cdn.com/S0020025514003879/1-s2.0-S0020025514003879-main.pdf?_tid=d6e2ebd4-7982-11e7-819f-00000aab0f6c&acdnat=1501899032_02ba1ce0a5d0ddb027ea18180c4daea2 [accessed 5 Aug 2017].
Malkiel, B.G. (2003) ‘The Efficient Market Hypothesis and Its Critics’, Journal of Economic Perspectives, 17(1), 59–82, available: http://pubs.aeaweb.org/doi/pdfplus/10.1257/089533003321164958 [accessed 23 Jul 2017].
Nann, S., Krauss, J., Schoder, D. (2013) ‘Predictive Analytics On Public Data – The Case Of Stock Markets’, in Proceedings of the 21st European Conference on Information Systems, 1–12, available: http://aisel.aisnet.org/ecis2013_cr [accessed 5 Aug 2017].
Park, J., Konana, P., Gu, B., Kumar, A., Raghunathan, R. (2013) ‘Information Valuation and Confirmation Bias in Virtual Communities: Evidence from Stock Message Boards’, Information Systems Research, 24(4), 1050–1067, available: http://pubsonline.informs.org/doi/abs/10.1287/isre.2013.0492 [accessed 4 Aug 2017].
Sabherwal, S., Sarkar, S.K., Zhang, Y. (2011) ‘Do Internet Stock Message Boards Influence Trading? Evidence from Heavily Discussed Stocks with No Fundamental News’, Journal of Business Finance & Accounting, 38(9–10), 1209–1237, available: http://doi.wiley.com/10.1111/j.1468-5957.2011.02258.x [accessed 4 Aug 2017].
Si, J., Mukherjee, A., Liu, B., Li, Q., Li, H., Deng, X. (2013) ‘Exploiting Topic based Twitter Sentiment for Stock Prediction’, in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 24–29, available: https://www.cs.uic.edu/~liub/publications/ACL-2013-Jianfeng-stock-short.pdf [accessed 5 Aug 2017].
Sprenger, T.O., Tumasjan, A., Sandner, P.G., Welpe, I.M. (2014) ‘Tweets and Trades: the Information Content of Stock Microblogs’, European Financial Management, 20(5), 926–957, available: http://doi.wiley.com/10.1111/j.1468-036X.2013.12007.x [accessed 5 Aug 2017].
Tetlock, P.C. (2007) ‘Giving Content to Investor Sentiment : The Role of Media in the Stock Market’, The Journal of Finance, 62(3), 1139–1168.
Wang, G., Wang, T., Wang, B., Sambasivan, D., Zhang, Z., Zheng, H., Zhao, B.Y. (2014) ‘Crowds on Wall Street: Extracting Value from Social Investing Platforms’, available: http://arxiv.org/abs/1406.1137 [accessed 5 Aug 2017].
Wisniewski, T.P., Lambe, B. (2013) ‘The role of media in the credit crunch: The case of the banking sector’, Journal of Economic Behavior and Organization, 85(1), 163–175, available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.469.6563&rep=rep1&type=pdf [accessed 4 Aug 2017].
Yu, S., Rentzler, J., Tandon, K. (2009) ‘Reexamining the uncertain information hypothesis on the S&P 500 Index and SPDRs’, available: https://link.springer.com/content/pdf/10.1007%2Fs11156-009-0119-x.pdf [accessed 23 Jul 2017].
Zarowin, P. (1990) ‘Size, Seasonality, and Stock Market Overreaction’, Journal of Financial and Quantitative Analysis, 25(1), available: http://people.stern.nyu.edu/pzarowin/publications/P07_Size_Seasonality_1.pdf [accessed 23 Jul 2017].
Zhang, J.L., Härdle, W.K., Chen, C.Y., Bommes, E. (2015) ‘Distillation of News Flow Into Analysis of Stock Reactions’, Journal of Business & Economic Statistics, 34(4), 547–563, available: http://sfb649.wiwi.hu-berlin.de [accessed 24 Jul 2017].
Zhang, X., Fuehres, H., Gloor, P.A. (2012) ‘Predicting asset value through twitter buzz’, in Advances in Intelligent and Soft Computing, 23–34, available: http://www.ickn.org/documents/Collin2011_Zhang_Fuehres_Gloor.pdf [accessed 5 Aug 2017].
Zhang, Y., Swanson, P.E. (2009) ‘Are day traders bias free?-evidence from internet stock message boards’, Journal of Economics and Finance, 34(1), 96–112, available: https://link.springer.com/content/pdf/10.1007%2Fs12197-008-9063-1.pdf [accessed 4 Aug 2017].