博碩士論文 etd-0702120-225635 詳細資訊
Title page for etd-0702120-225635
論文名稱
Title
媒體資料對波動市場的預測性- 結合機械學習的應用
Predicting VIX Futures Returns Using Press Coverage: Machine Learning Applications
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
58
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2020-07-10
繳交日期
Date of Submission
2020-08-02
關鍵字
Keywords
媒體情緒、機械學習、文字探勘、波動率指數期貨
VIX-futures, Machine-learning, Text-mining, Media sentiment
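Abstract (Chinese)
The main purpose of this study is to examine whether different types of text-mining methods can be applied to media news data to extract effective market sentiment factors, and whether those factors can be used to predict the returns of VIX futures. The media data are online media data from Webhose.io covering the four years from 2015 to 2018.
We compare features extracted from words, article sentiment, and article topics, and find that all of them have significant predictive power; they are also linked to one another and share the same explanatory content. The main finding is that the sentiment indicator is negatively correlated with VIX futures returns, and the effect is larger for articles that express strongly negative sentiment. We also find that reports whose topic is related to the US market index increase VIX futures returns.
We then use the factors generated by these three methods to train the machine learning model XGBoost and compare the performance of the different factors over a two-year test sample period. Information related to VIX futures has the largest influence, while the performance of the media data factors is unstable, possibly because the sample period is too short. Among the media data factors, the word features are the most stable: they remain profitable despite the sharp market upheaval in 2018. This suggests that the method has reference value in investment practice and, given the nature of volatility futures, that it also amounts to a prediction of volatility.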
Abstract
This thesis compares different methods for extracting useful information from media data and applies the extracted features to predicting VIX futures returns. The data come from the web data provider Webhose.io and cover 2015 to 2018.
We apply three methods to extract features from the media data: keywords, sentiment, and topics. We find that these features have predictive power for VIX futures returns. One observation is that sentiment is negatively related to VIX futures returns, especially for extremely negative articles. Another is that articles related to the market index show a stronger positive relation.
In addition, we build an XGBoost model on the media data features and evaluate the performance of the resulting trading strategy in 2017 and 2018. The most important and stable features are the market data, including the lagged return and volume of VIX futures. Among the media data features, the keyword features are the most stable: the keyword-based strategy remains profitable through the dramatic market change in February 2018. The results suggest that media data can provide signals that help investors make decisions.
目次 Table of Contents
Thesis Approval Form i
Abstract (Chinese) ii
ABSTRACT iii
1. Introduction 1
2. Literature Review 3
2.1 Text-mining 3
2.2 Sentiment Analysis 5
3. Methodology 8
3.1 Sample Data 8
3.1.1 Market Data 8
3.1.2 Media Data 9
3.2 Keyword Construction 10
3.2.1 Bag of Words 10
3.2.2 N-gram 10
3.2.3 TF-IDF 11
3.3 Sentiment Construction 12
3.3.1 Harvard IV dictionary 12
3.3.2 Valence Aware Dictionary for sEntiment Reasoning (Vader) 13
3.4 Topic extraction 14
3.4.1 Part of Speech (POS) 14
3.4.2 Latent Dirichlet Allocation (LDA) 15
4. Empirical Analysis 16
4.1 Keyword Result 16
4.2 Sentiment and Market Movements 18
4.2.1 The Prediction Power of Daily Sentiment 18
4.2.2 The Prediction Power of Positive / Negative Sentiment 19
4.2.3 The Prediction Power of Sentiment Structure 20
4.3 Topic Result 22
5. Strategy with Machine Learning Application 24

5.2 Model Construction 24
5.3 Bagging Method 26
5.4 Trading Strategy 26
5.5 Strategy Performance 27
5.6 The Interpretation by Machine Learning Model 28
6. Conclusion 30
Reference 32
Appendix 36

List of Figures
Figure 1 Performance of Keyword Models 38
Figure 2 Performance of Sentiment Models with Topic 4 38
Figure 3 Performance of Sentiment Models with All Topics 39
Figure 4 SHAP Result for Sentiment Models with All Topics Variables 40

List of Tables
Table 1 The Description of Statistics 41
Table 2 The Predictive Power of Keywords 42
Table 3 The Relation between Sentiment and VIX Futures Return 43
Table 4 The Positive/Negative Sentiment Effect 44
Table 5 The Predictive Power of the Sentiment Distribution 45
Table 6 Topic Extraction Result (LDA) 48
Table 7 The Predictive Power of Topic Ratio 48
Table 8 Strategy Performance Comparison 49
參考文獻 References
Akhtar, M. S., Kumar, A., Ghosal, D., Ekbal, A., & Bhattacharyya, P. (2017, September). A multilayer perceptron based ensemble technique for fine-grained financial sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 540-546).
Akhtar, S., Faff, R., Oliver, B., & Subrahmanyam, A. (2013). Reprint of: Stock salience and the asymmetric market effect of consumer sentiment news. Journal of Banking & Finance, 37(11), 4488-4500.
Antweiler, W., & Frank, M. Z. (2004). Is all that talk just noise? The information content of internet stock message boards. The Journal of Finance, 59(3), 1259-1294.
Baccianella, S., Esuli, A., & Sebastiani, F. (2010, May). SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC (Vol. 10, No. 2010, pp. 2200-2204).
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.
Chen, T., & Guestrin, C. (2016, August). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794).
Da, Z., Engelberg, J., & Gao, P. (2015). The sum of all FEARS investor sentiment and asset prices. The Review of Financial Studies, 28(1), 1-32.
Denecke, K. (2008, April). Using SentiWordNet for multilingual sentiment analysis. In 2008 IEEE 24th International Conference on Data Engineering Workshop (pp. 507-512). IEEE.
Engelberg, J. E., & Parsons, C. A. (2011). The causal impact of media in financial markets. The Journal of Finance, 66(1), 67-97.
Gimpel, K., Schneider, N., O'Connor, B., Das, D., Mills, D., Eisenstein, J., ... & Smith, N. A. (2010). Part-of-speech tagging for Twitter: Annotation, features, and experiments. Carnegie Mellon University, School of Computer Science, Pittsburgh, PA.
Groß-Klußmann, A., & Hautsch, N. (2011). When machines read the news: Using automated text analytics to quantify high frequency news-implied market reactions. Journal of Empirical Finance, 18(2), 321-340.
Guo, K., Sun, Y., & Qian, X. (2017). Can investor sentiment be used to predict the stock price? Dynamic analysis based on China stock market. Physica A: Statistical Mechanics and its Applications, 469, 390-396.
Hutto, C. J., & Gilbert, E. (2014, May). VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth International AAAI Conference on Weblogs and Social Media.
Kearney, C., & Liu, S. (2014). Textual sentiment in finance: A survey of methods and models. International Review of Financial Analysis, 33, 171-185.
Khedr, A. E., & Yaseen, N. (2017). Predicting stock market behavior using data mining technique and news sentiment analysis. International Journal of Intelligent Systems and Applications, 9(7), 22.
Li, X., Xie, H., Chen, L., Wang, J., & Deng, X. (2014). News impact on stock price return via sentiment analysis. Knowledge-Based Systems, 69, 14-23.
Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10‐Ks. The Journal of Finance, 66(1), 35-65.
Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. In Advances in neural information processing systems (pp. 4765-4774).
Mahajan, A., Dey, L., & Haque, S. M. (2008, December). Mining financial news for major events and their impacts on the market. In 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (Vol. 1, pp. 423-426). IEEE.
Munková, D., Munk, M., & Vozár, M. (2013, September). Influence of stop-words removal on sequence patterns identification within comparable corpora. In International Conference on ICT Innovations (pp. 67-76). Springer, Heidelberg.
Nassirtoussi, A. K., Aghabozorgi, S., Wah, T. Y., & Ngo, D. C. L. (2014). Text mining for market prediction: A systematic review. Expert Systems with Applications, 41(16), 7653-7670.
Nguyen, T. H., Shirai, K., & Velcin, J. (2015). Sentiment analysis on social media for stock movement prediction. Expert Systems with Applications, 42(24), 9603-9611.
See-To, E. W., & Yang, Y. (2017). Market sentiment dispersion and its effects on stock return and volatility. Electronic Markets, 27(3), 283-296.
Smales, L. A. (2014). News sentiment and the investor fear gauge. Finance Research Letters, 11(2), 122-130.
Tetlock, P. C., Saar‐Tsechansky, M., & Macskassy, S. (2008). More than words: Quantifying language to measure firms' fundamentals. The Journal of Finance, 63(3), 1437-1467.
Zhang, L. (2013). Sentiment analysis on Twitter with stock price and significant keyword correlation (Doctoral dissertation).
電子全文 Fulltext
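This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: user-defined release date
Available:
Campus: available for download from 2025-08-02
Off-campus: available for download from 2025-08-02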


紙本論文 Printed copies
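Public information on printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available 2025-08-02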
