Thesis/Dissertation etd-0519119-145602: Details
Title
新聞文本情緒分類之實證研究-以鉅亨網新聞為例
Empirical Study of News Sentiment Classification: Evidence from Anue Financial News
Department
Year, semester
Language
Degree
Number of pages
52
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2019-06-18
Date of Submission
2019-06-19
Keywords
Natural Language Processing, Text Mining, Text Classification, News Sentiment, Machine Learning
Abstract
With the rise of online news media, online news has become one of the bases on which investors make investment decisions. However, financial news websites publish so much news every day that investors can no longer rely on traditional manual reading and screening to judge, article by article, the market sentiment each report reflects.
To help investors grasp current market sentiment quickly, this study performs sentiment classification of Chinese-language Taiwan stock market news from the financial news site Anue (鉅亨網), using text mining and text classification techniques. It examines how Chinese text pre-processing and the choice of classification model affect performance by pairing different feature extraction and selection methods with different classifiers. The feature extraction and selection methods are N-gram, TF-IDF, the chi-square test, and mutual information; the classifiers are a naive Bayes classifier, fastText, and a multi-layer perceptron.
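The abstract compares N-gram feature extraction with chi-square and mutual information feature selection. The following is a minimal, hypothetical sketch of how such features could be scored for a two-class (positive/negative) news corpus; it is not the thesis's code. Assumptions: Chinese text is split into overlapping character unigrams and bigrams (the thesis may segment words differently), each document is reduced to the set of n-grams it contains, the labels `"pos"`/`"neg"` and all function names are illustrative, and mutual information is computed as the expected (average) mutual information between term presence and class, which is only one of several variants in the literature.

```python
import math


def char_ngrams(text, n_values=(1, 2)):
    """Overlapping character n-grams of `text` for each n in `n_values` (no word segmentation)."""
    grams = []
    for n in n_values:
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


def contingency(term, docs, labels, positive):
    """2x2 document counts: a = term & class, b = term & other, c = no term & class, d = neither."""
    a = b = c = d = 0
    for grams, label in zip(docs, labels):
        has_term, in_class = term in grams, label == positive
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Chi-square statistic of the 2x2 table (higher = stronger term/class dependence)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def mutual_information(a, b, c, d):
    """Expected mutual information (in nats) between term presence and class membership."""
    n = a + b + c + d
    cells = [  # (joint count, term-row total, class-column total)
        (a, a + b, a + c), (b, a + b, b + d),
        (c, c + d, a + c), (d, c + d, b + d),
    ]
    return sum(j / n * math.log(n * j / (row * col)) for j, row, col in cells if j)


def select_top_k(docs, labels, positive, k, score):
    """Rank every term in the corpus by `score` and keep the k best (ties broken alphabetically)."""
    vocab = set().union(*docs)
    scores = {t: score(*contingency(t, docs, labels, positive)) for t in vocab}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus of invented headlines, for illustration only.
texts = ["股價大漲", "營收創新高", "股價重挫", "獲利衰退"]
labels = ["pos", "pos", "neg", "neg"]
docs = [set(char_ngrams(t)) for t in texts]
print(select_top_k(docs, labels, "pos", 3, chi_square))
print(select_top_k(docs, labels, "pos", 3, mutual_information))
```

In this toy corpus, n-grams such as 股價 that occur equally often in both classes score zero under both criteria and are filtered out, which is the behavior feature selection relies on.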
The empirical results show: (1) N-gram feature extraction improves the accuracy of all three classifiers, especially the naive Bayes classifier, for which it effectively mitigates the weakness of the assumption that word features are independent. (2) TF-IDF feature selection is effective only for the naive Bayes classifier: while reducing the number of words, it both improves accuracy and shortens training time. (3) Chi-square and mutual information feature selection both improve the accuracy of fastText and the multi-layer perceptron; moreover, the combination of chi-square feature selection and fastText achieves the best performance in this study.
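Findings (1) and (2) concern a naive Bayes classifier fed with N-gram features and a TF-IDF-based vocabulary filter. The sketch below is a minimal multinomial naive Bayes with add-one (Laplace) smoothing over character n-grams, plus one common TF-IDF selection heuristic (keep the k terms whose highest TF-IDF weight in any single document is largest). It is an illustrative assumption, not the thesis's implementation: the class and function names, the smoothing choice, the TF-IDF variant, and the toy data are all hypothetical. Including bigrams lets adjacent characters such as 大漲 count as one feature, which is how N-gram features can partly relax the independence assumption.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_values=(1, 2)):
    """Overlapping character n-grams of `text` for each n in `n_values`."""
    grams = []
    for n in n_values:
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


def tfidf_vocab(docs, k):
    """Keep the k terms with the largest per-document TF-IDF weight (tf = count / doc length)."""
    n_docs = len(docs)
    df = Counter(t for d in docs for t in set(d))
    best = {}
    for d in docs:
        for t, count in Counter(d).items():
            weight = count / len(d) * math.log(n_docs / df[t])
            best[t] = max(best.get(t, 0.0), weight)
    return set(sorted(best, key=lambda t: (-best[t], t))[:k])


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels, vocab=None):
        self.vocab = vocab if vocab is not None else {t for d in docs for t in d}
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for d, y in zip(docs, labels):
            # Only terms in the (possibly TF-IDF-filtered) vocabulary are counted.
            self.term_counts[y].update(t for t in d if t in self.vocab)
        self.totals = {y: sum(self.term_counts[y].values()) for y in self.class_counts}
        return self

    def log_posterior(self, doc, y):
        """Unnormalized log P(y | doc) under the term-independence assumption."""
        n_docs = sum(self.class_counts.values())
        lp = math.log(self.class_counts[y] / n_docs)
        for t in doc:
            if t in self.vocab:
                lp += math.log((self.term_counts[y][t] + 1) / (self.totals[y] + len(self.vocab)))
        return lp

    def predict(self, doc):
        return max(self.class_counts, key=lambda y: self.log_posterior(doc, y))


# Toy corpus of invented headlines, for illustration only.
texts = ["股價大漲", "營收大增", "股價重挫", "營收衰退"]
labels = ["pos", "pos", "neg", "neg"]
docs = [char_ngrams(t) for t in texts]  # n_values=(1,) gives a unigram-only baseline
model = NaiveBayes().fit(docs, labels)
print(model.predict(char_ngrams("獲利大漲")))

# Same model restricted to a TF-IDF-selected vocabulary of 5 terms.
small = NaiveBayes().fit(docs, labels, vocab=tfidf_vocab(docs, 5))
```

Comparing `char_ngrams(t, (1,))` against the default `(1, 2)`, and the full vocabulary against `tfidf_vocab(docs, k)`, mirrors the kind of ablation described in findings (1) and (2), though a toy corpus this small says nothing about the accuracy gains the thesis reports.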
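Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Section 1 Research Motivation and Objectives
Section 2 Research Framework
Chapter 2 Literature Review
Section 1 Text Mining
Section 2 Text Classification
Chapter 3 Research Methods and Model Construction
Section 1 Experimental Framework
Section 2 Data Collection and Construction
Section 3 Pre-processing and Feature Selection
Section 4 News Text Classifiers
Section 5 Classification Performance Evaluation Criteria
Chapter 4 Empirical Results
Section 1 Data Description
Section 2 Classification Performance Evaluation
Chapter 5 Conclusions and Suggestions
Section 1 Conclusions
Section 2 Suggestions for Future Research
References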
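Fulltext
This electronic full text is authorized only for personal, non-commercial searching, reading, and printing by users for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid infringing the law.
Thesis access permission: user-defined release date
Available:
On campus: available for download from 2024-06-19
Off campus: available for download from 2024-06-19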

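Printed copies
Public-access information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. For public-access information on printed theses from academic year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: 2024-06-19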
