National Sun Yat-sen University, thesis/dissertation, Empirical Study of News Sentiment Classification: Evidence from Anue Financial News

Title	Empirical Study of News Sentiment Classification: Evidence from Anue Financial News (新聞文本情緒分類之實證研究-以鉅亨網新聞為例)
Department	Department of Finance
Year, semester	Spring semester, Academic Year 107	Language	Chinese
Degree	Master	Number of pages	52
Author	Kao-Ming Tien (田高銘)
Advisor	Jen-Jsung Huang (黃振聰)
Convenor	Chou-Wen Wang (王昭文)
Advisory Committee	Chin-Wen Wu (吳錦文); Song-Zan Chiou Wei (邱魏頌正); Hong-Chih Huang (黃泓智)
Date of Exam	2019-06-18	Date of Submission	2019-06-19
Keywords	Natural Language Processing, Text Mining, Text Classification, News Sentiment, Machine Learning
Statistics	This thesis has been viewed 5801 times and downloaded 1 time.

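Chinese Abstract (English translation)
With the rise of online news media, online news has become one of the bases on which investors make investment decisions. However, financial news websites generate so much news every day that investors can no longer rely on manual reading and screening to judge, article by article, the market sentiment each news item reflects.
To help investors grasp current market sentiment quickly, this study performs sentiment classification of Chinese news text on Taiwan stock market news from the financial news outlet Anue (鉅亨網). Using text mining and text classification techniques, it examines the effectiveness of Chinese text preprocessing and text classification models, and observes how different feature extraction and selection methods perform when paired with different classifiers. The feature extraction and selection methods are N-gram, TF-IDF, chi-square, and mutual information; the classifiers are the naive Bayes classifier, fastText, and the multi-layer perceptron.
The results show that: (1) N-gram feature extraction improves accuracy in all three classifiers, especially the naive Bayes classifier, where it effectively mitigates the weakness of the model's assumption that word features are independent. (2) TF-IDF feature selection is effective only for the naive Bayes classifier, where it reduces training time and improves accuracy while reducing the number of words. (3) Both chi-square test and mutual information feature selection improve the accuracy of fastText and the multi-layer perceptron, and chi-square feature selection combined with the fastText classifier achieves the best classification performance in this study.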
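The n-gram extraction and TF-IDF weighting named in the abstract can be sketched in plain Python. This is a minimal illustration, not the thesis's implementation: the toy token lists, the function names `ngrams`, `featurize`, and `tf_idf`, and the particular TF-IDF variant (relative term frequency times log(N/df)) are all assumptions, since this record does not state which word segmenter or weighting formula the thesis used.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, joined with '_'."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def featurize(tokens, max_n=2):
    """Hypothetical helper: unigrams plus all n-grams up to max_n."""
    feats = []
    for n in range(1, max_n + 1):
        feats.extend(ngrams(tokens, n))
    return feats

def tf_idf(docs):
    """docs: list of feature lists. Returns one {feature: weight} dict per doc.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    df = Counter()
    for feats in docs:
        df.update(set(feats))  # count each feature once per document
    weights = []
    for feats in docs:
        counts = Counter(feats)
        total = len(feats)
        weights.append(
            {f: (c / total) * math.log(n_docs / df[f]) for f, c in counts.items()}
        )
    return weights

# Toy, pre-segmented headlines (invented for illustration only).
corpus = [
    ["revenue", "hits", "record", "high"],
    ["revenue", "falls", "short"],
    ["shares", "hit", "record", "low"],
]
features = [featurize(doc) for doc in corpus]
weights = tf_idf(features)
```

Bigrams such as `record_high` and `record_low` carry word-order information that single words lose, which is one plausible reading of why the abstract reports n-grams helping a classifier that assumes independent word features.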
Abstract
Online news has become one of the bases on which investors make investment decisions. However, the large volume of information that financial news websites generate every day makes it impractical for investors to rely on manual reading and screening to judge the market sentiment reflected by each news report.
To help investors understand current market sentiment quickly, we use text mining and text classification techniques to classify news sentiment. This study collects Taiwanese stock market news from Anue Financial News and compares different text preprocessing methods and classifiers to find the best classification performance.
The empirical results show that: (1) N-gram feature extraction improves the accuracy of all classifiers, especially the naive Bayes classifier, where it effectively mitigates the shortcomings of the independence assumption. (2) TF-IDF feature selection is effective only for the naive Bayes classifier, where it improves accuracy and reduces training time while decreasing the number of words. (3) Chi-square test and mutual information feature selection both improve the accuracy of fastText and the multi-layer perceptron. Furthermore, the combination of chi-square feature selection and fastText achieved the best performance in this study.
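Finding (3) concerns chi-square feature selection. A minimal sketch of that idea follows, using the standard chi-square statistic for a 2x2 term/class contingency table. The toy documents, labels, and the names `chi_square` and `select_features` are assumptions made for illustration; this record does not give the exact formula, threshold, or number of features the thesis used.

```python
from collections import Counter

def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class table.

    a: docs in the class that contain the term
    b: docs outside the class that contain the term
    c: docs in the class without the term
    d: docs outside the class without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no information
    return n * (a * d - c * b) ** 2 / denom

def select_features(docs, labels, target, k):
    """Score every feature against the `target` label and keep the top k."""
    n_pos = sum(1 for y in labels if y == target)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for feats, y in zip(docs, labels):
        # Document frequency: each feature counts once per document.
        (pos_df if y == target else neg_df).update(set(feats))
    scores = {}
    for f in set(pos_df) | set(neg_df):
        a, b = pos_df[f], neg_df[f]
        scores[f] = chi_square(a, b, n_pos - a, n_neg - b)
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:k], scores

# Toy labeled documents (invented for illustration only).
docs = [
    ["profit", "rises", "record"],
    ["profit", "rises"],
    ["profit", "falls", "loss"],
    ["loss", "widens", "falls"],
]
labels = ["pos", "pos", "neg", "neg"]
top, scores = select_features(docs, labels, target="pos", k=3)
```

In this toy data, words that appear in only one class (`rises`, `falls`, `loss`) score highest, while `profit`, which appears in both classes, scores lower. The selected features would then be passed to a classifier of choice.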

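Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
  Section 1: Research Motivation and Objectives
  Section 2: Research Structure
Chapter 2: Literature Review
  Section 1: Text Mining
  Section 2: Text Classification
Chapter 3: Research Methods and Model Construction
  Section 1: Experimental Framework
  Section 2: Data Collection and Construction
  Section 3: Preprocessing and Feature Selection
  Section 4: News Text Classifiers
  Section 5: Classification Performance Evaluation Criteria
Chapter 4: Empirical Results
  Section 1: Data Description
  Section 2: Classification Performance Evaluation
Chapter 5: Conclusions and Suggestions
  Section 1: Conclusions
  Section 2: Suggestions for Future Research
References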

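Fulltext
This electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: user-defined availability date
Available: Campus: available; Off-campus: available
etd-0519119-145602.pdf
Printed copies
Public availability information for printed theses is relatively complete from Academic Year 102 onward. To inquire about printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available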

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team │ Mail │ Development and operations: Knowledge Innovation Division, LIS