Title page for thesis/dissertation etd-0801115-163511
Title
以字典為基礎之雲端情感分析方法
A Lexicon-Based Sentiment Analysis Method on Cloud Platform
Department
Year, semester
Language
Degree
Number of pages
103
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2015-07-15
Date of Submission
2015-09-01
Keywords
Sentiment Analysis, Hadoop, MapReduce, NLP, Cloud computing, Text mining
Statistics
This thesis/dissertation has been viewed 6004 times and downloaded 890 times.
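Chinese Abstract
In today's highly connected society, the prevalence of Web 2.0 lets people publish personal reviews, opinions, and emotions as user-generated content. Text mining techniques can turn this information into value for organizations and enterprises. At the same time, organizations and enterprises face ever-growing volumes of data, and processing that data quickly has become a problem they all share. This study proposes the design of a cloud-based sentiment analysis platform that uses MapReduce, the software framework of the open-source Hadoop project, as its algorithmic core. We validate the platform's performance with real data and show that this cloud-based sentiment analysis method improves the efficiency of sentiment analysis in a near-linear fashion.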
Abstract
With the widespread adoption of mobile computing devices and the rapid development of Internet technologies, people can express their opinions more conveniently, producing user-generated content (UGC). Applying text mining and sentiment analysis techniques to UGC allows enterprises and organizations to quickly identify people's opinions. However, the huge amount of UGC data also creates the challenge of analyzing the data efficiently, effectively, and in a timely manner.
In this thesis, we propose an approach that uses cloud techniques, namely MapReduce on the Hadoop platform, to tackle this problem. Experimental results on a real data set show the near-linear scalability of the proposed approach.
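As a rough illustration of how lexicon-based sentiment scoring can be phrased as map and reduce steps, the following minimal Python sketch simulates the two phases in memory. It is not the thesis's implementation: the toy lexicon, the function names (`map_document`, `reduce_scores`, `run_job`), and the sample documents are all assumptions made for this example, and a real Hadoop job would distribute the same two steps across a cluster.

```python
from collections import defaultdict

# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
LEXICON = {"good": 1, "great": 1, "bad": -1, "terrible": -1}


def map_document(doc_id, text):
    """Map step: emit (doc_id, polarity) for each lexicon word in the text."""
    for token in text.lower().split():
        polarity = LEXICON.get(token)
        if polarity is not None:
            yield doc_id, polarity


def reduce_scores(doc_id, polarities):
    """Reduce step: sum all polarities emitted for one document."""
    return doc_id, sum(polarities)


def run_job(documents):
    """Simulate the shuffle phase in memory, then apply the reducer per key."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_document(doc_id, text):
            grouped[key].append(value)
    return dict(reduce_scores(key, values) for key, values in grouped.items())


# Hypothetical sample input.
documents = {
    "d1": "great service and good food",
    "d2": "terrible delay bad support good price",
}
scores = run_job(documents)
```

Because each document is mapped independently and the reducer only needs the values for a single key, the work can be split across machines, which is the property a near-linear scalability result relies on.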
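Table of Contents
Chapter 1. Introduction 1
Section 1. Research Background 1
Section 2. Research Motivation 3
Section 3. Research Questions 4
Chapter 2. Literature Review 6
Section 1. Aspect-Level Sentiment Analysis 6
1. Sentiment Analysis 6
2. Natural Language Processing Tools 10
Section 2. Hadoop 13
1. Hadoop Distributed File System (HDFS) 14
2. MapReduce 15
Section 3. Sentiment Analysis Combined with Hadoop 17
Chapter 3. System Architecture and Research Process 20
Section 1. Research Process 20
1. Building the Data Set 21
2. Data Preprocessing 23
3. Sentiment Lexicon Construction and Expansion 26
4. Issue Lexicon 31
5. Sentiment Analysis 34
Chapter 4. Sentiment Analysis Using MapReduce 39
Section 1. System Architecture 39
Section 2. Sentiment Lexicon Construction and Expansion 41
1. Building the Initial Sentiment Lexicon 41
2. Computing Term Frequencies of Candidate Words 43
3. Computing Term Frequencies of Sentiment Words 44
4. Computing Co-occurrence Frequencies of Candidates and Sentiment Words 45
5. Computing Co-occurrence Probabilities 47
Section 3. Sentiment Analysis 48
1. Sentiment Analysis 48
Chapter 5. Experiments and Evaluation 52
Section 1. Data Set Description 52
Section 2. Sentiment Analysis Results 52
1. Sovereignty 53
2. Society 56
3. National Security 57
4. Industry 60
5. Economy 62
6. Accuracy Evaluation 64
Section 3. Performance Analysis 71
1. Sentiment Expansion Module 73
2. Sentiment Analysis Module 82
Chapter 6. Conclusions and Future Work 88
Section 1. Research Limitations 88
Section 2. Future Work 89
References 90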
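Fulltext
This electronic full text is licensed to users solely for academic research, for personal, non-profit retrieval, reading, and printing. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: user-defined release time
Available:
Campus: available
Off-campus: available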


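Printed copies
Availability information for printed theses is relatively complete from academic year 102 onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk at the Office of Library and Information Services. We apologize for any inconvenience.
Available: available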
