Thesis/Dissertation Details: etd-0803118-124937
Title: An Integrated Framework for Identifying Entities Topics and Sentiment from Text
Chinese title: 一個基於文本整合實體主題情緒辨識的框架之研究
Department:
Academic year, semester:
Language:
Degree:
Number of pages: 46
Author:
Advisor:
Convenor:
Examination Committee:
Date of oral examination: 2018-07-23
Date of submission: 2018-09-03
Keywords: Entity Extraction, Text Mining, Sentiment Analysis, Topic Identification, Chinese Natural Language Processing, Topic Modeling, Aspect and Sentiment Unification Model
Statistics: This thesis has been viewed 6,007 times and downloaded 73 times.
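Chinese Abstract (English translation)
With the growing volume of news articles on the internet, topic and sentiment analysis have been widely used in text mining. However, it is difficult to identify topics and sentiments simultaneously at the entity level, especially in Chinese articles. To address these problems, this study provides a framework that identifies entities, topics, and sentiments from text. We use an algorithm to split documents into entity-centered sentences and implement the Aspect and Sentiment Unification Model to identify topics and sentiments. Finally, we apply a word embedding model, a Gaussian similarity kernel, and a hierarchical clustering algorithm to generate the results. To evaluate the method, we collect data from the Apple Daily news website, selecting the politics section from 2013 to 2017. Compared with other systems at different levels, the experimental results show that our framework is effective for entity, topic, and sentiment identification.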
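The abstract's first step, splitting documents into entity-centered sentences, relies on the thesis's own algorithm, which the table of contents ties to named entity recognition, word segmentation, part-of-speech tagging, the Stanford dependency parser, and sentence-reconstruction rules. Those rules are not given on this page, so the Python sketch below is only a toy stand-in built on two assumptions: the entity names are already known, and a clause that names no entity continues the entity named most recently. The function name `split_by_entity`, the clause delimiters, and the example sentence are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical clause boundaries: full-width and ASCII commas, periods,
# semicolons, exclamation marks, and question marks.
CLAUSE_DELIMS = re.compile(r"[，,。；;！!？?]")


def split_by_entity(text, entities):
    """Group the clauses of `text` by the entity each clause is about.

    Toy heuristic (not the thesis's rules): a clause that names one or more
    known entities is assigned to each of them, and a clause that names none
    is attached to the entity named last. Clauses before any entity are dropped.
    """
    clauses = [c.strip() for c in CLAUSE_DELIMS.split(text) if c.strip()]
    grouped = defaultdict(list)
    current = None  # entity that unnamed clauses are attributed to
    for clause in clauses:
        mentioned = [e for e in entities if e in clause]
        if mentioned:
            for e in mentioned:
                grouped[e].append(clause)
            # The entity appearing latest in the clause becomes the carry-over.
            current = max(mentioned, key=clause.rfind)
        elif current is not None:
            grouped[current].append(clause)
    # Re-join each entity's clauses into one entity-centered "sentence".
    return {e: "，".join(cs) for e, cs in grouped.items()}


if __name__ == "__main__":
    text = "行政院今天宣布新政策，預計明年實施。立法院則表示，將審慎審查。"
    for entity, sentence in split_by_entity(text, ["行政院", "立法院"]).items():
        print(entity, sentence)
```

Carrying the last entity forward is a crude substitute for parser-based rules: a clause whose subject changes without being named will be attributed to the wrong entity.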
Abstract
With the growth of news articles on the internet, topic and sentiment analysis have been widely used in text mining. However, it is difficult to identify topics and sentiments simultaneously at the entity level, especially in Chinese articles. To solve this problem, we propose an integrated framework for identifying entities, topics, and sentiments from text. We use our own algorithm to split documents into entity-centered sentences and implement the Aspect and Sentiment Unification Model (ASUM) to identify topics and sentiments. Finally, we apply a word2vec model, a Gaussian similarity kernel, and a complete-linkage agglomerative clustering algorithm to generate the results. To evaluate our method, we collect articles from the politics section of the Apple Daily news website published from 2013 to 2017. Compared with other systems at different levels, the experimental results show that our entity-level framework is effective at identifying topics and sentiments.
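The final step in the abstract groups topics with word2vec, a Gaussian similarity kernel, and complete-linkage agglomerative clustering; the table of contents lists it as Similarity of Topics, Convert Similarity to Distance, and Generate Integrated Topics. The page does not give the exact formulas, so the Python sketch below rests on stated assumptions: each ASUM topic is represented by the probability-weighted mean of the word2vec vectors of its top words, similarity is the Gaussian kernel s(x, y) = exp(-||x - y||^2 / (2 sigma^2)), distance is d = 1 - s, and merging stops at a hypothetical distance threshold. The names `topic_vector`, `gaussian_similarity`, and `complete_linkage`, the value of sigma, and the toy two-dimensional embeddings are illustrative, not taken from the thesis.

```python
import math


def topic_vector(word_probs, embeddings):
    """Probability-weighted mean of the embeddings of a topic's top words.

    Words missing from `embeddings` are skipped (assumption).
    """
    dim = len(next(iter(embeddings.values())))
    acc = [0.0] * dim
    total = 0.0
    for word, p in word_probs.items():
        if word in embeddings:
            acc = [s + p * v for s, v in zip(acc, embeddings[word])]
            total += p
    return [s / total for s in acc] if total else acc


def gaussian_similarity(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)); 1.0 for identical vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def complete_linkage(vectors, threshold, sigma=1.0):
    """Agglomerative clustering with complete linkage on d = 1 - similarity.

    Repeatedly merges the two clusters whose farthest pair of members is
    closest, and stops once that distance exceeds `threshold`.
    Returns clusters as sorted lists of input indices.
    """
    n = len(vectors)
    dist = [[1.0 - gaussian_similarity(vectors[i], vectors[j], sigma)
             for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None  # (linkage distance, cluster index a, cluster index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: the largest pairwise member distance.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


if __name__ == "__main__":
    # Toy 2-D "word2vec" vectors and ASUM topic-word probabilities (made up).
    embeddings = {"稅": [1.0, 0.0], "預算": [0.9, 0.1],
                  "選舉": [0.0, 1.0], "投票": [0.1, 0.9]}
    topics = [{"稅": 0.6, "預算": 0.4},
              {"預算": 0.5, "稅": 0.5},
              {"選舉": 0.7, "投票": 0.3}]
    vectors = [topic_vector(t, embeddings) for t in topics]
    print(complete_linkage(vectors, threshold=0.5))  # [[0, 1], [2]]
```

With these toy vectors, the two tax/budget topics merge into one integrated topic while the election topic stays separate. Complete linkage measures cluster distance by the farthest pair of members, so a cluster grows only when every member is within the threshold of every other, which keeps integrated topics tight. The naive loop is cubic in the number of topics, which is acceptable for a few dozen topics but not for large inputs.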
Table of Contents
Thesis Approval Form
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Motivation
Chapter 2 Literature Review
2.1 Chinese Sentence Segmentation
2.2 Aspect Extraction
2.2.1 Rule-based
2.2.2 Topic Modeling
2.2.3 Deep Convolutional Neural Network
2.3 JST and ASUM
Chapter 3 Approach
3.1 Research Skeleton
3.2 Data Collection and Data Preprocessing
3.3 Entity Identification and Sentence Extraction
3.3.1 Named Entity Recognition
3.3.2 Word Segmentation and Part-of-Speech Tagger
3.3.3 Stanford Dependency Parser
3.3.4 Rules for Reconstructing Sentences
3.4 Aspect and Sentiment Unification Model (ASUM)
3.5 Topic-Sentiment Mapping
3.5.1 Similarity of Topics
3.5.2 Convert Similarity to Distance
3.5.3 Generate Integrated Topics
3.6 Apply Integrated Topics
3.6.1 Document-Level Topic-Sentiment Identification
3.6.2 Sentence-Level Topic-Sentiment Identification
Chapter 4 Evaluation
4.1 Data Resource and Data Preprocessing
4.2 Evaluate Entity Segmentation
4.3 Comparisons with Other Methods
4.3.1 Attribute Selection from ASUM
4.3.2 Evaluate Document-Level Topic and Sentiment
4.3.3 Evaluate Entity-Level Topic and Sentiment
Chapter 5 Conclusion
References
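Full text
This electronic full text is licensed to users solely for personal, non-commercial searching, reading, and printing for academic research. Please comply with the Copyright Act of the Republic of China (Taiwan): do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined release date
Available:
On campus: available
Off campus: available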


Printed copies
Available: available
