論文使用權限 Thesis access permission: 自定論文開放時間 user-defined
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available
論文名稱 Title |
一個基於文本整合實體主題情緒辨識的框架之研究 An Integrated Framework for Identifying Entities Topics and Sentiment from Text |
||
系所名稱 Department |
|||
畢業學年期 Year, semester |
語文別 Language |
||
學位類別 Degree |
頁數 Number of pages |
46 |
|
研究生 Author |
|||
指導教授 Advisor |
|||
召集委員 Convenor |
|||
口試委員 Advisory Committee |
|||
口試日期 Date of Exam |
2018-07-23 |
繳交日期 Date of Submission |
2018-09-03 |
關鍵字 Keywords |
實體萃取、主題辨識、情感分析、文字探勘、中文自然語言處理、主題模型、構面與情感整合模型 Entity Extraction, Text Mining, Sentiment Analysis, Topic Identification, Chinese Natural Language Processing, Topic modeling, Aspect and Sentiment Unification Model |
||
統計 Statistics |
本論文已被瀏覽 6082 次,被下載 73 次 The thesis/dissertation has been browsed 6082 times and downloaded 73 times. |
中文摘要 |
伴隨互聯網上新聞文章量的增長,主題和情感的分析已經被廣泛用於文本挖掘。然而,我們很難在實體層面同時識別主題和情感,尤其是中文的文章。為了解決這些問題,我們研究提供一個可以從文本裡識別實體、主題和情感一個框架。我們使用演算法將文檔切割成以實體為主體的句子,並實現構面與情感整合模型,用以識別主題和情緒。最後,應用詞嵌入模型、高斯相似核心和基於層次的聚類算法來生成結果。為了評估方法,我們從蘋果新聞網收集數據,並選擇2013年到2017 年的政治版面。與其他系統相比,在不同層級下,實驗結果顯示我們的框架在實體、主題和情感識別中是有效的。 |
Abstract |
With the growth of online news articles, topic and sentiment analysis have been widely used in text mining. However, it is difficult to identify topics and sentiments simultaneously at the entity level, especially in Chinese articles. To address this problem, we propose an integrated framework for identifying entities, topics, and sentiments from text. We use our algorithm to split documents into entity-centered sentences and implement the Aspect and Sentiment Unification Model (ASUM) to identify topics and sentiments. Finally, we apply a word2vec model, a Gaussian similarity kernel, and a complete-linkage agglomerative clustering algorithm to generate the results. To evaluate our method, we collect data from the news website of “Apple Daily” and select the politics section from 2013 to 2017. Compared with other systems at different levels, the experimental results show that our entity-level framework is effective at identifying topics and sentiments. |
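The final step named in the abstract (Gaussian similarity kernel followed by complete-linkage agglomerative clustering) can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy 2-D vectors stand in for word2vec-based topic representations, and the function names, the `sigma` bandwidth, the `threshold` cut-off, and the `1 - similarity` distance conversion are all assumptions made for illustration.

```python
import math
from itertools import combinations


def gaussian_similarity(u, v, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||u - v||^2 / (2 * sigma^2)), in (0, 1]."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def complete_linkage(vectors, threshold, sigma=1.0):
    """Cluster vectors bottom-up with complete linkage.

    Distance is taken as 1 - similarity (an assumed conversion). Merging
    stops once the closest pair of clusters is farther apart than
    `threshold`. Returns sorted lists of indices into `vectors`.
    """
    n = len(vectors)
    dist = [
        [1.0 - gaussian_similarity(vectors[i], vectors[j], sigma) for j in range(n)]
        for i in range(n)
    ]
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Hypothetical topic vectors: two tight groups far apart from each other.
topic_vectors = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
print(complete_linkage(topic_vectors, threshold=0.5))
```

With these toy vectors, the two nearby pairs merge and the distant groups stay separate. Raising `threshold` toward 1.0 merges everything into a single cluster, which is the role a dendrogram cut level would play in a library implementation.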
目次 Table of Contents |
Thesis Approval Form i
Abstract (Chinese) ii
Abstract iii
Table of Contents iv
List of Figures vi
List of Tables vii
Chapter 1 Introduction 1
  1.1 Background 1
  1.2 Motivation 2
Chapter 2 Literature Review 4
  2.1 Chinese Sentence Segmentation 4
  2.2 Aspect Extraction 4
    2.2.1 Rules-based 4
    2.2.2 Topic modeling 5
    2.2.3 Deep Convolutional Neural Network 6
  2.3 JST and ASUM 6
Chapter 3 Approach 8
  3.1 Research Skeleton 8
  3.2 Data Collection and Data Preprocessing 9
  3.3 Entity Identification and Sentence Extraction 10
    3.3.1 Named Entity Recognition 10
    3.3.2 Word Segmentation and Part-of-Speech Tagger 11
    3.3.3 Stanford Dependency Parser 11
    3.3.4 Rules for Reconstruct Sentences 12
  3.4 Aspect and Sentiment Unification Model (ASUM) 15
  3.5 Topic-Sentiment Mapping 19
    3.5.1 Similarity of Topics 20
    3.5.2 Convert Similarity to Distance 20
    3.5.3 Generate Integrated Topics 21
  3.6 Apply Integrated Topics 23
    3.6.1 Document-Level Topic-Sentiment Identification 23
    3.6.2 Sentence-Level Topic-Sentiment Identification 23
Chapter 4 Evaluation 24
  4.1 Data Resource and Data Preprocessing 24
  4.2 Evaluate Entity Segmentation 25
  4.3 Comparisons with Other Methods 27
    4.3.1 Attribute Selection from ASUM 27
    4.3.2 Evaluate Document-Level Topic and Sentiment 28
    4.3.3 Evaluate Entity-Level Topic and Sentiment 32
Chapter 5 Conclusion 35
Reference 36 |
電子全文 Fulltext |
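This electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law. |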
紙本論文 Printed copies |
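Public availability information for printed theses is relatively complete from academic year 102 (2013-14) onward. To look up this information for printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. 開放時間 Available: 已公開 available |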