Title page for etd-0727119-153535
論文名稱
Title
建構跨語言情緒詞典之框架研究
A Framework for Cross-Lingual Sentiment Lexicon Construction
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
42
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2019-07-22
繳交日期
Date of Submission
2019-08-27
關鍵字
Keywords
跨語言空間投影、情緒分析、跨語言情緒辭典學習、文字分析、語意空間特殊化
Cross-Lingual Mapping, Sentiment Analysis, Cross-Lingual Sentiment Lexicon Learning, Text Mining, Semantic Specialization
統計
Statistics
本論文已被瀏覽 6059 次,被下載 173 次。
The thesis/dissertation has been viewed 6059 times and downloaded 173 times.
中文摘要 Chinese Abstract
Using a sentiment lexicon is a very common approach to sentiment analysis; however, the quality of sentiment lexicons differs greatly across languages. Widely used languages such as English have rich, expert-annotated lexicons, whereas most other languages must rely on manual or machine translation to obtain one. The goal of constructing a cross-lingual sentiment lexicon is to borrow the semantic resources of a dominant language to help build sentiment lexicons for other, less-resourced languages.
This thesis proposes a framework for constructing cross-lingual sentiment lexicons. We inject information from external semantic resources into the semantic vector spaces, and then project the spaces of the different languages into a shared space to generate a sentiment lexicon for the target language. Through this framework, the semantic resources of a dominant language can be used to train a domain-specific sentiment lexicon for the target language, without a corpus pre-annotated with sentiment labels and without a parallel corpus for the cross-lingual projection.
Our experiments show that the sentiment lexicon produced by this framework achieves higher classification accuracy than a machine translation of an expert-annotated lexicon, and also covers more of the corpus.
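To make the specialization step above concrete, here is a minimal sketch in the spirit of the retrofitting method of Faruqui et al. [40], one of the semantic-specialization approaches cited in this thesis: pre-trained vectors are pulled toward the vectors of words they are linked to in an external lexical resource. The function name and parameters below are illustrative assumptions, not the thesis's actual implementation.

import numpy as np

def retrofit(vectors, synonym_lists, alpha=1.0, beta=1.0, n_iter=10):
    # Hypothetical helper, not from the thesis: pull each word vector toward
    # the vectors of its synonyms (a simple form of semantic specialization).
    # vectors:       dict word -> np.ndarray (pre-trained embedding)
    # synonym_lists: dict word -> list of related words from an external
    #                lexical resource such as WordNet or PPDB
    specialized = {w: v.copy() for w, v in vectors.items()}
    for _ in range(n_iter):
        for word, synonyms in synonym_lists.items():
            synonyms = [s for s in synonyms if s in specialized]
            if word not in specialized or not synonyms:
                continue
            # Weighted average of the original vector and the current vectors
            # of its synonyms.
            updated = alpha * vectors[word] + beta * sum(specialized[s] for s in synonyms)
            specialized[word] = updated / (alpha + beta * len(synonyms))
    return specialized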
Abstract
Lexicon-based methods are a popular and practical approach to sentiment analysis. However, the quality of sentiment lexicons varies greatly across languages. Some languages, such as English, have rich lexicons crafted by experts, while most other languages must rely on expensive manual translation or ineffective machine translation to obtain a lexicon. The purpose of cross-lingual lexicon learning is to leverage resource-rich languages to build lexicons for resource-poor languages.
This thesis proposes a framework for cross-lingual lexicon learning. We incorporate semantic relations and contextual information into the respective vector spaces of the dominant and target languages, and then project both spaces into a shared space. Finally, we query the shared space to obtain a sentiment lexicon in the target language. Our framework requires neither a corpus with sentiment labels nor a parallel corpus for the cross-lingual transformation.
Our experiments show that the sentiment lexicon generated by our framework achieves better classification accuracy than a lexicon generated using machine translation, and it also has higher coverage of the corpus.
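As a rough illustration of the cross-lingual mapping and neighborhood-query steps (Sections 3.2 and 3.3 in the table of contents), the sketch below assumes the linear-transformation variant in the style of Mikolov et al. [43]: a matrix W is fit by least squares on a small seed dictionary, target-language vectors are mapped into the dominant-language space, and each target word inherits the majority sentiment label of its nearest dominant-language lexicon entries. All names (learn_mapping, induce_lexicon, seed_pairs, dom_lexicon) are hypothetical placeholders, not the thesis's code.

import numpy as np

def learn_mapping(tgt_vecs, dom_vecs, seed_pairs):
    # Fit W so that tgt_vecs[t] @ W approximates dom_vecs[d] for each seed
    # translation pair (t, d), by solving min_W ||X W - Z||^2 with least squares.
    X = np.vstack([tgt_vecs[t] for t, d in seed_pairs])
    Z = np.vstack([dom_vecs[d] for t, d in seed_pairs])
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return W

def induce_lexicon(tgt_vecs, dom_vecs, W, dom_lexicon, k=5):
    # Label every target-language word with the majority sentiment label of
    # its k nearest dominant-language lexicon entries in the shared space.
    words = list(dom_lexicon)
    M = np.vstack([dom_vecs[w] for w in words])
    M = M / np.linalg.norm(M, axis=1, keepdims=True)
    induced = {}
    for t, v in tgt_vecs.items():
        mapped = v @ W
        mapped = mapped / np.linalg.norm(mapped)
        sims = M @ mapped                  # cosine similarities after normalization
        top = np.argsort(-sims)[:k]
        votes = [dom_lexicon[words[i]] for i in top]
        induced[t] = max(set(votes), key=votes.count)
    return induced

In the thesis's framework, the specialized vectors from the previous step would serve as input to such a mapping, and the induced lexicon is then evaluated on binary sentiment classification (Chapter 4).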
目次 Table of Contents
論文審定書 i
致謝 ii
摘要 iii
ABSTRACT iv
TABLE OF CONTENTS vi
LIST OF FIGURES vii
LIST OF TABLES vii
CHAPTER 1 – INTRODUCTION 1
CHAPTER 2 – RELATED WORK 5
SENTIMENT ANALYSIS 5
SEMANTIC SPECIALIZATION 8
CROSS-LINGUAL TRANSFORMATION 10
CHAPTER 3 – OUR FRAMEWORK 12
3.1 SEMANTIC SPECIALIZATION 13
3.1.1 Learning from scratch 13
3.1.2 Fine-tuning Pre-trained Vectors 17
3.2 CROSS-LINGUAL MAPPING 20
Linear Transformation 21
3.3 NEIGHBORHOOD QUERY 21
CHAPTER 4 – EXPERIMENTS 22
4.1 EVALUATION SETTINGS 22
4.1.1 Binary sentiment classification 22
4.1.2 Data 23
4.1.3 Parameter Settings 24
4.1.4 Evaluation Metrics 25
4.2 RESULTS 25
4.2.1 Accuracy 25
4.2.2 Coverage 28
CHAPTER 5 – CONCLUSION AND FUTURE WORK 29
REFERENCES 30

LIST OF FIGURES
FIGURE 1: OUR FRAMEWORK 12
FIGURE 2: TESTING ACCURACY USING DIFFERENT METHOD COMBINATIONS 26
FIGURE 3: LEXICON COVERAGE USING DIFFERENT METHOD COMBINATIONS 28

LIST OF TABLES
TABLE 1: ACCURACY OF BINARY SENTIMENT CLASSIFICATION USING LEXICONS GENERATED BY VARIOUS METHODS 26
參考文獻 References
[1] Y. R. Tausczik and J. W. Pennebaker, “The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods,” J. Lang. Soc. Psychol., vol. 29, no. 1, pp. 24–54, Mar. 2010.
[2] P. J. Stone, D. C. Dunphy, and M. S. Smith, The general inquirer: A computer approach to content analysis. Oxford, England: M.I.T. Press, 1966.
[3] S. M. Mohammad and P. D. Turney, “Crowdsourcing a Word–Emotion Association Lexicon,” Computational Intelligence, vol. 29, no. 3, 2013. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-8640.2012.00460.x. [Accessed: 05-Jul-2019].
[4] X. Wan, “Co-training for Cross-lingual Sentiment Classification,” in Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1, Stroudsburg, PA, USA, 2009, pp. 235–243.
[5] R. Mihalcea, C. Banea, and J. Wiebe, “Learning Multilingual Subjective Language via Cross-Lingual Projections,” in Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech Republic, 2007, pp. 976–983.
[6] A. Hassan, A. Abu-Jbara, R. Jha, and D. Radev, “Identifying the Semantic Orientation of Foreign Words,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2, Stroudsburg, PA, USA, 2011, pp. 592–597.
[7] “Cross-Lingual Sentiment Analysis for Indian Languages using Linked WordNets,” p. 10.
[8] D. Gao, F. Wei, W. Li, X. Liu, and M. Zhou, “Cross-lingual Sentiment Lexicon Learning With Bilingual Word Graph Label Propagation,” Comput. Linguist., vol. 41, no. 1, pp. 21–40, Feb. 2015.
[9] T. Nasukawa and J. Yi, “Sentiment Analysis: Capturing Favorability Using Natural Language Processing,” in Proceedings of the 2nd International Conference on Knowledge Capture (K-CAP), 2003. [Online]. Available: https://www.researchgate.net/publication/220916772_Sentiment_analysis_Capturing_favorability_using_natural_language_processing. [Accessed: 16-Jul-2019].
[10] J. Yi, T. Nasukawa, R. Bunescu, and W. Niblack, “Sentiment Analyzer: Extracting Sentiments about a Given Topic Using Natural Language Processing Techniques,” in Proceedings of the Third IEEE International Conference on Data Mining (ICDM), 2003, pp. 427–434.
[11] N. A. Abdulla, N. A. Ahmed, M. A. Shehab, and M. Al-Ayyoub, “Arabic sentiment analysis: Lexicon-based and corpus-based,” in 2013 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT), 2013, pp. 1–6.
[12] M. S. Mubarok, Adiwijaya, and M. D. Aldhi, “Aspect-based sentiment analysis to review products using Naïve Bayes,” presented at the International Conference on Mathematics: Pure, Applied and Computation: Empowering Engineering Using Mathematics, Surabaya, Indonesia, 2017, p. 020060.
[13] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment Classification using Machine Learning Techniques,” in Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), 2002, pp. 79–86.
[14] M. S. Neethu and R. Rajasree, “Sentiment analysis in twitter using machine learning techniques,” in 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), 2013, pp. 1–5.
[15] T. Mullen and N. Collier, “Sentiment Analysis using Support Vector Machines with Diverse Information Sources,” in Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 2004, pp. 412–418.
[16] A. Ghosh et al., “SemEval-2015 Task 11: Sentiment Analysis of Figurative Language in Twitter,” in Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, Colorado, 2015, pp. 470–478.
[17] Y. Kim, “Convolutional Neural Networks for Sentence Classification,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 2014, pp. 1746–1751.
[18] X. Dong and G. de Melo, “Cross-Lingual Propagation for Deep Sentiment Analysis,” in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[19] R. Socher et al., “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank,” in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 2013, pp. 1631–1642.
[20] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede, “Lexicon-Based Methods for Sentiment Analysis,” Comput. Linguist., vol. 37, no. 2, pp. 267–307, Jun. 2011.
[21] P. D. Turney and M. L. Littman, “Measuring praise and criticism: Inference of semantic orientation from association,” ACM Trans. Inf. Syst., vol. 21, no. 4, pp. 315–346, Oct. 2003.
[22] C. J. Hutto and E. Gilbert, “VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text,” in Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media (ICWSM), 2014.
[23] A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M. Welpe, “Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment,” in Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media (ICWSM), 2010.
[24] W. L. Hamilton, K. Clark, J. Leskovec, and D. Jurafsky, “Inducing Domain-Specific Sentiment Lexicons from Unlabeled Corpora,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, 2016, pp. 595–605.
[25] E. Demirtas and M. Pechenizkiy, “Cross-lingual polarity detection with machine translation,” in Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining - WISDOM ’13, Chicago, Illinois, 2013, pp. 1–8.
[26] A. Hassan, A. Abu-Jbara, R. Jha, and D. Radev, “Identifying the semantic orientation of foreign words,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2, 2011, pp. 592–597.
[27] J. Turian, L.-A. Ratinov, and Y. Bengio, “Word Representations: A Simple and General Method for Semi-Supervised Learning,” in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 2010, pp. 384–394.
[28] Z. S. Harris, “Distributional Structure,” WORD, vol. 10, no. 2–3, pp. 146–162, Aug. 1954.
[29] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, “Distributed Representations of Words and Phrases and their Compositionality,” arXiv preprint arXiv:1310.4546 [cs, stat], Oct. 2013.
[30] O. Levy and Y. Goldberg, “Neural Word Embedding as Implicit Matrix Factorization,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 2177–2185.
[31] E. M. Ponti, I. Vulić, G. Glavaš, N. Mrkšić, and A. Korhonen, “Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization,” arXiv preprint arXiv:1809.04163 [cs], Sep. 2018.
[32] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller, “Introduction to WordNet: An On-line Lexical Database,” Int. J. Lexicogr., vol. 3, no. 4, pp. 235–244, 1990.
[33] C. Baker, “FrameNet: A Knowledge Base for Natural Language Processing,” in Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014), Baltimore, MD, USA, 2014, pp. 1–5.
[34] J. Ganitkevitch, B. Van Durme, and C. Callison-Burch, “PPDB: The Paraphrase Database,” in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, Georgia, 2013, pp. 758–764.
[35] W. Yih, G. Zweig, and J. C. Platt, “Polarity Inducing Latent Semantic Analysis,” in EMNLP-CoNLL, 2012.
[36] J. Guo, W. Che, D. Yarowsky, H. Wang, and T. Liu, “Cross-lingual Dependency Parsing Based on Distributed Representations,” in ACL, 2015.
[37] M. Ono, M. Miwa, and Y. Sasaki, “Word Embedding-based Antonym Detection using Thesauri and Distributional Information,” in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, May 2015.
[38] N. T. Pham, A. Lazaridou, and M. Baroni, “A Multitask Objective to Inject Lexical Contrast into Distributional Semantics,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Beijing, China, 2015, pp. 21–26.
[39] S. Rothe and H. Schütze, “AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, 2015, pp. 1793–1803.
[40] M. Faruqui, J. Dodge, S. K. Jauhar, C. Dyer, E. Hovy, and N. A. Smith, “Retrofitting Word Vectors to Semantic Lexicons,” arXiv preprint arXiv:1411.4166 [cs], Nov. 2014.
[41] J. Wieting, M. Bansal, K. Gimpel, and K. Livescu, “From Paraphrase Database to Compositional Paraphrase Model and Back,” Trans. Assoc. Comput. Linguist., vol. 3, pp. 345–358, 2015.
[42] N. Mrkšić et al., “Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints,” Trans. Assoc. Comput. Linguist., vol. 5, pp. 309–324, Dec. 2017.
[43] T. Mikolov, Q. V. Le, and I. Sutskever, “Exploiting Similarities among Languages for Machine Translation,” arXiv preprint arXiv:1309.4168 [cs], Sep. 2013.
[44] M. Abdalla and G. Hirst, “Cross-Lingual Sentiment Analysis Without (Good) Translation,” p. 10.
[45] C. Xing, D. Wang, C. Liu, and Y. Lin, “Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation,” in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, May 2015.
[46] I. Vulić, G. Glavaš, N. Mrkšić, and A. Korhonen, “Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources,” arXiv preprint arXiv:1805.03228 [cs], May 2018.
[47] A. Conneau, G. Lample, M. Ranzato, L. Denoyer, and H. Jégou, “Word Translation Without Parallel Data,” arXiv preprint arXiv:1710.04087 [cs], Oct. 2017.
[48] G. Lample, A. Conneau, L. Denoyer, and M. Ranzato, “Unsupervised Machine Translation Using Monolingual Corpora Only,” arXiv preprint arXiv:1711.00043 [cs], Oct. 2017.
電子全文 Fulltext
The electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid infringement.
論文使用權限 Thesis access permission: fully open on and off campus (unrestricted)
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
Availability information for printed copies is relatively complete from academic year 102 (2013-2014) onward. To check the availability of printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available
