Thesis/Dissertation etd-0727119-153535: Detailed Information



Name 吳明倫 (MING-LUN WU) E-mail not disclosed
Department/Institute Department of Information Management
Degree Master Graduation term Academic year 107, 2nd semester
Title (Chinese) 建構跨語言情緒詞典之框架研究
Title (English) A Framework to Cross-lingual Sentiment Lexicon Construction
Files
  • etd-0727119-153535.pdf
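  • This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
    Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.
    Thesis access permissions

    Print copy: available immediately

    Electronic copy: fully open on and off campus

    Language/Pages English/42
    Statistics This thesis has been viewed 5357 times and downloaded 13 times.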
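    Abstract (Chinese) Using a sentiment lexicon is a common approach to sentiment analysis, but lexicon quality varies greatly across languages. Widely used languages such as English have rich, expert-annotated lexicons, while most other languages must obtain their lexicons through manual or machine translation. The goal of building a cross-lingual sentiment lexicon is to borrow the semantic resources of a dominant language to help build sentiment lexicons for non-dominant languages.
    This thesis proposes a framework for building cross-lingual sentiment lexicons. We add information from external semantic resources to the semantic space, and then project the semantic spaces of different languages into a common space to generate a sentiment lexicon for the non-dominant language. The framework borrows the semantic resources of a dominant language to train a domain-specific sentiment lexicon for a non-dominant language. It requires neither text pre-annotated with sentiment labels nor parallel text for the cross-lingual projection.
    Our experiments show that the sentiment lexicon produced by this framework achieves better classification accuracy than a machine-translated, expert-annotated lexicon, and that it also has higher coverage of the text.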
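    The projection and query steps described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not say how the mapping is learned, so the least-squares fit on a few seed translation pairs is an assumption made here for brevity. The vectors are toy values, the `tgt_` words are placeholders for target-language vocabulary, and the function names `fit_linear_map` and `nearest_neighbors` are hypothetical.

```python
import numpy as np

# Toy 3-d vectors for the dominant language (illustrative values only).
en_words = ["good", "happy", "bad", "sad", "table", "chair"]
en_vecs = np.array([
    [1.0, 0.2, 0.0],
    [0.9, 0.3, 0.1],
    [-1.0, 0.1, 0.0],
    [-0.9, 0.2, 0.1],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
])

# Hypothetical target-language space: the same geometry, rotated.
# Real monolingual spaces are only approximately related like this.
rotation = np.array([[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
tgt_words = ["tgt_" + w for w in en_words]  # placeholder vocabulary
tgt_vecs = en_vecs @ rotation


def fit_linear_map(src, tgt):
    """Least-squares matrix W such that src @ W approximates tgt."""
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W


def nearest_neighbors(query, vocab_vecs, vocab_words, k=2):
    """Return the k vocabulary words most cosine-similar to the query vector."""
    q = query / np.linalg.norm(query)
    V = vocab_vecs / np.linalg.norm(vocab_vecs, axis=1, keepdims=True)
    top = np.argsort(-(V @ q))[:k]
    return [vocab_words[i] for i in top]


# Assumed seed pairs: good, bad, table, chair ("happy" is held out).
seed_idx = [0, 2, 4, 5]
W = fit_linear_map(en_vecs[seed_idx], tgt_vecs[seed_idx])

# Map the held-out sentiment word "happy" and look up its neighbours
# in the target space; these become candidate lexicon entries.
neighbors = nearest_neighbors(en_vecs[1] @ W, tgt_vecs, tgt_words, k=2)
print(neighbors)
```

    In this toy setup the held-out word "happy" lands on its own counterpart first, and the polarity of the query word would then be carried over to the retrieved target-language words.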
    Abstract (English) Lexicon-based sentiment analysis is a popular and practical approach to sentiment analysis. However, the quality of sentiment lexicons varies greatly across languages. Some languages, such as English, are rich in lexicons crafted by experts, while most other languages must rely on expensive manual translation or ineffective machine translation to obtain a lexicon. The purpose of cross-lingual lexicon learning is to leverage resource-rich languages to extend the lexicons of languages with fewer resources.
    This thesis proposes a framework for cross-lingual lexicon learning. We incorporate semantic relations and contextual information into the respective vector spaces of the dominant and target languages, and then project both spaces into a shared space. Finally, we query the shared space to obtain a sentiment lexicon in the target language. Our framework requires neither a corpus with sentiment labels nor a parallel corpus for the cross-lingual transformation.
    Our experiments show that the sentiment lexicon generated by our framework achieves higher classification accuracy than a lexicon generated using machine translation, and that it also has higher coverage of the corpus.
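    The two evaluation criteria named above, classification accuracy and lexicon coverage, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define either metric precisely, so the sum-of-scores classifier and the token-level definition of coverage are assumptions, and the lexicon, documents, and function names (`classify`, `accuracy`, `coverage`) are hypothetical toy examples rather than the thesis's data or code.

```python
def classify(tokens, lexicon):
    """Label a document 1 (positive) if its summed lexicon score is > 0, else 0."""
    score = sum(lexicon.get(tok, 0.0) for tok in tokens)
    return 1 if score > 0 else 0


def accuracy(docs, labels, lexicon):
    """Fraction of documents whose predicted label matches the gold label."""
    correct = sum(classify(d, lexicon) == y for d, y in zip(docs, labels))
    return correct / len(docs)


def coverage(docs, lexicon):
    """Fraction of corpus tokens found in the lexicon (assumed definition)."""
    tokens = [tok for d in docs for tok in d]
    return sum(tok in lexicon for tok in tokens) / len(tokens)


# Hypothetical toy lexicon and labelled corpus (1 = positive, 0 = negative).
lexicon = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}
docs = [
    ["good", "movie"],
    ["awful", "plot"],
    ["great", "but", "bad", "ending"],
    ["fine", "film"],
]
labels = [1, 0, 0, 1]

acc = accuracy(docs, labels, lexicon)
cov = coverage(docs, lexicon)
print(acc, cov)
```

    The last toy document contains no lexicon word, so it defaults to the negative label and is misclassified. This shows why coverage matters alongside accuracy: a lexicon cannot help with text it does not cover.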
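    Keywords (Chinese)
  • Cross-lingual space projection
  • Sentiment analysis
  • Cross-lingual sentiment lexicon learning
  • Text analysis
  • Semantic space specialization
  • Keywords (English)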
  • Cross-Lingual Mapping
  • Sentiment Analysis
  • Cross-Lingual Sentiment Lexicon Learning
  • Text Mining
  • Semantic Specialization
  • TABLE OF CONTENTS
    Thesis Approval Form i
    Acknowledgements ii
    Abstract (Chinese) iii
    ABSTRACT iv
    TABLE OF CONTENTS vi
    LIST OF FIGURES vii
    LIST OF TABLES vii
    CHAPTER 1 – INTRODUCTION 1
    CHAPTER 2 – RELATED WORK 5
    SENTIMENT ANALYSIS 5
    SEMANTIC SPECIALIZATION 8
    CROSS-LINGUAL TRANSFORMATION 10
    CHAPTER 3 – OUR FRAMEWORK 12
    3.1 SEMANTIC SPECIALIZATION 13
    3.1.1 Learning from scratch 13
    3.1.2 Fine-tuning Pre-trained Vectors 17
    3.2 CROSS-LINGUAL MAPPING 20
    Linear Transformation 21
    3.3 NEIGHBORHOOD QUERY 21
    CHAPTER 4 – EXPERIMENTS 22
    4.1 EVALUATION SETTINGS 22
    4.1.1 Binary sentiment classification 22
    4.1.2 Data 23
    4.1.3 Parameter Settings 24
    4.1.4 Evaluation Metrics 25
    4.2 RESULTS 25
    4.2.1 Accuracy 25
    4.2.2 Coverage 28
    CHAPTER 5 – CONCLUSION AND FUTURE WORK 29
    REFERENCES 30
    LIST OF FIGURES
    FIGURE 1: OUR FRAMEWORK 12
    FIGURE 2: TESTING ACCURACY USING DIFFERENT METHOD COMBINATIONS 26
    FIGURE 3: LEXICON COVERAGE USING DIFFERENT METHOD COMBINATIONS 28
    LIST OF TABLES
    TABLE 1: ACCURACY OF BINARY SENTIMENT CLASSIFICATION USING LEXICONS GENERATED BY VARIOUS METHODS 26
    Oral defense committee
  • 康藝晃 - Convener
  • 洪澤權 - Committee member
  • 簡士鎰 - Committee member
  • 黃三益 - Advisor
  • Date of oral defense 2019-07-22 Date of submission 2019-08-27
