Thesis/Dissertation etd-0721118-153855 Detailed Information

Name 張嘉真 (Chia-Chen Chang) E-mail Not disclosed
Department Information Management (資訊管理學系研究所)
Degree Master Graduation term Academic year 106 (2017-2018), 2nd semester
Title (Chinese) 自動建置領域中文情緒詞典之研究
Title (English) The Research of Constructing Domain-Specific Chinese Sentiment Lexicon
    Language/Pages English/72
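    Abstract (Chinese) With the prevalence of social media, users produce large amounts of text, such as tweets, blogs, and reviews, all of which are rich in latent sentiment; through sentiment analysis we can learn people's feelings and opinion orientations. In recent years, sentiment analysis has often used sentiment lexicons as its analytical tool. Because of the diversity of domains and domain-specific prior knowledge, domain-specific sentiment lexicons play a very important role in sentiment analysis.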
    Abstract (English) With the boom in social media, users generate large volumes of text, such as tweets, blogs, and comments, that carry latent sentiment. Sentiment analysis aims to extract people’s feelings and opinions from such textual data, and the most popular approach is to consult a sentiment lexicon. However, because domains are diverse and each has its own prior knowledge, a domain-specific sentiment lexicon plays an important role in sentiment analysis.
    Chinese sentiment lexicon resources are still limited compared with their English counterparts, and most are general-purpose. This research therefore proposes techniques for constructing a domain-specific sentiment lexicon to make sentiment analysis more accurate. In this thesis, we analyze 1,294,141 hotel reviews crawled from Booking.com, use a vector space model to capture the semantic relationships between words, and predict the sentiment scores of the words. Finally, we combine the context and sentiment information with a label propagation method to automatically construct a domain-specific sentiment lexicon for the hotel domain. The proposed method achieves 83% precision.
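  • Keywords (Chinese)
  • 中文情緒詞典 (Chinese sentiment lexicon)
  • 情緒分析 (sentiment analysis)
  • 標籤傳播法 (label propagation)
  • 詞向量 (word embedding)
  • 文字探勘 (text mining)
  • Keywords (English)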
  • text mining
  • sentiment analysis
  • Chinese sentiment lexicon
  • word embedding
  • label propagation
  • Table of Contents Thesis Approval Form i
    Abstract (Chinese) ii
    Abstract iii
    Table of Contents iv
    List of Figures vi
    List of Tables viii
    Chapter 1 Introduction 1
    1.1 Research Background 1
    1.2 Research Problem 6
    1.3 Research Motivation 6
    1.4 Research Purpose 7
    1.5 Thesis Organization 7
    Chapter 2 Literature Review 9
    2.1 Lexicon-based Sentiment Analysis 9
    2.2 Adding Sentiment Information to Word 10
    2.3 Expanding Sentiment Words Automatically 12
    Chapter 3 Our Approach 15
    3.1 Overall Process 15
    3.2 Data Collection 16
    3.3 Data Preprocessing 17
    3.3.1 Data Cleaning 17
    3.3.2 Segmentation, tokenization and Part-of-Speech Tagging 18
    3.4 Generating Word Representations 19
    3.5 Building Sentiment Prediction Model 24
    3.6 Label Propagation 27
    3.6.1 Label Propagation Algorithm 27
    3.6.2 Label Propagation in batches 30
    3.6.3 Seed Selection 35
    Chapter 4 Evaluation 36
    4.1 Dataset Construction 36
    4.2 Parameter selection in our approach 37
    4.3 Comparing with Other Methods 42
    4.3.1 comparing methods without label propagation 42
    4.3.2 comparing methods with label propagation 45
    4.4 Uniqueness of our domain-specific sentiment lexicon 50
    4.5 Short discussion in opposite polarity problem 52
    Chapter 5 Conclusion 55
    References 56
    Appendix – Chinese Sentiment Lexicon Extracted from Booking.com 61
    Positive words 61
    Negative words 61
  • 魏志平 - Committee Chair
  • 倪文君 - Committee Member
  • 黃三益 - Advisor
  • Oral defense date 2018-07-23 Submission date 2018-09-03

