Thesis etd-0721118-153855: Detailed Information
The Research of Constructing Domain-Specific Chinese Sentiment Lexicon
Keywords: text mining, sentiment analysis, Chinese sentiment lexicon, word embedding, label propagation
With the boom of social media, users generate large numbers of texts, such as tweets, blog posts, and comments, that carry rich sentiment. Sentiment analysis aims to extract people's feelings and opinions from textual data. The most popular approach to sentiment analysis is to consult a sentiment lexicon. However, because the sentiment a word expresses depends on the domain and on prior knowledge, a domain-specific sentiment lexicon plays an important role in sentiment analysis.
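As a minimal illustration of what "consulting a sentiment lexicon" means, the sketch below sums the scores of pre-segmented tokens that appear in a lexicon and maps the total to a polarity label. The lexicon entries, their scores, and the names `TOY_LEXICON`, `lexicon_score`, and `polarity` are hypothetical, and plain summation is only the simplest aggregation rule, not necessarily the one used in this thesis.

```python
# Hypothetical toy lexicon (word -> sentiment score in [-1, 1]).
# Entries and scores are illustrative only; they are not taken from the thesis.
TOY_LEXICON = {
    "乾淨": 1.0,   # clean
    "舒適": 0.8,   # comfortable
    "吵": -0.9,    # noisy
    "髒": -1.0,    # dirty
}


def lexicon_score(tokens, lexicon):
    """Sum the lexicon scores of the tokens; tokens not in the lexicon add 0."""
    return sum(lexicon.get(token, 0.0) for token in tokens)


def polarity(tokens, lexicon):
    """Map the summed score to a coarse polarity label."""
    score = lexicon_score(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Tokens are assumed to be already segmented (written Chinese has no spaces).
review = ["房間", "乾淨", "但", "有點", "吵"]  # "the room is clean but a bit noisy"
print(polarity(review, TOY_LEXICON))  # positive
```

Any word missing from the lexicon contributes nothing, so a general-purpose lexicon that lacks hotel-specific terms, or scores them differently, directly limits accuracy; this is the gap a domain-specific lexicon is meant to fill.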
Compared with their English counterparts, Chinese sentiment lexicon resources are still limited and mostly general-purpose. Therefore, this research proposes techniques for constructing a domain-specific sentiment lexicon to enable more accurate sentiment analysis. In this thesis, we analyze 1,294,141 hotel reviews crawled from, use a vector space model (word embeddings) to capture the semantic relationships between words, and predict the sentiment scores of words. Finally, we combine the context and sentiment information with a label propagation method to automatically construct a domain-specific sentiment lexicon for the hotel domain. The proposed method achieves 83% precision.
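The idea of spreading sentiment from a few labeled words to semantically similar unlabeled words can be illustrated with the classic iterative label propagation scheme: build a graph whose edge weights are the non-negative cosine similarities between word vectors, then repeatedly set each unlabeled word's score to the weighted average of its neighbors' scores while keeping the seed words fixed. This is a minimal sketch under stated assumptions: the 2-D vectors, the seed words, and the names `cosine`, `label_propagation`, `TOY_VECTORS`, and `SEEDS` are hypothetical, and it does not reproduce the thesis's sentiment prediction model, batched propagation, or seed selection.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def label_propagation(vectors, seeds, iterations=100, tol=1e-6):
    """Propagate seed scores (+1.0 positive, -1.0 negative) over a similarity graph.

    vectors: dict word -> embedding (list of floats); hypothetical input format.
    seeds:   dict word -> fixed score; these are clamped at every iteration.
    Returns: dict word -> propagated score in [-1, 1].
    """
    words = list(vectors)
    n = len(words)
    # Edge weights: negative similarities are dropped, and there are no self-loops.
    weights = [[0.0 if i == j else max(0.0, cosine(vectors[words[i]], vectors[words[j]]))
                for j in range(n)] for i in range(n)]
    scores = [seeds.get(w, 0.0) for w in words]  # unlabeled words start at 0
    for _ in range(iterations):
        new_scores = []
        for i, word in enumerate(words):
            if word in seeds:
                new_scores.append(seeds[word])  # clamp the seed label
                continue
            total = sum(weights[i])
            if total == 0.0:
                new_scores.append(scores[i])  # isolated word keeps its score
            else:
                weighted = sum(weights[i][j] * scores[j] for j in range(n))
                new_scores.append(weighted / total)
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:  # stop once scores no longer change
            break
    return dict(zip(words, scores))


# Hypothetical 2-D "embeddings"; real ones would come from a trained model.
TOY_VECTORS = {
    "乾淨": [1.0, 0.1],    # clean (seed)
    "整潔": [0.9, 0.2],    # tidy
    "髒": [-1.0, 0.1],     # dirty (seed)
    "髒亂": [-0.9, 0.2],   # messy
    "房間": [0.0, 1.0],    # room (roughly neutral)
}
SEEDS = {"乾淨": 1.0, "髒": -1.0}

result = label_propagation(TOY_VECTORS, SEEDS)
for word, score in result.items():
    print(f"{word}: {score:+.3f}")
```

Clamping the seeds at every iteration keeps the known polarities fixed, and because each unlabeled score is a weighted average of values in [-1, 1], every propagated score stays in that range. A realistic lexicon would use trained embeddings and typically keep only each word's k nearest neighbors so that the graph stays sparse.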
Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Problem
1.3 Research Motivation
1.4 Research Purpose
1.5 Thesis Organization
Chapter 2 Literature Review
2.1 Lexicon-Based Sentiment Analysis
2.2 Adding Sentiment Information to Words
2.3 Expanding Sentiment Words Automatically
Chapter 3 Our Approach
3.1 Overall Process
3.2 Data Collection
3.3 Data Preprocessing
3.3.1 Data Cleaning
3.3.2 Segmentation, Tokenization, and Part-of-Speech Tagging
3.4 Generating Word Representations
3.5 Building the Sentiment Prediction Model
3.6 Label Propagation
3.6.1 Label Propagation Algorithm
3.6.2 Label Propagation in Batches
3.6.3 Seed Selection
Chapter 4 Evaluation
4.1 Dataset Construction
4.2 Parameter Selection in Our Approach
4.3 Comparing with Other Methods
4.3.1 Comparing Methods without Label Propagation
4.3.2 Comparing Methods with Label Propagation
4.4 Uniqueness of Our Domain-Specific Sentiment Lexicon
4.5 A Short Discussion of the Opposite-Polarity Problem
Chapter 5 Conclusion
References
Appendix – Chinese Sentiment Lexicon Extracted from
Positive words
Negative words
