Using Deep Learning Method Based on Short Text Classification to Track Emotions of Conversations
deep learning, sentiment analysis, machine learning, tracking emotion, dialogue
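Text sentiment analysis classifies the emotion of a given piece of text. Interactive sentiment analysis is a very challenging field. Given the importance of emotion in interpersonal interaction, many experiments have tried to detect the emotions of human conversation by computer. However, there is no commonly agreed definition or taxonomy of emotions, and people understand the emotion of text differently, so the problem is very difficult. This study therefore uses deep learning models on the Movie Dialog Corpus provided by Cornell University to track the emotions of continuous dialogue through conversational context.
The experiments show that, in multi-class classification, adding class weights keeps predictions from concentrating on the classes with more samples. With semantic rules, Convolutional LSTM predicts clearly better, both per class and overall, than without them. Finally, when the sentences of a dialogue group are multiplied by weights and summed, correct prediction becomes more likely as more information is available, showing that the emotion of a dialogue can be carried forward effectively and that prediction is strengthened.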
Text sentiment analysis is the classification of the emotion expressed in a given piece of text. Interactive sentiment analysis is a very challenging research field. Given the importance of emotions in interpersonal interactions, many experimental studies have used computers to detect emotions in human conversation. However, there is no commonly agreed definition of emotion or set of emotion types, and people understand the emotions in text differently. In this study, we use deep learning models and the Movie Dialog Corpus provided by Cornell University to track the emotions of continuous dialogue through context.
This study aims to verify the effect of emotion aggregation and propagation in conversations through the emotion analysis of single sentences and the use of weights for multiple sentences. Additionally, the effects of class weights, semantic rules, and different machine learning methods on classification are discussed along with the process of building the models.
The experimental results show that, in our multi-class classification problem, using class weights keeps the predictions from being biased toward the classes with more data. Also, Convolutional LSTM with semantic rules performs significantly better than without semantic rules. Finally, the emotions of consecutive sentences in the dialogue are weighted and accumulated for prediction. The results indicate that emotions can be propagated through time and thus effectively enhance classification performance.
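Table of Contents
Thesis Approval Form
Acknowledgments
Abstract (Chinese)
Abstract (English)
Chapter 1. Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Structure
Chapter 2. Literature Review
2.1 Sentiment Analysis
2.1.1 One-Way Text Sentiment Analysis
2.1.2 Two-Way Text Sentiment Analysis
2.2 Deep Learning
2.2.1 Recurrent Neural Networks (RNNs)
2.2.2 Long Short-Term Memory Network (LSTM)
Chapter 3. Research Method
3.1 Data Collection and Format Definition
3.2 Data Preprocessing
3.3 Class Balancing
3.3.1 Class Weights
3.3.2 Verifying the Benefit of Class Balancing with Binary Classification
3.4 Building the Classification Models
3.5 Dialogue
Chapter 4. Results
4.1 Evaluation Criteria
4.2 Analysis of Binary Classification Results
4.3 Comparison of Models
4.3.1 Effect of Class Balancing
4.3.2 Effect of Semantic Rules
4.3.3 Without Both Class Balancing and Semantic Rules
4.4 Effect of Forming Dialogues
4.4.1 Single-Sentence Dialogue
4.4.2 Multi-Sentence Dialogue
4.5 Comparison with the Watson System
4.6 General Discussion
Chapter 5. Conclusion
5.1 Conclusion
5.2 Suggestions for Future Research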
