Chatting with machines is not only possible but also increasingly common in daily life. Through such interaction, we can issue commands and obtain companionship and entertainment. In the past, most dialogue systems simply selected existing replies based on the instructions they received. With such systems, however, people are unlikely to sense any vitality in the machine; they still regard their chatting partners as computer systems. In order to develop a more realistic dialogue system, this study adopts a deep neural network to train machines to produce dialogues that are closer to everyday life. People’s everyday dialogues go beyond simple question answering, and the answers are not always as short as a noun or a yes-or-no reply; they also include many diverse and interesting responses. Moreover, suitable responses between conversational partners depend not only on the content of the conversation but also on the environmental context. This study develops a dialogue system that takes both of these essential factors into account, using TV series as the training datasets. These datasets contain not only conversational content but also video frames, which represent the context and situation in which the conversations occur.
To explore the effect of image context on the utterances in a dialogue, this work uses a deep neural network model that combines a convolutional neural network (which works well in image recognition) with a recurrent neural network (which achieves excellent performance in natural language processing). It aims to develop a dialogue system that considers both images and utterances, and to find better ways to evaluate the system’s responses, including defining a quantitative measurement and designing a questionnaire to verify the models learnt from the datasets.