Thesis/Dissertation etd-0819114-155606 Detailed Information



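Name 江佳哲 (Chia-Che Chiang) E-mail Not disclosed
Department Information Management (graduate program)
Degree Master Graduation term Academic year 103 (ROC calendar), 1st semester
Title (Chinese) 利用文字探勘技術萃取旅館評價文章之研究 (A Study on Extracting Hotel Review Articles Using Text Mining Techniques)
Title (English) Use Text Mining Techniques to Identify Noteworthy Hotel Reviews from Travel Forums
Files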
  • etd-0819114-155606.pdf
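  • This electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing.
    Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
    Thesis access permissions

    Print copy: public after 1 year (available from 2015-09-22)

    Electronic copy: user-defined permissions: public on campus after 1 year, off campus after 1 year

    Language/pages English/52
    Statistics This thesis has been viewed 5365 times and downloaded 1129 times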
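    Abstract (Chinese) User-generated content (UGC) is content created by users themselves. Websites built on this model have multiplied in recent years, and the volume of user-generated content keeps growing with them. Our study takes the well-known travel site TripAdvisor.com as an example. Its content is built by users, who share travel experiences covering attractions, hotels, restaurants, and more. The value of such a travel site lies in users sharing what they felt and learned from real experience, so that other users can draw on earlier visitors' experience to understand hotels and restaurants better. This information is read not only by travelers; staff in the travel industry also follow it closely.
      The reviews on TripAdvisor.com are numerous and disorganized, which places a heavy burden on managers who want to understand how their own hotel is rated on the site. To address this problem, our study gives hotel managers a fast and accurate way to find review content that deserves attention. What counts as noteworthy was determined by hotel practitioners: through interviews and content analysis, the characteristics of noteworthy reviews were grouped into content features, sentiment features, and review quality. A classification model was built from these features with text mining techniques. Our study confirms that content features have the greatest influence, followed by sentiment features and review quality. The results can help hotel managers manage content and reply to noteworthy reviews, which can improve online ratings and raise their hotel's visibility, a key to success in the highly competitive travel industry.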
    Abstract (English) The advance of user-generated content (UGC) inspires knowledge sharing among Internet users. A good example is the well-known travel site TripAdvisor.com, which enables users to share their experiences and express their opinions on attractions, accommodations, restaurants, etc. UGC about travel provides valuable information to users as well as to staff in the travel industry. In particular, finding the reviews that are noteworthy to a hotel is critical to its success in the competitive travel industry.
    We engaged two hotel managers to conduct a preliminary examination of hotel reviews on TripAdvisor.com and found that noteworthy reviews can be characterized by their content features, sentiment features, and quality. Through experiments using TripAdvisor.com data, we found that all of these features are important in identifying noteworthy hotel reviews. Specifically, content features have been shown to have the most impact, followed by sentiment and quality.
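    The three feature groups named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the tiny word lists, the one-word negation rule, and every function name below are hypothetical stand-ins chosen for illustration, whereas the thesis derives topics with latent Dirichlet allocation and draws on lexical resources for sentiment. The sketch only shows how content, sentiment, and quality features could be concatenated into one vector that a classifier such as an SVM might consume.

```python
import re

# Hypothetical mini-lexicons, for illustration only (not from the thesis).
TOPIC_TERMS = {
    "room": {"room", "bed", "bathroom"},
    "service": {"staff", "service", "reception"},
}
POSITIVE = {"clean", "friendly", "great"}
NEGATIVE = {"dirty", "rude", "noisy"}
NEGATORS = {"not", "never", "no"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def content_features(tokens):
    """Fraction of tokens that belong to each (hypothetical) topic."""
    n = len(tokens) or 1
    return [sum(t in terms for t in tokens) / n for terms in TOPIC_TERMS.values()]


def sentiment_features(tokens):
    """Positive and negative word counts; a directly preceding negator flips polarity."""
    pos = neg = 0
    for i, tok in enumerate(tokens):
        negated = i > 0 and tokens[i - 1] in NEGATORS
        if tok in POSITIVE:
            if negated:
                neg += 1
            else:
                pos += 1
        elif tok in NEGATIVE:
            if negated:
                pos += 1
            else:
                neg += 1
    return [pos, neg]


def quality_features(text, tokens):
    """Crude quality proxies: token count and sentence count."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(tokens), len(sentences)]


def feature_vector(text):
    """Concatenate content, sentiment, and quality features for one review."""
    tokens = tokenize(text)
    return (
        content_features(tokens)
        + sentiment_features(tokens)
        + quality_features(text, tokens)
    )


if __name__ == "__main__":
    print(feature_vector("The room was not clean. Staff were rude!"))
```

    Keeping the three groups in separate functions makes it easy to drop one group at a time and compare classifier performance, which is one way the relative impact of each group could be examined.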
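    Keywords (Chinese)
  • Document classification
  • WordNet
  • Latent Dirichlet allocation
  • Support vector machine
  • Word-sense disambiguation
  • Keywords (English)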
  • Latent Dirichlet allocation
  • Text classification
  • Word-sense disambiguation
  • SVM
  • WordNet
  • Table of contents CHAPTER 1- Introduction
    1.1 Background
    1.2 Motivation
    CHAPTER 2- Literature Review
    2.1 Content Feature Identification
    2.2 Polarity Recognition
    Emotion Identification
    Negation and Quantifiers
    2.3 Quality of Product Review
    2.4 Recommended review in tourism domain
    CHAPTER 3- Problem Definition
    3.1 Noteworthy Reviews
    3.2 Research Problem Definition
    CHAPTER 4- The Approach
    4.1 Topics Extraction
    4.2 Sentiment Detection
    4.3 Quality of Review Measure
    4.4 Classification Model Construction
    CHAPTER 5- Evaluation
    5.1 Tripadvisor web crawler
    5.2 Select 500 reviews for experts labeling class
    5.3 Selection attribute from LDA
    5.4 Performance Results
    CHAPTER 6- Conclusions
    References
    Oral defense committee
  • 李錫智 - Convener
  • 林耕霈 - Member
  • 鄭滄祥 - Member
  • 黃三益 - Advisor
  • Oral defense date 2013-07-31 Submission date 2014-09-22
