博碩士論文 etd-0805119-133659 詳細資訊
Title page for etd-0805119-133659
論文名稱
Title
運用自然語言處理工具實作貼文分析系統觀察網路論壇,以Dcard為例
Building A Post Analyzer By Natural Language Processing Tool To Observe Online Forum, Using Dcard As Example
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
82
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2019-07-31
繳交日期
Date of Submission
2019-09-05
關鍵字
Keywords
BERT、網路爬蟲、同質性、Dcard、情緒分析
Web Crawling, Sentiment Analysis, Dcard, Homophily, BERT
統計
Statistics
本論文已被瀏覽 6033 次,被下載 1244 次
The thesis/dissertation has been viewed 6033 times and downloaded 1244 times.
中文摘要
社交平台與公開論壇中的迴聲室效應易造成討論參與者認知偏差,尤其當發生在
政治或時事議題討論時,更會造成極端主義以及社會分化的問題,也容易助長假新聞
之傳播。
本篇論文將以 Dcard 論壇上的時事、明星戲劇與成人內容等三大類貼文作為案
例,透過網路爬蟲、情感分析與語意分析等文字探勘技術,分析貼文與留言,藉由打
造一套人機介面資訊系統,輔助專家判讀對話健康程度與判斷迴聲室效應是否存在於
此。
透過案例探討 Dcard 這個年輕的網路論壇的環境與各個看板的現況。最後請 3 位
常於網路瀏覽貼文或參與討論的網路社群專家體驗本系統並給予回饋來改進本系統。
Abstract
The echo chamber effect on social platforms and public forums can trap participants in
cognitive bias. When it arises in discussions of politics or current events, it can also
foster extremism and social division and help fake news spread.
In this thesis, we build a post analyzer from a web crawler and NLP tools, including a
sentiment analyzer, a BERT model, and semantic clustering. The system assists experts in
assessing how healthy a conversation is and in judging whether an echo chamber effect is
present.
As case studies, we analyze three categories of posts on Dcard: trending news, celebrities,
and adult content. Through these cases, we explore the environment of Dcard, a young
online forum, and the current state of each of its boards. Finally, three experts with
social media domain knowledge tried the system and gave feedback that we used to
improve it.
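The abstract does not say how sentiment scores and semantic clusters are combined, so the following is only a minimal sketch under stated assumptions: each comment is assumed to already have a fixed-length vector (for example from a BERT encoder) and a sentiment score in [0, 1], and the function names `kmeans` and `summarize` and the toy data are hypothetical. It groups comments with a plain k-means loop and reports the size and mean sentiment of each group, the kind of summary an expert might inspect when judging whether a thread is one-sided.

```python
from statistics import mean


def kmeans(vectors, k, iterations=20):
    """Plain k-means on lists of floats; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [mean(dim) for dim in zip(*members)]
    return labels


def summarize(vectors, sentiments, k):
    """Return {cluster_id: (comment_count, mean_sentiment)}."""
    labels = kmeans(vectors, k)
    summary = {}
    for c in sorted(set(labels)):
        scores = [s for s, label in zip(sentiments, labels) if label == c]
        summary[c] = (len(scores), mean(scores))
    return summary


# Hypothetical toy data: 2-D stand-ins for sentence vectors, sentiment in [0, 1].
vectors = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2]]
sentiments = [0.9, 0.2, 0.8, 0.1, 0.7]
report = summarize(vectors, sentiments, k=2)
print(report)
```

In this toy run the comments fall into one mostly positive group and one mostly negative group. A real system would swap in actual sentence vectors and a proper clustering library, and choosing `k` would need its own justification.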
目次 Table of Contents
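Thesis Approval Form
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
Chapter 2 Literature Review
2.1 Homophily Perspective
2.1.1 Filter Bubble
2.1.2 Echo Chamber Effect
2.2 Text Mining
2.2.1 Chinese Word Segmentation
2.2.2 Sentiment Analysis
2.2.3 Word Vectors
2.3 Conversation Health
Chapter 3 Research Methods and Procedure
3.1 System Architecture
3.2 Data Collection
3.2.1 Dcard API
3.2.2 Data Storage
3.2.3 Data Collection Script Development
3.2.4 Board Selection
3.3 Data Analysis
3.3.1 Sentiment Analysis
3.3.2 Semantic Clustering
3.4 Data Presentation
3.5 In-Depth Interviews
Chapter 4 Analysis Results
4.1 The Dcard Ecosystem
4.2 Case Studies
4.2.1 Trending Board
4.2.2 Celebrity Board
4.2.3 Sex (Adult Content) Board
Chapter 5 Conclusion
5.1 Research Conclusions
5.2 Future Work
References
Appendix I Database Schema
Appendix II Interview Outline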
電子全文 Fulltext
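This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.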
論文使用權限 Thesis access permission:校內校外完全公開 unrestricted
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
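Availability information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available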
