Thesis/dissertation details: title page for etd-0330120-114747
A Study of Discovering Security Trends from News Analysis
Year, semester
Number of pages
Advisory Committee
Date of Exam
Date of Submission
event detection, topic model, cluster analysis, CTI, text mining
This thesis/dissertation has been browsed 5898 times and downloaded 0 times.
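In view of this, this study proposes an Emerging Security Event Detection system (ESED) that automatically collects security news, extracts security event keywords, analyzes the news content with topic models and clustering algorithms, and detects emerging security events through two-stage clustering and similarity comparison. The experimental results show that ESED can discover emerging security events in each security category with a detection precision of 91.09%, confirming that ESED can help security personnel apply threat intelligence quickly and effectively.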
With the growth of the Internet and technology, online services are developing rapidly, and many kinds of security threats and evolving trends are emerging alongside them. To respond to these emerging security trends, many companies and organizations have started to collect and analyze threat intelligence from multiple sources in order to obtain complete information on cyber-attacks. They then need to establish security protection measures that correspond to the attack methods used by hackers, so that related malicious activities can be prevented.
Threat intelligence comes from diverse sources, such as news, social media, and forums. News outlets publish real-time reports soon after a security incident happens, so using news as a source of threat intelligence provides first-hand security information that helps prevent possible attacks. However, there are many sources of news reports, and manually browsing, collecting, and analyzing them is time-consuming and resource-intensive. Automated systems are therefore needed to conduct threat intelligence analysis. In view of this, this study proposes an Emerging Security Event Detection System (ESED), which automatically collects security news, extracts security event keywords, uses topic models and clustering algorithms to analyze the news, and detects emerging security events through two-stage clustering and similarity comparison.
The experimental results show that ESED can detect emerging security events in different security categories with a detection precision of 91.09%, confirming that ESED can help security personnel apply threat intelligence quickly and effectively.
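The "similarity comparison" step in the abstract can be illustrated with a small sketch. The table of contents lists JSD values between the historical (H) and recent (C) LDA topics, so the sketch below assumes that a recent topic counts as emerging when its Jensen-Shannon divergence from every historical topic is large. This is an illustrative assumption, not the thesis's confirmed procedure: the function names, the 0.5 threshold, and the toy topic-word distributions are all hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence; lies in [0, 1] when using log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def emerging_topics(recent, historical, threshold=0.5):
    """Indices of recent topics whose nearest historical topic is farther
    than `threshold` (hypothetical cut-off, not taken from the thesis)."""
    flagged = []
    for idx, topic in enumerate(recent):
        nearest = min(js_divergence(topic, past) for past in historical)
        if nearest > threshold:
            flagged.append(idx)
    return flagged

# Toy topic-word distributions over a 4-word vocabulary (made-up numbers).
historical = [[0.7, 0.2, 0.1, 0.0], [0.1, 0.7, 0.2, 0.0]]
recent = [[0.65, 0.25, 0.1, 0.0], [0.0, 0.05, 0.05, 0.9]]
flagged = emerging_topics(recent, historical)
print(flagged)
```

In this toy run the first recent topic closely matches a historical topic, while the second puts most of its probability on a word the historical topics never use, so only the second is flagged. In a fuller pipeline the distributions would come from a trained topic model rather than hand-written lists.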
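Table of Contents
Thesis Approval Form i
Abstract (Chinese) ii
Abstract (English) iii
Table of Contents iv
List of Figures vi
List of Tables vii
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 2
Chapter 2 Literature Review 4
2.1 Threat Intelligence 4
2.2 Text Mining 5
2.2.1 Data Preprocessing 6
2.2.2 Feature Extraction 7
2.3 Topic Models 7
2.4 Document Clustering 10
2.4.1 K-means Clustering Algorithm 11
2.4.2 Hierarchical Clustering Algorithm 11
2.5 Event Detection 12
Chapter 3 Research Method 15
3.1 Data Collection 19
3.2 Text Preprocessing 19
3.3 Feature Extraction 21
3.4 Topic Clustering 22
3.5 Event Clustering 24
Chapter 4 System Implementation and Evaluation 25
4.1 Data Source Collection 26
4.2 Experiment 1: Parameter Selection for the First-Stage Topic Clustering Module 28
4.2.1 Experiment 1-1: Setting the Historical Topic Data Time Window (H) 30
4.2.2 Experiment 1-2: Setting the Recent Topic Data Time Window (C) 30
4.2.3 Experiment 1-3: Setting the Best Parameter Combination of H and C 31
4.2.4 Experiment 1-4: Setting the Numbers of Topics NH and NC 32
4.2.5 Experiment 1-5: Setting Parameters α, β, and the Number of Iterations (D) 33
4.3 Experiment 2: Parameter Selection for the Second-Stage Event Clustering Module 35
4.3.1 Experiment 2-1: Training the K-means++ Clustering Algorithm 35
4.3.2 Experiment 2-2: Training the Hierarchical Clustering Algorithm 36
4.4 Experiment 3: System Detection Performance 38
4.4.1 Experiment 3-1: Emerging Event Detection Performance 39
4.4.2 Experiment 3-2: System Evaluation by Security Personnel 44
4.5 Experiment 4: Comparison with Existing Threat Intelligence Reports 47
Chapter 5 Conclusion and Future Research 50
References 51
Appendix 1 Event Cluster Detection Results for Each C Topic, 2019-09-01 ~ 2019-09-04 55
Appendix 2 Event Cluster Detection Results for Each C Topic, 2018-03-01 ~ 2018-03-04 57
Appendix 3 Event Cluster Detection Results for Each C Topic, 2018-03-18 ~ 2018-03-21 59
Appendix 4 Event Cluster Detection Results for Each C Topic, 2018-05-21 ~ 2018-05-24 61
Appendix 5 Event Detection for the September 2019 Newsletter 63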

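Figure 2-1. Illustration of the LDA word-document-topic relationship [18] 8
Figure 2-2. LDA topic model diagram [17] 9
Figure 2-3. Illustration of hierarchical clustering 12
Figure 3-1. System architecture 17
Figure 3-2. Detailed system flowchart 18
Figure 3-3. Document processing flowchart 20
Figure 4-1. Example of The Hacker News website content stored in the database 28
Figure 4-2. Topic Coherence validation of LDA topic model parameter H 30
Figure 4-3. Topic Coherence validation of LDA topic model parameter C 31
Figure 4-4. Coherence validation of the number of topics (NH) for LDA topic model H 32
Figure 4-5. Coherence validation of the number of topics (NC) for LDA topic model C 33
Figure 4-6. Coherence validation of parameters α and β for LDA topic model H 34
Figure 4-7. Coherence validation of parameters α and β for LDA topic model C 34
Figure 4-8. Selecting the number of K-means++ clusters 36
Figure 4-9. Hierarchical Clustering results 36
Figure 4-10. Flowchart of how the system modules are applied 38
Figure 4-11. Illustration of the sliding window 39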

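Table 3-1. System parameters 15
Table 4-1. List of experiments 25
Table 4-2. News data sources 27
Table 4-3. JSD values between the H and C topics of the LDA topic model 32
Table 4-4. Parameter settings of the topic clustering module 35
Table 4-5. System parameter settings used in this study 37
Table 4-6. Partial emerging event detection results, 2019-09-01 ~ 2019-09-04 38
Table 4-7. Content of the labeled data 40
Table 4-8. Experiment 3-1: detection precision for emerging security events 40
Table 4-9. Partial emerging event detection results, 2018-03-01 ~ 2018-03-04 41
Table 4-10. Partial emerging event detection results, 2018-03-18 ~ 2018-03-21 42
Table 4-11. Partial emerging event detection results, 2018-05-21 ~ 2018-05-24 43
Table 4-12. Experiment 3-2: results of the system evaluation by security personnel 46
Table 4-13. Roundup-style security event reports 47
Table 4-14. Warning-style notices of possible attacks 47
Table 4-15. Attack warnings and actual attack events 47
Table 4-16. Security-related social media policies 47
Table 4-17. System vulnerability updates 47
Table 4-18. Experiment 4 detection results 48
Table 4-19. Event cluster of social media email phishing 49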
References
[1] SANS. (2016). Threat Intelligence : What It Is, and How to Use It Effectively. Available:
[2] 李宗翰. (2016, June 20, 2018). 企業該如何掌握網路威脅情資,以有效阻擋惡意攻擊 [How enterprises can master cyber threat intelligence to effectively block malicious attacks]. Available:
[3] R. Brown and R. M. Lee, "The Evolution of Cyber Threat Intelligence (CTI): 2019 SANS CTI Survey," 2019.
[4] N. Al Moubayed, D. Wall, and A. S. McGough, "Identifying Changes in the Cybersecurity Threat Landscape Using the LDA-Web Topic Modelling Data Search Engine," in International Conference on Human Aspects of Information Security, Privacy, and Trust, 2017, pp. 287-295: Springer.
[5] I. Deliu, "Extracting Cyber Threat Intelligence From Hacker Forums," NTNU, 2017.
[6] I. Deliu, C. Leichter, and K. Franke, "Extracting cyber threat intelligence from hacker forums: Support vector machines versus convolutional neural networks," in 2017 IEEE International Conference on Big Data (Big Data), 2017, pp. 3648-3656: IEEE.
[7] S.-Y. Huang and H. Chen, "Exploring the online underground marketplaces through topic-based social network and clustering," in 2016 IEEE Conference on Intelligence and Security Informatics (ISI), 2016, pp. 145-150: IEEE.
[8] Gephi. Available:
[9] S. Samtani, K. Chinn, C. Larson, and H. Chen, "AZSecure Hacker Assets Portal: Cyber threat intelligence and malware analysis," in 2016 IEEE Conference on Intelligence and Security Informatics (ISI), 2016, pp. 19-24: IEEE.
[10] R. Feldman and I. Dagan, "Knowledge Discovery in Textual Databases (KDT)," in KDD, 1995, vol. 95, pp. 112-117.
[11] R. Wirth and J. Hipp, "CRISP-DM: Towards a standard process model for data mining," in Proceedings of the 4th international conference on the practical applications of knowledge discovery and data mining, 2000, pp. 29-39: Citeseer.
[12] M. Allahyari et al., "A brief survey of text mining: Classification, clustering and extraction techniques," 2017.
[13] C. Silva and B. Ribeiro, "The importance of stop word removal on recall values in text categorization," in Proceedings of the International Joint Conference on Neural Networks, 2003., 2003, vol. 3, pp. 1661-1666: IEEE.
[14] M. F. Porter, "An algorithm for suffix stripping," Program, vol. 14, no. 3, pp. 130-137, 1980.
[15] T. Liu, S. Liu, Z. Chen, and W.-Y. Ma, "An evaluation on feature selection for text clustering," in Proceedings of the 20th international conference on machine learning (ICML-03), 2003, pp. 488-495.
[16] K. Sparck Jones, "A statistical interpretation of term specificity and its application in retrieval," Journal of Documentation, vol. 28, no. 1, pp. 11-21, 1972.
[17] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent dirichlet allocation," Journal of Machine Learning Research, vol. 3, no. Jan, pp. 993-1022, 2003.
[18] D. M. Blei, "Probabilistic topic models," Communications of the ACM, vol. 55, no. 4, pp. 77-84, 2012.
[19] T. Nagai et al., "Understanding Attack Trends from Security Blog Posts Using Guided-topic Model," Journal of Information Processing, vol. 27, pp. 802-809, 2019.
[20] S. Samtani, R. Chinn, and H. Chen, "Exploring hacker assets in underground forums," in 2015 IEEE International Conference on Intelligence and Security Informatics (ISI), 2015, pp. 31-36: IEEE.
[21] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, 1967, vol. 1, no. 14, pp. 281-297: Oakland, CA, USA.
[22] E. Marin, A. Diab, and P. Shakarian, "Product offerings in malicious hacker markets," in 2016 IEEE conference on intelligence and security informatics (ISI), 2016, pp. 187-189: IEEE.
[23] J. H. Ward Jr., "Hierarchical grouping to optimize an objective function," Journal of the American Statistical Association, vol. 58, no. 301, pp. 236-244, 1963.
[24] A. Rege et al., "Using a real-time cybersecurity exercise case study to understand temporal characteristics of cyberattacks," in International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation, 2017, pp. 127-132: Springer.
[25] J. Allan, R. Papka, and V. Lavrenko, "On-line new event detection and tracking," in Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, 1998, pp. 37-45.
[26] Y. Yang et al., "Learning approaches for detecting and tracking news events," IEEE Intelligent Systems, vol. 14, no. 4, pp. 32-43, 1999.
[27] L. Hu, B. Zhang, L. Hou, and J. Li, "Adaptive online event detection in news streams," Knowledge-Based Systems, vol. 138, pp. 105-112, 2017.
[28] W. Ai, K. Li, and K. Li, "An effective hot topic detection method for microblog on spark," Applied Soft Computing, vol. 70, pp. 1010-1023, 2018.
[29] I. Mele and F. Crestani, "Event detection for heterogeneous news streams," in International Conference on Applications of Natural Language to Information Systems, 2017, pp. 110-123: Springer.
[30] Google 10000 most common words. Available:
[31] I. Moutidis and H. T. Williams, "Utilizing Complex Networks for Event Detection in Heterogeneous High-Volume News Streams," in International Conference on Complex Networks and Their Applications, 2019, pp. 659-672: Springer.
[32] Beautiful Soup. Available:
[33] Z. S. Harris, "Distributional structure," Word, vol. 10, no. 2-3, pp. 146-162, 1954.
[34] M. Röder, A. Both, and A. Hinneburg, "Exploring the space of topic coherence measures," in Proceedings of the eighth ACM international conference on Web search and data mining, 2015, pp. 399-408.
[35] J. Lin, "Divergence measures based on the Shannon entropy," IEEE Transactions on Information Theory, vol. 37, no. 1, pp. 145-151, 1991.
[36] G. Maskeri, S. Sarkar, and K. Heafield, "Mining business topics in source code using latent dirichlet allocation," in Proceedings of the 1st India software engineering conference, 2008, pp. 113-120: ACM.
[37] T. L. Griffiths and M. Steyvers, "Finding scientific topics," Proceedings of the National Academy of Sciences, vol. 101, no. suppl 1, pp. 5228-5235, 2004.
[38] The Week in Ransomware - November 22nd 2019 - Leaky Files. Available:
[39] Security Affairs newsletter Round 238. Available:
[40] FBI Warns of Cyber Attacks Targeting US Automotive Industry. Available:
[41] U.S. Government Issues Warning About Possible Iranian Cyberattacks. Available:
[42] Iranian hackers deface US government & African bank website. Available:
[43] YouTube to treat all kid-aimed videos like they’re COPPA-liable. Available:
[44] Facebook bans deepfakes, but not cheapfakes or shallowfakes. Available:
[45] Microsoft Releases January 2020 Office Updates With Crash Fixes. Available:
[46] Tails 4.2 Fixes Numerous Security Flaws, Improves Direct Upgrades. Available:
[47] Adobe Releases First 2020 Patch Tuesday Software Updates. Available:
[48] TWCERT 電子報 [TWCERT newsletter]. Available:
Full text
Thesis access permission: user-defined availability date
Available:
Campus: available for download 2025-04-30
Off-campus: available for download 2025-04-30

Printed copies
Available 2025-04-30
