國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,應用資訊檢索提取網路威脅情資,Extracting Cyber Threat Intelligence by Using Information Retrieval

論文名稱 Title	應用資訊檢索提取網路威脅情資 Extracting Cyber Threat Intelligence by Using Information Retrieval
系所名稱 Department	資訊管理學系 Department of Information Management
畢業學年期 Year, semester	109 學年度第 1 學期 The fall semester of Academic Year 109	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	75
研究生 Author	甘景昀 Jing-Yun Kan
指導教授 Advisor	陳嘉玫 Chia-Mei Chen
召集委員 Convenor	賴谷鑫 Gu-Hsin Lai
口試委員 Advisory Committee	江明朝, 林耕霈, 歐雅惠 Ming-Chao Chiang; Keng-Pei Lin; Ya-Hui Ou
口試日期 Date of Exam	2020-09-04	繳交日期 Date of Submission	2020-11-22
關鍵字 Keywords	APT事件、網路威脅情資、自然語言處理、資料檢索、詞向量 NLP, Word Vector, Information Retrieval, APT, CTI

中文摘要
資通科技在硬體與軟體上的快速進步，帶給組織與個人更好的生活品質。但是，伴隨而來許多資安的風險與威脅，加上近年來APT (Advanced Persistent Threat，簡稱APT)的興起，越來越多針對特定組織進行一系列複雜且多方位的攻擊。因此，若能利用網路威脅情資（Cyber Threat Intelligence，簡稱CTI），即時掌握各種威脅行為，使攻擊事件從過去的事後偵測與分析，轉變為事件發生前的預防與部署，才能應對越來越多的APT攻擊。隨著資安意識的抬頭，多樣化的資料來源和開源社群的高速發展，使得網路威脅情資漸漸成為大數據的問題。若僅依靠傳統的人工進行分析，將耗費大量的時間與資源。然而駭客組織發起的APT攻擊活動，不能視為單一的威脅行為。在每一次入侵的過程中，往往需要以不同的威脅手法來達到不同的目標，而藉由搜集威脅情資中的TTP (Tactics Techniques and Procedures)，能使組織快速偵測、應對，使防禦從被動變為主動。有鑑於此，開發一套自動化威脅行為擷取系統，來即時獲取威脅行為，是有其必要性。因此，本研究提出名為「TAminer」（Threat Action Miner）威脅行為檢索系統，收集大量的APT報告和資安新聞，透過自然語言處理（Natural Language Processing，NLP）、神經網路和詞向量技術，自動化提取網路威脅情資中的威脅行為。實驗結果顯示，TAminer擁有94.7%的精確度與95.8%的召回率，進一步證實TAminer能提供資安人員在短時間內，從網路威脅情資中自動化提取有效的威脅行為。
Abstract
Rapid advances in information and communication technology (ICT) hardware and software have improved quality of life for organizations and individuals. They have also brought many information security risks and threats, and with the rise of Advanced Persistent Threat (APT) incidents in recent years, specific organizations increasingly face complex, multi-pronged attacks. Cyber Threat Intelligence (CTI) makes it possible to track threat actions in real time and adjust security measures proactively, shifting the response from after-the-fact detection and analysis to prevention and deployment before an incident occurs. Because threat intelligence comes from diverse sources such as news, reports, social media, and forums, CTI has gradually become a big data problem, and analyzing it by traditional manual methods alone costs a great deal of time and resources. Moreover, an APT campaign cannot be regarded as a single threat behavior: each intrusion typically uses different techniques and threat behaviors to achieve different goals. Collecting the Tactics, Techniques, and Procedures (TTPs) described in CTI lets an organization detect and respond quickly and turns its defense from passive to active. An automated system that extracts threat behavior in real time is therefore needed. This research proposes TAminer (Threat Action Miner), a threat action retrieval system that collects a large number of APT reports and security news articles and uses Natural Language Processing (NLP), neural networks, and word vector techniques to automatically extract threat actions from CTI. Experimental results show that TAminer achieves a precision of 94.7% and a recall of 95.8%, indicating that it can help security personnel automatically extract effective threat actions from CTI in a short time.
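The abstract describes matching text from CTI reports against threat actions and reporting precision and recall, but this record does not include the method itself. The following is only a minimal sketch of that general idea, under stated assumptions: it uses plain term-count vectors with cosine similarity as a stand-in for the trained word vectors the thesis mentions, and every name in it (`term_vector`, `match_threat_action`, `KNOWN_ACTIONS`, the `threshold` value, and the toy sentences) is hypothetical and not taken from TAminer.

```python
import math
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words counts (an assumed stand-in for word vectors)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_threat_action(sentence, known_actions, threshold=0.3):
    """Return the label of the most similar known action, or None if the
    best similarity is below the (hypothetical) threshold."""
    vec = term_vector(sentence)
    best_label, best_score = None, 0.0
    for label, description in known_actions.items():
        score = cosine(vec, term_vector(description))
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


def precision_recall(predicted, relevant):
    """Precision = TP / |predicted|; recall = TP / |relevant|."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data for illustration only; not from the thesis.
KNOWN_ACTIONS = {
    "spearphishing": "attacker sends spearphishing email with malicious attachment",
    "credential_dumping": "malware dumps credentials from memory",
}
SENTENCES = [
    "the attacker sends an email with a malicious attachment",
    "the malware dumps stored credentials from memory",
    "the company released its quarterly report",
]

if __name__ == "__main__":
    for sentence in SENTENCES:
        print(sentence, "->", match_threat_action(sentence, KNOWN_ACTIONS))
```

In this sketch, precision is the share of extracted items that are truly threat actions and recall is the share of true threat actions that were extracted, which is how the two figures quoted in the abstract are conventionally defined.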

目次 Table of Contents
論文審定書 Thesis Approval Form, p. i
摘要 Abstract (Chinese), p. ii
Abstract, p. iii
目錄 Table of Contents, p. iv
圖次 List of Figures, p. vi
表次 List of Tables, p. vii
第一章 緒論 Chapter 1 Introduction, p. 1
1.1 研究背景 Research Background, p. 1
1.2 研究動機 Research Motivation, p. 3
第二章 文獻探討 Chapter 2 Literature Review, p. 6
2.1 背景相關研究 Background and Related Work, p. 6
2.2 網路威脅情資 Cyber Threat Intelligence, p. 9
2.3 進階持續威脅 Advanced Persistent Threat, p. 10
2.4 文字探勘 Text Mining, p. 12
2.5 威脅行為擷取 Threat Action Extraction, p. 20
第三章 研究方法 Chapter 3 Research Method, p. 22
3.1 資料蒐集 Data Collection, p. 25
3.2 資料清洗 Data Cleaning, p. 26
3.3 候選威脅行為提取模組 Candidate Threat Action Extraction Module, p. 27
3.4 關鍵威脅行為提取模組 Key Threat Action Extraction Module, p. 30
3.5 相似度演算法訓練模組 Similarity Algorithm Training Module, p. 32
3.6 威脅行為檢索模組 Threat Action Retrieval Module, p. 32
第四章 系統評估 Chapter 4 System Evaluation, p. 36
4.1 實驗一、威脅行為提取模組的參數比較與篩選 Experiment 1: Parameter Comparison and Selection for the Threat Action Extraction Module, p. 42
4.2 實驗二、評估威脅行為過濾方法 Experiment 2: Evaluation of Threat Action Filtering Methods, p. 43
4.3 實驗三、評估威脅行為過濾方法與匹配方法 Experiment 3: Evaluation of Threat Action Filtering and Matching Methods, p. 48
4.4 實驗四、威脅行為相關論文比較 Experiment 4: Comparison with Related Work on Threat Actions, p. 53
4.5 實驗五、真實世界資料評估 Experiment 5: Evaluation on Real-World Data, p. 54
第五章 研究貢獻與未來展望 Chapter 5 Contributions and Future Work, p. 56
參考文獻 References, p. 58
附錄一 Appendix 1, p. 63

電子全文 Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
論文使用權限 Thesis access permission	自定論文開放時間 User-defined release date
開放時間 Available	校內 Campus: 2025-11-22	校外 Off-campus: 2025-11-22
紙本論文 Printed copies
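Public information on printed theses is relatively complete from Academic Year 102 onward. To look up printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services.
開放時間 Available	2025-11-22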
