National Sun Yat-sen University, thesis/dissertation, Detecting Metamorphic Malware based on Machine Learning

Title	Detecting Metamorphic Malware based on Machine Learning (基於機器學習的偵測變形惡意軟體)
Department	Department of Information Management
Year, semester	Spring semester of Academic Year 107	Language	Chinese
Degree	Master	Number of pages	68
Author	Chen-Han Dai (戴辰翰)
Advisor	Chia-Mei Chen (陳嘉玫)
Convenor	Chu-Sing Yang (楊竹星)
Advisory Committee	Gu-Hsin Lai (賴谷鑫); Yi-Hung Liu (劉譯閎); Hui-Tang Lin (林輝堂)
Date of Exam	2019-07-24	Date of Submission	2019-08-17
Keywords	metamorphic malware, static detection, PE headers, machine learning

Abstract
With the prevalence of the Internet, the amount of malware on the Windows platform keeps growing. According to a McAfee Labs analysis report, malware increasingly uses evasion techniques. Techniques such as obfuscation and packing reduce the detection accuracy of anti-virus software and other detection systems. Obfuscation lets malware erase its own signatures. By degree of obfuscation, obfuscated malware is categorized as oligomorphic, polymorphic, or metamorphic. Metamorphic malware has the highest degree of obfuscation: it combines multiple techniques, such as junk code insertion and register reassignment, to further raise its chance of evading detection. Security analysts therefore need more time to analyze such samples, and the analysis depends heavily on their experience. A fast and effective detection system for metamorphic malware is thus needed. This study consolidates the methods of previous work and proposes an automated detection system for metamorphic malware. The system performs static detection using the headers and opcodes of PE files as features, and trains two models, one per feature set, with multiple machine learning algorithms. The two-phase detection reduces the false positive rate. Compared with detection methods from the literature, the proposed system achieves a high detection rate and a low false positive rate.
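
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal Python illustration, not the thesis's implementation: the function names (`opcodes`, `tf_idf`, `two_phase`), the toy disassembly, the `number_of_sections` feature, and the stand-in classifiers with their thresholds are all hypothetical. Running the opcode model only on files that the header model flags is an assumption, since the abstract does not spell out how the two phases are chained. The thesis's table of contents lists TF-IDF under the opcode features; the plain tf × log(N/df) form used here is one common variant and may differ from the one in the thesis.

```python
import math
from collections import Counter


def opcodes(disassembly):
    """Keep only the mnemonic of each instruction line, dropping operands."""
    return [line.split()[0].lower() for line in disassembly if line.strip()]


def tf_idf(docs):
    """Return one {opcode: weight} dict per opcode sequence in docs."""
    n = len(docs)
    # Document frequency: in how many sequences each opcode appears.
    df = Counter(op for doc in docs for op in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero on an empty sequence
        vectors.append(
            {op: (c / total) * math.log(n / df[op]) for op, c in counts.items()}
        )
    return vectors


def two_phase(header_features, opcode_features, is_malware, is_metamorphic):
    """Phase 1 screens on PE header features; phase 2 (assumed to run only
    on phase-1 positives) decides on opcode features."""
    if not is_malware(header_features):
        return "benign"
    if is_metamorphic(opcode_features):
        return "metamorphic malware"
    return "other malware"


# Toy disassembly listings (hypothetical, for illustration only).
original = ["mov eax, 1", "add eax, ebx", "ret"]
reassigned = ["mov ecx, 1", "add ecx, ebx", "ret"]          # register reassignment
junk = ["mov eax, 1", "nop", "add eax, ebx", "nop", "ret"]  # junk code insertion

vectors = tf_idf([opcodes(original), opcodes(junk)])

# Stand-in classifiers with hypothetical thresholds, not trained models.
label = two_phase(
    header_features={"number_of_sections": 3},
    opcode_features=vectors[1],
    is_malware=lambda h: h["number_of_sections"] < 4,
    is_metamorphic=lambda v: v.get("nop", 0.0) > 0.1,
)
print(opcodes(original) == opcodes(reassigned))  # True
print(label)
```

Because operands are discarded, register reassignment leaves the opcode sequence unchanged, while junk code insertion shows up as extra opcodes (here `nop`) that receive a non-zero TF-IDF weight. In a real system the two lambdas would be replaced by trained classifiers.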

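Table of Contents
Thesis Approval Form
Thesis Public Authorization Form
Abstract (Chinese)
Abstract
Chapter 1 Introduction
1.1. Research Background
1.2. Research Motivation
Chapter 2 Literature Review
2.1. Classification of Obfuscated Malware
2.2. Obfuscation Techniques of Metamorphic Malware
2.3. Metamorphic Malware Detection
2.3.1. Classifiers
2.3.2. Similarity
2.4. Malware Detection
2.4.1. CNN (Convolutional Neural Network)
2.5. PE Format
Chapter 3 Methodology
3.1. PE Parsing Module
3.2. Malware Detection Module
3.2.1. PE Header Features
3.2.2. PE Header Feature Preprocessing
3.2.3. Malware Detection Model
3.3. Metamorphic Malware Detection Module
3.3.1. PE Opcode Features
3.3.2. TF-IDF
3.3.3. Metamorphic Malware Detection Model
Chapter 4 System Evaluation
4.1. Experiment 1: Validation of the Proposed System
4.1.1. Experimental Environment
4.1.2. Sample Sources
4.1.3. PE Parsing Module
4.1.4. Malware Detection Module Results
4.1.5. Metamorphic Malware Detection Module Results
4.1.6. Integrated System
4.2. Experiment 2: VirusTotal
4.3. Experiment 3: N-gram
4.4. Experiment 4: Hidden Markov Model
4.5. Experiment 5: Opcode Flow Graph
4.6. Experiment 6: CNN
4.7. Summary
Chapter 5 Contributions and Future Work
References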


Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: user-defined release date. Available for download on campus and off campus from 2024-08-17.
Printed copies
Public availability information for printed theses is relatively complete from Academic Year 102 onward. To look up printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. Available from 2024-08-17.

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS