Title page for etd-0914112-155523
Title
基於結構相似度之原始碼分類研究
Code Classification Based on Structure Similarity
Department
Year, semester
Language
Degree
Number of pages
57
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2012-07-26
Date of Submission
2012-09-14
Keywords
Malware Classification, Source Code, Static Analysis, Structure Similarity
Statistics
This thesis/dissertation has been browsed 5828 times and downloaded 382 times.
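Chinese Abstract (translated)
Faced with increasingly complex malware and its variants, automated malware classification is the most important part of digital forensics. Correct malware classification yields the most complete picture of a malware sample's system behavior and simplifies forensic analysis. Traditional malware classification relies on dynamic analysis after execution, or on reverse engineering combined with static analysis, to obtain information about the malware's system behavior. Malware, however, lowers classification accuracy through anti-virtual-machine monitoring and obfuscation techniques.
As honeypot systems mature, the amount of malware source code they collect keeps growing, and analyzing malware source code yields the most accurate classification. This thesis therefore proposes an automated malware classification mechanism. Using malware source code captured by honeypots, the study combines malware file structure similarity with source code file similarity and applies a hierarchical clustering algorithm, so that newly captured malware is assigned to the correct class and new types of malware are found quickly. The proposed approach greatly reduces the costly, repeated analysis that digital forensic investigators perform on malware of the same type, and lets them understand an attacker's behavior and intent in the shortest possible time. Experiments show that the system classifies malware source code correctly and can also be applied to other fields that need source code classification.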
Abstract
Automatically classifying the source code of malware variants is the most important research issue in the field of digital forensics. Malware classification reveals the complete behavior of a malware sample, which simplifies the forensic task. Previous research has applied dynamic analysis to malware binaries, or static analysis after reverse engineering. Malware developers, on the other hand, use anti-VM and obfuscation techniques to deceive malware classifiers.
As honeypots are increasingly deployed, researchers can obtain more and more malware source code, and analyzing this source code may be the best way to classify malware. This thesis proposes a novel classification approach based on the logic and directory structure similarity of malware. All collected source code is classified by a hierarchical clustering algorithm. The proposed system not only classifies known malware correctly but also finds new types of malware. Furthermore, it spares forensic staff from spending too much time reanalyzing known malware, and it helps them understand an attacker's behavior and purpose. The experimental results demonstrate that the system classifies malware correctly and can be applied to other source code classification tasks.
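The abstract names the ingredients of the approach (directory structure similarity, source code similarity, hierarchical clustering) but this record does not give the thesis's actual definitions. The following is only a minimal sketch under stated assumptions: structure similarity is taken as the Jaccard index over relative file paths, code similarity as a `difflib` text ratio, clusters are merged by average linkage, and the names `similarity`, `cluster`, the 0.5 weight, and the 0.6 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# A sample is modeled as a dict: relative file path -> source text (an assumption).


def structure_similarity(a, b):
    """Jaccard index over the sets of relative file paths."""
    paths_a, paths_b = set(a), set(b)
    if not paths_a and not paths_b:
        return 1.0
    return len(paths_a & paths_b) / len(paths_a | paths_b)


def _best_match_average(a, b):
    """For each file in a, take its best text ratio against any file in b, then average."""
    return sum(
        max(SequenceMatcher(None, text, other).ratio() for other in b.values())
        for text in a.values()
    ) / len(a)


def content_similarity(a, b):
    """Symmetric source-text similarity; 0.0 if either sample has no files."""
    if not a or not b:
        return 0.0
    return (_best_match_average(a, b) + _best_match_average(b, a)) / 2


def similarity(a, b, weight=0.5):
    """Weighted mix of structure and content similarity (weight is hypothetical)."""
    return weight * structure_similarity(a, b) + (1 - weight) * content_similarity(a, b)


def cluster(samples, threshold=0.6):
    """Agglomerative clustering with average linkage, stopping below threshold."""
    names = list(samples)
    sim = {
        frozenset(pair): similarity(samples[pair[0]], samples[pair[1]])
        for pair in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        total = sum(sim[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Pick the most similar pair of clusters (i < j by construction).
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


# Made-up toy samples, not data from the thesis.
samples = {
    "bot_a": {"src/main.c": "int main(){connect();spam();return 0;}",
              "src/net.c": "void connect(){/*irc*/}"},
    "bot_b": {"src/main.c": "int main(){connect();flood();return 0;}",
              "src/net.c": "void connect(){/*irc*/}"},
    "tool_c": {"scan.py": "import socket\nprint('scan')"},
}
print(cluster(samples))  # [['bot_a', 'bot_b'], ['tool_c']]
```

In this sketch a newly captured sample that stays in its own cluster would correspond to a candidate new malware type, while one merged into an existing cluster would be treated as a variant of known malware.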
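Table of Contents
Acknowledgments II
Chinese Abstract III
Abstract IV
Table of Contents V
List of Figures VII
List of Tables IX
Chapter 1 Introduction 1
Section 1 Research Background 1
Section 2 Research Motivation 2
Section 3 Research Objectives 3
Chapter 2 Related Literature 4
Section 1 Malware Classification 4
Section 2 Source Code Comparison 7
Section 3 Similarity Computation 7
Chapter 3 Problem Definition and Research Method 11
Section 1 Problem Definition 11
Section 2 System Architecture and Workflow 16
Section 3 Similarity Definition 18
Chapter 4 System Evaluation 24
Section 1 Sample Collection 24
Section 2 Experiment 1: Self-modified standalone source files, input in order of mutation stage 25
Section 3 Experiment 2: Self-modified standalone source files, input in random order 28
Section 4 Experiment 3: Self-modified source code archives, input in random order 30
Section 5 Experiment 4: Suspicious downloads collected by honeypots 34
Chapter 5 Conclusions and Future Work 43
Chapter 6 Related Literature 44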
References
[1] Sans, "Bots & botnet: An overview," http://www.sans.org/rr/whitepapers/malicious/1299.php, 2003.
[2] COMPUTERWORLD, “Security firm warns of commercial, on-demand DDoS botnet,” http://www.computerworld.com/s/article/9185179/Security_firm_warns_of_commercial_on_demand_DDoS_botnet, 2010.
[3] B. Stone-Gross, T. Holz, G. Stringhini, and G. Vigna, “The Underground Economy of Spam: a Botmaster’s Perspective of Coordinating Large-Scale Spam Campaigns,” In Proceedings of the 4th USENIX Workshop on Large-scale Exploits and Emergent Threats (LEET), Apr. 2011.
[4] HELP NET SECURITY, “Microsoft cripples the Waledac botnet,” http://www.net-security.org/secworld.php?id=8926, 2010.
[5] HELP NET SECURITY, “Rustock botnet downed by Microsoft,” http://www.net-security.org/secworld.php?id=10764, 2011.
[6] HELP NET SECURITY, “Microsoft offers $250,000 reward for botnet information,” http://www.net-security.org/secworld.php?id=11299, 2011.
[7] C. Willems, T. Holz, and F. Freiling, “Toward Automated Dynamic Malware Analysis Using CWSandbox,” IEEE Security and Privacy, no. 2, vol. 5, Mar./Apr. 2007, pp. 32-39.
[8] M. Harman, “Why Source Code Analysis and Manipulation Will Always Be Important,” in Proceedings of the 10th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2010), Timişoara, Romania, Sep. 12-13, 2010.
[9] J. Z. Kolter, and M. A. Maloof, “Learning to Detect and Classify Malicious Executables in the Wild,” The Journal of Machine Learning Research, vol. 7, 2006, pp. 2721-2744.
[10] G. Tahan, L. Rokach, and Y. Shahar, “Mal-ID:Automatic Malware Detection Using Common Segment Analysis and Meta-Features,” Journal of Machine Learning Research, vol. 13, 2012, pp. 949-979.
[11] M.G. Schultz, E. Eskin, F. Zadok, and S.J. Stolfo, “Data mining methods for detection of new malicious executables,” The 2001 IEEE Symposium on Security and Privacy, Oakland, CA, May 2001.
[12] T. Abou-Assaleh, N. Cercone, V. Keselj, and R. Sweidan, “N-gram-based detection of new malicious code,” in Proceedings of the 28th Annual International Computer Software and Applications Conference, IEEE CSP, 2004.
[13] J.Z. Kolter and M.A. Maloof, “Learning to detect and classify malicious executables in the wild,” The Journal of Machine Learning Research, vol. 7, Dec 2006, pp. 2721-2744.
[14] O. Henchiri and N. Japkowicz, “A feature selection and evaluation scheme for computer virus detection,” in Proceedings of ICDM-2006, Hong Kong, 2006, pp. 891–895.
[15] B. Zhang, J. Yin, J. Hao, D. Zhang, and S. Wang, “Malicious codes detection based on ensemble learning,” in Proceedings of The 4th International Conference on Autonomic and Trusted Computing, vol. 4610, 2007, pp. 468-477.
[16] Y. Elovici, A. Shabtai, R. Moskovitch, G. Tahan, and C. Glezer, “Applying machine learning techniques for detection of malicious code in network traffic,” in Proceedings of the 30th annual German conference on Advances in Artificial Intelligence, Berlin, Germany, Sep. 10-13, 2007, pp. 44-50.
[17] J. Jang, D. Brumley, and S. Venkataraman, “BitShred: Feature Hashing Malware for Scalable Triage and Semantic Analysis,” in Proceedings of the 18th ACM conference on Computer and Communications Security, Chicago, Illinois, Oct. 17-21, 2011, pp. 309–320.
[18] Y. Ye, D. Wang, T. Li, D. Ye, and Q. Jiang, “An intelligent pe-malware detection system based on association mining,” Journal in Computer Virology, vol. 4, no. 4, 2008, pp.323–334.
[19] Y. Ye, L. Chen, D. Wang, T. Li, Q. Jiang, and M. Zhao, “Sbmds: an interpretable string based malware detection system using svm ensemble with bagging,” Journal in Computer Virology, vol. 5, no. 4, 2009, pp. 283–293.
[20] Y. Ye, T. Li, K. Huang, Q. Jiang, and Y. Chen, “Hierarchical associative classifier (hac) for malware detection from the large and imbalanced gray list,” Journal of Intelligent Information Systems, vol. 35, no. 1, 2010, pp. 1–20.
[21] A. Altaher, Supriyanto, A. ALmomani, M. Anbar, and S. Ramadass, “Malware detection based on evolving clustering method for classification,” Scientific Research and Essays, vol. 7, no. 22, Jun 14, 2012, pp.2031-2036.
[22] M. Gheorghescu, "An automated virus classification system," in Virus Bulletin Conference, 2005, pp. 294-300.
[23] M. Christodorescu, and S. Jha, “Static Analysis of Executables to Detect Malicious Patterns,” in Proceedings of the 12th USENIX Security Symposium, 2003.
[24] S. Cesare, and Y. Xiang, “Classification of Malware Using Structured Control Flow,” in Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing (AusPDC 2010), 2010.
[25] K. Zen, D.N.F.A. Iskandar, and O. Linang, “Using Latent Semantic Analysis for Automated Grading Programming Assignments,” in Proceedings of Semantic Technology and Information Retrieval (STAIR), Putrajaya, Malaysia, Jun 28-29, 2011, pp. 82-88.
[26] J.I. Maletic, and N. Valluri, “Automatic software clustering via Latent Semantic Analysis,” in Proceedings of 14th IEEE International Conference on Automated Software Engineering (ASE’99), Cocoa Beach Florida, Oct 1999, pp. 251-254.
[27] D. Zhang, J. Wang, D. Cai, and J. Lu, “Self-taught hashing for fast similarity search,” in Proceedings of the Annual International ACM SIGIR Conference on Research and Development on Information Retrieval (SIGIR), 2010.
[28] Edit distance - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Edit_distance
[29] Graphviz - Graph Visualization Software, http://www.graphviz.org/.
[30] Meld Diff Viewer – Compare and Merge files/directories in Ubuntu, http://ubuntuguide.net/meld-diff-viewer-compare-and-merge-filesdirectories-in-ubuntu
[31] virustotal - Free Online Virus, Malware and URL Scanner, https://www.virustotal.com/
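Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.
Thesis access permission: user-defined release time
Available:
Campus: available
Off-campus: available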


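Printed copies
Public-access information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To inquire about public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available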
