Abstract |
Many studies have used machine learning to detect botnet command-and-control (C&C) traffic, with considerable success. How the input data are handled beforehand affects the final detection performance, so data preprocessing must be completed before the analysis program runs. Traffic-based botnet detection research, however, lacks general guidance on data conversion. This study proposes four data coding rules and uses Rough Set, Support Vector Machine, and Naïve Bayes as the experimental classifiers. The initial experiments use Rough Set and the Las Vegas Filter as feature selection algorithms to determine which data coding rule works best when feature selection is applied. Based on those results, subsequent experiments compare the effect of feature selection on detection performance, and data coding rules and design guidelines are derived from an analysis of the experimental data. The study has two important findings. First, carefully distinguishing among Empty values, NULL values, and the meaning of the data avoids confusion in data coding and the degraded detection results that follow from it. Second, minor differences in data content should be ignored so that machine learning detection models can find stronger correlations among similar events. The experiments also verify that feature selection with Rough Set is effective: it helps eliminate redundant data, shortens analysis time, and improves detection accuracy. |
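The two findings can be illustrated with a minimal Python sketch. It is not the study's coding scheme: the abstract does not spell out the four coding rules, so the sentinel codes `NULL_CODE` and `EMPTY_CODE`, the helper names `encode_field` and `coarsen`, and the digit-count bucketing are all assumptions made for illustration only.

```python
# Hypothetical sentinel codes (assumptions, not taken from the study).
NULL_CODE = -1   # the field is absent from the record
EMPTY_CODE = 0   # the field is present but carries no content


def encode_field(value, vocabulary):
    """Map a categorical field to an integer code.

    NULL (None) and Empty ("") receive distinct codes so that they are
    never confused with each other or with a real value.
    """
    if value is None:
        return NULL_CODE
    if value == "":
        return EMPTY_CODE
    # Real values are numbered from 1 in order of first appearance.
    return vocabulary.setdefault(value, len(vocabulary) + 1)


def coarsen(count):
    """Bucket a non-negative integer by its number of decimal digits,
    so that minor differences (e.g. 1498 vs 1502 bytes) are ignored."""
    if count <= 0:
        return 0
    return len(str(count))


if __name__ == "__main__":
    vocab = {}
    for method in ["GET", "", None, "POST", "GET"]:
        print(repr(method), "->", encode_field(method, vocab))
    print(coarsen(1498) == coarsen(1502))
```

Under these assumptions, an absent field and an empty field stay distinguishable after coding, while two flows whose byte counts differ only slightly fall into the same bucket and look like similar events to a classifier.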