Evaluation of Machine Learning Techniques for C&C Traffic Classification

Kazumasa Yamauchi 1,2,†1, Junpei Kawamoto 1,2, Yoshiaki Hori 2,3,a), Kouichi Sakurai 1,2

Received: December 8, 2014, Accepted: June 5, 2015

Abstract: With the spread of the Internet, the damage caused by botnets is increasing. A typical botnet uses a Command and Control (C&C) server, and detecting that server is one of the main countermeasures against botnets. However, C&C servers are hard to detect because C&C protocols have diversified and botnet configurations keep changing. In this work, we define a feature vector for detecting C&C servers and report experimental results on classifying normal traffic and C&C sessions in real network traffic. Finally, we show that the method is effective for detecting C&C servers that use several kinds of protocols.

Keywords: botnet, C&C server, anomaly detection, machine learning

1 Kyushu University, Fukuoka 819-0935, Japan
2 Institute of Systems, Information Technologies and Nanotechnologies (ISIT), Fukuoka 814-0001, Japan
3 Saga University, Saga 840-8502, Japan
†1 Presently with NIPPON TELEGRAPH AND TELEPHONE WEST CORPORATION
a) horiyo@cc.saga-u.ac.jp

1. Introduction

A botnet is controlled through a Command & Control (C&C) server [1], which sends instructions, such as the command to launch a DDoS attack, to the infected hosts. Detecting C&C communication therefore makes it possible to stop a botnet before it attacks.

1.1 C&C Protocols

C&C communication is carried over IRC, HTTP, or P2P.

© 2015 Information Processing Society of Japan 1745
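IRC-based bots are the oldest type and appeared around 1993 [1]. Methods for detecting IRC-based C&C communication have been proposed in [2], [3]: [2] classifies C&C sessions with an SVM, and Rishi [3] identifies infected hosts by evaluating IRC nicknames with n-gram analysis.

P2P-based bots appeared around 2003 [1]. PeerShark [4] detects P2P botnets by tracking conversations between peers.

HTTP-based bots appeared around 2005 [1]. Detection methods for HTTP bots [5], [6] focus on HTTP GET and POST requests; [6] applies an Artificial Immune System (AIS) [10] to the problem.

1.2 Problems with Existing Methods

Each of the existing methods [3], [4], [5] targets a single protocol. In this paper, we target C&C communication over two protocols, IRC and HTTP, and propose a feature vector that can classify C&C sessions of both.

The rest of this paper is organized as follows. Section 2 analyzes botnet activity, Section 3 analyzes C&C communication and defines the feature vector, Section 4 describes the classification experiment, Section 5 presents the results, Section 6 evaluates the method for each dataset, and Section 7 concludes the paper.

2. Botnet Activity and C&C Communication

Figure 1 shows a time chart of botnet activity observed in CCC DATAset 2010. C&C communication takes place before the botnet attacks, so it can be regarded as a precursor of an attack.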
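Table 1 Feature vector.
  Feature   V1    V2    V3    V4    V5    V6    V7
  Unit      PKT   Byte  PKT   Byte  s

Fig. 1 Time-chart of Botnet activity.

3. Analysis of C&C Communication

In this section, we analyze C&C communication using three datasets, CCC DATAset 2009 (C09), CCC DATAset 2010 (C10), and PRACTICE 2013 (P13) [7], which contain IRC- and HTTP-based C&C traffic.

3.1 C&C Protocols

As described in Section 1.1, C&C communication uses IRC, HTTP, or P2P. This paper targets IRC and HTTP.

3.2 Feature Vector

Feature vectors for detecting C&C communication have been studied in [2], [5], [8]. Following [2], which computes features of IRC C&C sessions per TCP session, we compute the feature vector in Table 1 for each TCP session. V1 and V3 are numbers of packets and V2 and V4 are total data sizes in bytes, counted separately for the two directions of the session. V6 and V7, which describe repeated accesses to the server within a session, are added for HTTP-based C&C.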
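IPSJ Journal Vol.56 No.9 1745–1753 (Sep. 2015)

The total data size is the sum of the data sizes of the packets contained in each session, computed from the packet header information. V5 is the session duration: the difference between the session end time and the session start time, taken from the timestamps in the packet headers. V6 is the total number of times the client accesses the server during a session, and V7 represents the variance of the access times.

3.3 C&C Session Analysis

In this section, we analyze whether normal sessions and C&C sessions can be separated with the proposed feature vector. As shown in Section 2, C&C server communication can be regarded as one of the precursors of a botnet attack, so detecting C&C sessions makes it possible to prevent botnet attacks before they happen. Figure 2 shows the results of analyzing IRC and HTTP traffic with the feature vector; normal HTTP or IRC sessions are shown in blue and C&C sessions in red. For IRC, V6 and V7 are not considered because the bot does not reconnect to the C&C server during a session. Figure 2 shows that the IRC data are distributed over a narrow range, whereas the HTTP data are spread more widely, reflecting more diverse communication than IRC. Figures 3, 4, and 5 show excerpts from Fig. 2 for the feature pairs in which the two classes were most clearly distinguished: V1 and V3 for IRC (Fig. 3), V1 and V3 for HTTP (Fig. 4), and V6 and V7 for HTTP (Fig. 5).

Fig. 2 C&C session analysis (all results).
Fig. 3 V1 and V3 (IRC).
Fig. 4 V1 and V3 (HTTP).
Fig. 5 V6 and V7 (HTTP).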
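The per-session features of Section 3.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Packet` record is hypothetical, V1/V2 are assumed to count client-to-server traffic and V3/V4 server-to-client traffic, and V7 is read as the population variance of the client's access times.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import List


@dataclass
class Packet:
    """Hypothetical packet record built from header information."""
    timestamp: float   # header timestamp in seconds
    size: int          # data size in bytes
    from_client: bool  # True if sent by the client (bot) to the server


def feature_vector(packets: List[Packet], access_times: List[float]) -> list:
    """Return [V1, ..., V7] for one non-empty TCP session.

    Assumed mapping (not confirmed by the paper):
      V1, V2: packets and bytes sent client -> server
      V3, V4: packets and bytes sent server -> client
      V5: session end time minus session start time (s)
      V6: number of client accesses to the server in the session
      V7: population variance of those access times (0 for a single access)
    """
    up = [p for p in packets if p.from_client]
    down = [p for p in packets if not p.from_client]
    times = [p.timestamp for p in packets]
    v1, v2 = len(up), sum(p.size for p in up)
    v3, v4 = len(down), sum(p.size for p in down)
    v5 = max(times) - min(times)
    v6 = len(access_times)
    v7 = pvariance(access_times) if v6 > 1 else 0.0
    return [v1, v2, v3, v4, v5, v6, v7]


# An IRC-like session with a single access, so V6 = 1 and V7 = 0.
session = [Packet(0.0, 60, True), Packet(0.5, 120, False), Packet(2.0, 40, True)]
print(feature_vector(session, [0.0]))  # [2, 100, 1, 120, 2.0, 1, 0.0]
```

Keeping V6 and V7 in the vector even for IRC sessions (where they are constant) lets one vector layout serve both protocols.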
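4. C&C Session Classification Experiment

Figure 6 shows the flow of the experiment.

Fig. 6 Experiment flow.

4.1 Data Collection

Traffic was captured with tcpdump on Linux.

4.1.1 Normal Data

Normal traffic was captured in 2012. Sessions on port 6667 were treated as IRC and sessions on port 80 as HTTP.

4.1.2 C&C Data

C&C sessions were extracted from C09, C10, and P13: IRC sessions containing a JOIN command and HTTP sessions containing a GET request were taken as C&C sessions. Tables 2 and 3 show the number of unique IP addresses and the number of extracted sessions, respectively. P13 contains no IRC sessions.

Table 2 Number of unique IP addresses.
                    C&C
          Normal    C09    C10    P13
  IRC        736      6     19      0
  HTTP       763     51    139     15
  Total    1,499     57    158     15

Table 3 Number of extracted session data.
                    C&C
          Normal    C09    C10    P13
  IRC        903    190    573      0
  HTTP     1,270     84    255    406
  Total    2,173    274    828    406

4.2 Preprocessing

Each TCP session is treated as one session, and the feature vector is computed per TCP session.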
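The labeling rules of Section 4.1 can be sketched as follows, assuming each session's payload has already been reassembled into bytes. The helpers `protocol_of` and `is_cc_session` are hypothetical, and the paper does not state how JOIN or GET was matched, so this sketch simply checks whether any CRLF-separated line starts with that command.

```python
IRC_PORT = 6667  # port treated as IRC in Section 4.1.1
HTTP_PORT = 80   # port treated as HTTP in Section 4.1.1


def protocol_of(server_port: int):
    """Map a server port to the protocol label used in the experiment."""
    return {IRC_PORT: "IRC", HTTP_PORT: "HTTP"}.get(server_port)


def _commands(payload: bytes):
    """Yield the first token (command or method) of each CRLF-separated line."""
    for line in payload.split(b"\r\n"):
        tokens = line.split()
        if not tokens:
            continue
        # Skip an IRC message prefix such as ":nick!user@host".
        if tokens[0].startswith(b":") and len(tokens) > 1:
            tokens = tokens[1:]
        yield tokens[0]


def is_cc_session(protocol: str, payload: bytes) -> bool:
    """Hypothetical rule: a session taken from a C&C dataset is labeled C&C
    if it contains an IRC JOIN command or an HTTP GET request."""
    wanted = {"IRC": b"JOIN", "HTTP": b"GET"}.get(protocol)
    return wanted is not None and wanted in _commands(payload)


print(is_cc_session("IRC", b"NICK bot\r\nJOIN #cmd\r\n"))  # True
```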
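Tables 4 and 5 show the average and variance of each feature for IRC and HTTP sessions, respectively. Table 4 shows that V1–V5 of IRC C&C sessions are much smaller than those of normal sessions, and that the variances of V1–V5 are larger for C10 than for C09. For IRC, V6 = 1 and V7 = 0 in every session. Table 5 shows that the HTTP C&C sessions in P13 have smaller V1–V7 than normal sessions, whereas C09 and C10 have a larger V4. HTTP C&C sessions therefore vary more than IRC C&C sessions.

Because the features differ widely in scale, each feature is normalized as

$$\hat{x}_{i,j} = \frac{x_{i,j} - \min_{n} x_{n,j}}{\max_{m} x_{m,j}}$$

where $x_{i,j}$ is the $j$-th feature of the $i$-th session, and $n$ and $m$ range over all sessions. After normalization, $\hat{x}_{i,j}$ takes values between 0 and 1.

Table 4 IRC session data analysis: Average (variance).
        Normal (IRC)         C&C (C09)          C&C (C10)
  V1    88 (6.0×10^3)        6 (24)             5 (250)
  V2    1,187 (3.6×10^6)     67 (1.5×10^4)      77 (1.9×10^4)
  V3    75 (6.1×10^3)        2 (6.9)            3 (632)
  V4    1,336 (2.2×10^6)     177 (1.7×10^5)     185 (1.6×10^6)
  V5    583 (2.8×10^5)       8 (75)             6 (111)
  V6    1 (0)                1 (0)              1 (0)
  V7    0 (0)                0 (0)              0 (0)

4.3 Classification Algorithms

We compare three algorithms: the SVM, which [2] used to classify IRC C&C sessions, logistic regression (LR), and naive Bayes (NB). In [2] the SVM was evaluated only on IRC, whereas Section 3.3 showed that HTTP C&C traffic is more diverse than IRC traffic; we therefore evaluate all three algorithms on both IRC and HTTP. The algorithms were implemented in R [13] using kernlab [14] for SVM, glmnet [15] for LR, and e1071 [16] for NB. The SVM uses the Gaussian kernel

$$k(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{y}\|^2}{2\sigma^2}\right)$$

where $\sigma$ is the kernel width parameter.

Table 5 HTTP session data analysis: Average (variance).
        Normal (HTTP)         C&C (C09)            C&C (C10)            C&C (P13)
  V1    88 (1.5×10^7)         60 (1.4×10^2)        47 (1.3×10^4)        4 (5.7)
  V2    33,140 (3.9×10^9)     194 (5.7×10^2)       177 (2.1×10^9)       126 (1.4×10^2)
  V3    129 (1.7×10^6)        50 (900)             35.4 (1.4×10^7)      3.4 (74)
  V4    33,671 (2.1×10^12)    66,320 (1.9×10^9)    42,212 (2.6×10^9)    1,135 (1.1×10^4)
  V5    249 (1.2×10^5)        2.6 (2.8)            0.27 (1.3×10^4)      1.7 (3.7)
  V6    9.15 (1.3×10^6)       3.8 (0.13)           35.7 (7.8×10^5)      1.1 (0.3)
  V7    122 (1.3×10^5)        0.64 (2.3)           3.1 (6.67)           1.5 (0.5)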
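The normalization of Section 4.2 and the Gaussian kernel of Section 4.3 can be written out directly. This is a sketch in plain Python rather than the R packages the authors used; treating a column whose maximum is 0 as all zeros is an assumption added to avoid division by zero.

```python
import math


def normalize(rows):
    """Scale each feature column j as in Section 4.2:
    x_hat[i][j] = (x[i][j] - min_n x[n][j]) / max_m x[m][j].
    A column whose maximum is 0 is mapped to 0.0 (assumption, to avoid
    division by zero)."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [[(x - lo) / hi if hi != 0 else 0.0
             for x, lo, hi in zip(row, mins, maxs)]
            for row in rows]


def rbf_kernel(x, y, sigma):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


rows = [[1, 10], [3, 20], [5, 0]]  # three sessions, two features (toy data)
print(normalize(rows))             # [[0.0, 0.5], [0.4, 1.0], [0.8, 0.0]]
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=1.0))  # about 0.3679
```

The denominator follows the paper's formula (the column maximum); the more common min-max scaling divides by max − min instead. For nonnegative features, such as packet counts, byte counts, and durations, both keep every value within [0, 1].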
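The sessions were split so that 2/3 were used for training and 1/3 for testing.

5. Experimental Results

Figure 7 compares the classification results obtained with the proposed feature vector and with the existing feature vector of [2] for HTTP and IRC. For HTTP, Fig. 7 shows values of 22.3% for SVM, 8.2% for LR, and 3.9% for NB.

Fig. 7 Comparison between proposed vector and existing vector.

V6 and V7 are intended to distinguish HTTP C&C sessions, which access the C&C server repeatedly. However, normal Web traffic, such as pages that use Ajax, can also access a server repeatedly within one session and then resembles C&C traffic. For IRC, V6 = 1 and V7 = 0 in every session, so only V1–V5 contribute, and the SVM misclassified 17% of normal IRC sessions as C&C.

6. Evaluation

Table 6 shows the classification results of SVM, LR, and NB for each dataset, and Table 7 shows their execution times.

6.1 Classification Results

P13, which consists only of HTTP C&C sessions, was detected at a rate of 99.0% or higher by all three algorithms.

Table 6 Result of classifying every dataset. The (Normal) and (Anomaly) rows give the number of test sessions classified as normal and as anomalous.
                               Normal            C&C
                               IRC      HTTP     C09/IRC  C09/HTTP C10/IRC  C10/HTTP P13/HTTP
  SVM (Normal)                 219      414      1        0        9        1        1
  SVM (Anomaly)                45       12       66       32       208      100      103
  LR (Normal)                  196      251      0        2        1        14       0
  LR (Anomaly)                 68       115      67       30       216      87       104
  NB (Normal)                  197      382      0        8        0        17       0
  NB (Anomaly)                 67       44       67       27       217      84       104
  Total                        264      426      67       32       217      101      104
  Detection rate, SVM [%]                        98.5     100      95.9     99.0     99.0
  False positive rate, SVM [%] 17.0     2.8
  False negative rate, SVM [%]                   1.5      0        4.1      1.0      1.0
  Detection rate, LR [%]                         100      93.7     99.5     86.1     100
  False positive rate, LR [%]  25.8     27.0
  False negative rate, LR [%]                    0        6.3      0.5      13.9     0
  Detection rate, NB [%]                         100      84.4     100      83.2     100
  False positive rate, NB [%]  25.4     10.3
  False negative rate, NB [%]                    0        15.6     0        16.8     0

Table 7 Comparison of execution time of machine learning algorithms.
                           SVM     LR      NB
  Training (s)             1.97    2.01    0.03
  Classification (s)       0.12    0.81    0.38
  Total (s)                2.09    2.82    0.41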
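The rate rows of Table 6 can be reproduced from its counts. The sketch below uses the SVM (Anomaly) row and the Total row, and assumes the rate definitions implied by the arithmetic: the false positive rate is anomaly/total for normal traffic, the detection rate is anomaly/total for C&C traffic, and the false negative rate is the remaining share of C&C sessions classified as normal.

```python
# Counts from Table 6 (SVM rows): sessions classified as anomaly, and totals.
SVM_ANOMALY = {"Normal/IRC": 45, "Normal/HTTP": 12, "C09/IRC": 66, "C09/HTTP": 32,
               "C10/IRC": 208, "C10/HTTP": 100, "P13/HTTP": 103}
TOTALS = {"Normal/IRC": 264, "Normal/HTTP": 426, "C09/IRC": 67, "C09/HTTP": 32,
          "C10/IRC": 217, "C10/HTTP": 101, "P13/HTTP": 104}


def percent(part: int, total: int) -> float:
    """Percentage rounded to one decimal place, as in Table 6."""
    return round(100.0 * part / total, 1)


def summarize(anomaly: dict, totals: dict) -> dict:
    """Rates per column: false positives for normal traffic,
    detection and false negatives for C&C traffic (assumed definitions)."""
    out = {}
    for col, total in totals.items():
        flagged = anomaly[col]
        if col.startswith("Normal"):
            out[col] = {"false_positive": percent(flagged, total)}
        else:
            out[col] = {"detection": percent(flagged, total),
                        "false_negative": percent(total - flagged, total)}
    return out


print(summarize(SVM_ANOMALY, TOTALS)["C10/IRC"])
# {'detection': 95.9, 'false_negative': 4.1}
```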
The SVM achieved high detection rates for C&C sessions but misclassified 17.0% of normal IRC sessions. LR missed 13.9% of the C10 HTTP C&C sessions and misclassified 25.8% of normal IRC and 27.0% of normal HTTP sessions. NB missed 15.6% of the C09 HTTP and 16.8% of the C10 HTTP C&C sessions and misclassified 25.4% of normal IRC sessions.

6.2 Execution Time

Table 7 compares the execution times of the three algorithms. NB was the fastest (0.41 s in total), followed by SVM (2.09 s) and LR (2.82 s). The SVM spent most of its time in training (1.97 s) and classified quickly (0.12 s).

6.3 Discussion

From Sections 6.1 and 6.2, the SVM detected more than 90% of the C&C sessions in every dataset, with a false negative rate of about 4% or less, and it had the lowest false positive rates of the three algorithms.

7. Conclusion

In this paper, we defined a feature vector for detecting C&C sessions over IRC and HTTP and evaluated it on real network traffic with SVM, LR, and NB. The SVM classified C&C sessions most accurately. Extending the method to DNS- and P2P-based C&C communication is left for future work.

References

[1] Vania, J., Meniya, A. and Jethva, H.B.: A Review on Botnet and Detection Technique, International Journal of Computer Trends and Technology, Vol.4, No.1, pp.23–29 (2013).
[2] Kondo, S. and Sato, N.: Botnet Traffic Detection Techniques by C&C Session Classification Using SVM, Proc. 2nd International Workshop on Security (IWSEC 2007), pp.91–104 (2007).
[3] Goebel, J. and Holz, T.: Rishi: Identify bot contaminated hosts by IRC nickname evaluation, Proc. 1st USENIX HotBots (2007).
[4] Narang, P., Ray, S., Hota, C. and Venkatakrishnan, V.: PeerShark: Detecting Peer-to-Peer Botnets by Tracking Conversations, Proc. IEEE Security & Privacy Workshops (SPW 2014), pp.108–115 (2014).
[5] Ashley, D.: An Algorithm for HTTP Bot Detection, Research paper, University of Texas - Information Security Office (2011).
[6] Tyagi, A.K. and Nayeem, S.: Detecting HTTP Botnet using Artificial Immune System, International Journal of Applied Information Systems, Vol.2, No.6, pp.34–37 (2012).
[7] MWS2014 (2014), available from http://www.iwsec.org/mws/2014/about.html (accessed 2014-12-05).
[8] AdaBoost, Vol.53, No.9, pp.2062–2074 (2012).
[9] Gu, G., Zhang, J. and Lee, W.: BotSniffer: Detecting botnet command and control channels in network traffic, Proc. 15th Annual Network and Distributed System Security Symposium (NDSS 2008) (2008).
[10] Castro, L.N. and Timmis, J.: Artificial Immune Systems: A New Computational Intelligence Approach, Springer (2002).
[11] Schehlmann, L. and Baier, H.: COFFEE: A Concept based on OpenFlow to Filter and Erase Events of Botnet activity at high-speed nodes, Proc. INFORMATIK 2013, pp.2225–2239 (2013).
[12] Gu, G., Perdisci, R., Zhang, J. and Lee, W.: BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection, Proc. 17th USENIX Security Symposium (2008).
[13] R project, available from http://www.r-project.org/ (accessed 2014-11-10).
[14] Package kernlab, available from http://cran.r-project.org/web/packages/kernlab/kernlab.pdf (accessed 2014-11-10).
[15] Package glmnet, available from http://cran.r-project.org/web/packages/glmnet/glmnet.pdf (accessed 2014-11-10).
[16] Package e1071, available from http://cran.r-project.org/web/packages/e1071/e1071.pdf (accessed 2014-11-10).