16 DNS DNS (Domain Name System) IP 2 (DNS ) (Probabilistic Threat Propagation) DNS DNS 69% 1 ( ) DNS 9% 40% DNS 2,170 DNS This paper proposes a method to estimate malicious domain names from a large scale DNS query response dataset. The key idea of the work is to leverage the use of DNS graph that is a bipartite graph consisting of domain names and corresponding IP addresses. We apply a concept of Probabilistic Threat Propagation (PTP) on the graph with a set of predefined benign and malicious node to a DNS graph obtained from DNS queries at a backbone link. The performance of our proposed method (EPTP) outperformed that of an original PTP method (9% improved) and that of a traditional method using N-gram (40% improved) in an ROC analysis. We finally estimated 2,170 of new malicious domain names with EPTP. 1 Domain Name System (DNS) ( ) IP DNS IP DNS DNS Detecting Malicious Domains with Probabilistic Threat Propagation on DNS Graph. Yuta Kazato, Toshiharu Sugawara,, Waseda University. Kensuke Fukuda, /, National Institute of Informatics / Sokendai., Vol.33, No.3(2016), pp.16 28. [ ] 2015 8 20. IP IP C&C IP DNS ( )
Vol. 33 No. 3 Aug. 2016 17 benign.com IP 1 benign.net IP 2 unknown.com IP 3 malicious.com 1 DNS : (malicious), (benign), (unknown) (Probabilistic Threat Propagation; PTP) DNS IP 2 (DNS ) ( 1) ( IP ) 3 DNS PTP EPTP (Extended PTP) 9% (N-gram ) 40% 2 (1) PTP EPTP (2) DNS DNS 2 2. 1 DNS DNS 1 DNS Top Level Domain (TLD) (.jp) Second Level Domain (SLD) (, co.jp) IP ( ) gtld (.com,.net) cctld (.cn,.jp) 60 85% [8] [10] [24] 50% TLD [12] gtld.com.net [20] cctld.cn [25] 90% gtld DNS 1 [13] TLD 2. 2 2. 2. 1 DDoS( ) Exposure [7] Kopis [3] Notos [2] Exposure [7] time to live (TTL) 15 3,000
18 Kopis [3] DDoS Notos [2] [14] DNS SVM 2. 2. 2 [26] Domain Generation Algorithm (DGA) (N-gram) K-L DGA [15] Jaccard index C&C NX ( ) [27] [4] NX DGA [23] DGA 2. 2. 3 DNS [28] DNS [22] Fast-flux DNS IP Flux (CDN ) [16] [6] NX DNS failure graph [16] 3G [19] [11] [9] Manadhata [19] HTTP DNS Polonium [11] Machine File 2 Probability Threat Propagation (PTP) [9] web proxy IP URL IP PTP PTP [9] DNS DNS (DNS) 3 3. 1 DNS DNS A IP
Vol. 33 No. 3 Aug. 2016 19 2 DNS ( 1) G X E G = (X, E) IP IP IP i IP j i IP j (e ij E) 1 IP 1 IP ( CDN) 1 IP 1 IP ( ) DNS [5] IP 0 x j 1 O(N 2 ) 1 1 k 2 P k (x i) = w ij(p k 1 (x j) C k 1 (x i, x j)) j N(x i ) (2) P k (x i ) k x i P k 1 (x j) k 1 x i x j C k 1 (x i, x j ) k 1 x i x j C k 1 x i x j k x j x i 3. 2 EPTP PTP [9] (Malicious ) {malicious} P (x malicious) = γ PTP G x i 1 P (x i ; G) = w ij P (x j x i = 0; G) (1) j N(x i ) 1 N(x i ) x i w ij i j P (x j x i = 0; G) x i 2 k C k 1 3 C k 1 (x i, x j ) = w ji (P k 2 (x i )) (3) w ij 4 w ij = 1 N(x i ) e ij E 0 e ij E (4) PTP Alexa [1] DNS Alexa {benign}
20 Require: W, {malicious}, {benign}, γ, β 1: P α N, P (malicious) γ, P (benign) β, C 0 N N 2: repeat 3: T W diag(p ) 4: C T W C T 5: P < C, 1 > 6: C(malicious, ) 0, C(benign, ) 0 7: P (malicious) γ, P (benign) β 8: until P has converged 9: return P (A(i, j) B(i, j)) {malicious} P (malicious) γ {benign} P (benign) β 0 0 T W diag(p ) (diag(p ) P N N ) T W C T 2 2 Extended Probability Threat Propagation C C P =< C, 1 > {benign} Benign Malicious 1 PTP Malicious γ [0,1] 0 EPTP β > 0 γ < 0 α = γ+β 2 P (x) γ P (x) < α P (x) γ α < P (x) β P (x) β Benign β = 1 Malicious γ = β = 1 α = γ+β 2 = 0 ( 2) EPTP 2 N P R N (P (i) = P (x i )) W R N N T R N N (T (i, j) = W (i, j) P (j)) C R N N, C = T W C T A B 1 EPTP (<, > 1 N 1 ) P {malicious} {benign} P C x x {malicious}, {benign} EPTP 1 P 0.001 P x k PTP 100 50 4 4. 1 DNS DNS 1 DNS tcpdump UDP port 53 2013 11 5 28
Vol. 33 No. 3 Aug. 2016 21 Total Number (x1000) 1400 1200 1000 800 600 400 200 0 IN domains OUT domains KEEP domains Total Number (x1000) 14 12 10 8 6 4 2 0 IN domains OUT domains KEEP domains Total Number (x1000) 1.5 1 0.5 0 IN domains OUT domains KEEP domains Total Number (x1000) 1.2 1 0.8 0.6 0.4 0.2 0 IN domains OUT domains KEEP domains -200 11/05 11/12 11/19 11/26-2 11/05 11/12 11/19 11/26-0.5 11/05 11/12 11/19 11/26-0.2 11/11 11/18 11/25 Time (JST 2013) Time (JST 2013) Time (JST 2013) Time (JST 2013) (a) (b) Benign (c) Malicious (d) Suspicious 3 IN KEEP OUT 24 DNS (A ) IP DNS 1,348,547, IP 2,417,727 3,917,402 4. 2 DNS (Malicious) malwaredomains.com [18] uribl.com [21] (Malicious ) (Benign) Alexa [1] (Alexa ) 30,000 30,653 Alexa DGA 5 DNS DNS 5. 1 DNS 1 DNS k k 1 KEEP k 1 k OUT k 1 k IN DNS (All) (Malicious) (Benign) 3 3 1 DNS IN KEEP OUT 3 (a) (c) 30,000 30,653 KEEP IN OUT 1 65% KEEP KEEP ( 71.8%) IN OUT ( 51.4%)
22 1 KEEP IN OUT KEEP IN OUT ALL 65.9% 17.1% 17.1% Benign 71.8% 14.1% 14.1% Malicious 48.6% 25.8% 25.6% Suspicious 65.3% 17.4% 17.2% 2 DNS d = 1 d = 24 1,271,975 3,766,274 668,266 1,348,547 IP 603,709 2,417,727 1,199,612 3,917,402 1.88 2.08 203,670 356,676 5. 2 DNS 2013/11/05 1 (d = 1) 2013/11/05 28 24 (d = 24) 2 DNS d=1 DNS 127 120 20 DNS d = 24 d=1 DNS 2.96 2 IP 4 3.27 1.75 DNS d= 24 DNS 4 75% IP 1 1 d=24 70% Frequency 10 6 10 5 10 4 10 3 10 2 10 1 10 0 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 7 4 Number of nodes 3 10 (d = 24) No. IP Benign Malicious 1 778,279 1,832,123 5664 471 2 1 19,929 0 0 3 1 1,563 0 0 4 1,545 2 0 0 5 1,450 20 0 1 6 1 1,457 0 0 7 22 1,407 0 0 8 2 1,424 0 0 9 3 1,384 0 0 10 1,345 41 0 1 90% 6 10 3 (d=24) 1 IP IP 2 spmode.ne.jp pandaworld.ne.jp 1
Vol. 33 No. 3 Aug. 2016 23 4 d=1 d=24 567,663 2,610,402 349,581 778,279 IP 218,082 1,832,123 688,706 3,095,916 1.88 2.37 28 26 5 (%) (%) ave med B B 912,073 14.4 5,422,947 85.6 6.5 6 M M 13,743 5.1 258,210 94.9 8.0 8 B M 343,879 6.5 4,911,630 93.5 8.4 8 U U 2,738,310 30.5 6,226,634 69.5 8.3 8 IP 4 DNS 20 2 Frequency 0.45 0.40 0.35 0.30 0.25 0.20 0.15 0.10 0.05 Benign-Benign Malicious-Malicious Benign-Malicious Unknown-Unknown 0.00 0 2 4 6 8 10 12 14 16 18 20 22 24 Node distance d=24 d=1 5 4.6 4.5 DNS (d=1 2.96 3.27 ) 3 d=1 d=24 2 (B B) 6.5 4 (B M) 8.4 (M M) 8.0 8 5 2 15% IP (B B) 5. 3 d=1 Benign Malicious Unknown 5 5 B B, M M, B M, U U Benign Malicious Benign Malicious Unknown (B M) Unknown (U U) B M 6 EPTP EPTP PTP, N-gram EPTP DNS
24 Ratio 1.0 0.8 0.6 0.4 0.2 TPR 0.0 FPR -1.0-0.5 0.0 0.5 1.0 6 Threshold True Positive Rate 7 1.00 0.80 0.60 0.40 10-cv-Original PTP 10-cv-Extended PTP 0.20 5-cv-Original PTP 5-cv-Extended PTP 0.00 Bigram-based 0.00 0.01 0.02 0.03 0.04 0.05 False Positive Rate ROC (k-fold cross validation) 6. 1 EPTP 2,000, 1,973 k-fold cross validation (CV) (k = 5, 10) DNS 1 10-fold CV P (x) TPR (True Positive Rate) FPR (False Positive Rate) 6 TPR 0.5 0 τ = 0 TPR 90% FPR τ = 0 FPR 0 0.5 TLD biz net info SLD DGA Receiver Operatorating Characteristic (ROC) (EPTP) (PTP) N-gram [17] [26] 7 10-fold CV ROC EPTP FPR 0.016 PTP TPR 0.827 Time (s) / Memory (MiB) 10 4 10 3 10 2 10 1 10 0 10-1 8 Memory (MiB) Time (s) 10 3 10 4 10 5 10 6 Number of nodes EPTP EPTP TPR 0.904 5-fold CV 10-fold CV ROC 10-fold CV N-gram 40% 1,000 10,000 100,000 1,000,000 EPTP 8 (CPU: Intel Xeon X5675 3.07GHz; Memory 32GB) 10
Vol. 33 No. 3 Aug. 2016 25 6 4c4brcwmg.biz -0.978 info-ezweb-ne-jp.info -0.75 poohpoohhany.info -0.680 bvncm-kdkdkgree.jp -0.538 nomoguz.su -0.344 kisjehmbga.jp -0.241 google-play.jp -0.2 9 IP EPTP IP 1 IP IP IP 7 9 ( :, : IP : ) 6. 2 DNS d=24 DNS EPTP 6 FPR 0 τ = 0.1 0.1 IP 2,170 IP 12,884 6 EPTP (τ = 0.1) 9 7. 1 IN KEEP OUT 6. 2 EPTP Suspicious IN KEEP OUT ( 3 (d) 1) Suspicious ALL IN KEEP OUT IN KEEP OUT Suspicious Malicious DGA 6 Suspicious DGA DNS 5 5 2
26 PTP 4 7. 2 DNS DNS IP DNS, d=24 377 392 DNS DNS DNS 69% 1 IP IP 7. 3 EPTP [9] PTP Alexa 10-fold CV FPR (=0.016) 8% 90.4% ( 7) EPTP 6 τ = 0 DNS 8 10 DNS 5-fold CV 10-fold CV 6 τ 0.1 2,170 ( 6) DGA google-play.jp N-gram EPTP ( 9) IP IP
Vol. 33 No. 3 Aug. 2016 27 8 DNS A PTP DNS 1 EPTP DNS 90.4% 2,170 Malicious Alaxa (15H02699) (EU) FP7 ( 608533:NECOMA) [ 1 ] Alexa: Alexa [Online], http://www.alexa.com/ topsites/. [ 2 ] Antonakakis, M., Perdisci, R., Dagon, D., Lee, W. and Feamster, N.: Building a Dynamic Reputation System for DNS, in Proceedings of USENIX security symposium, 2010, pp. 273 290. [ 3 ] Antonakakis, M., Perdisci, R., Lee, W., Vasiloglou II, N. and Dagon, D.: Detecting Malware Domains at the Upper DNS Hierarchy, in Proceedings of USENIX Security Symposium, 2011, pp. 411 426. [ 4 ] Antonakakis, M., Perdisci, R., Nadji, Y., Vasiloglou II, N., Abu-Nimeh, S., Lee, W. and Dagon, D.: From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware, in Proceedings of USENIX Security Symposium, 2012, pp. 491 506. [ 5 ] Arkko, J., Cotton, M. and Vegoda, L.: IPv4 Address Blocks Reserved for Documentation, RFC 5737, January 2010. [ 6 ] Bar, A., Paciello, A. and Romirer-Maierhofer, P.: Trapping botnets by DNS failure graphs: validation, extension and application to a 3G network, in Proceedings of TMA 13, IEEE, 2013, pp. 393 398. [ 7 ] Bilge, L., Sen, S., Balzarotti, D., Kirda, E. and Kruegel, C.: EXPOSURE: a passive DNS analysis service to detect and report malicious domains, ACM Transactions on Information and System Security (TISSEC), Vol. 16, No. 4(2014), p. 14. [ 8 ] Brownlee, N., Claffy, K. and Nemeth, E.: DNS measurements at a root server, in Proceedings of GLOBECOM 01, Vol. 3, IEEE, 2001, pp. 1672 1676. [ 9 ] Carter, K. M., Idika, N. and Streilein, W. W.: Probabilistic threat propagation for malicious activity detection, in Proceedings of ICASSP 13, IEEE, 2013, pp. 2940 2944. [10] Castro, S., Wessels, D., Fomenkov, M. and Claffy, K.: A Day at the Root of the Internet, ACM SIGCOMM Computer Communication Review, Vol. 38, No. 5(2008), pp. 41 46. [11] Chau, D., Nachenberg, C., Wilhelm, J., Wright, A. and Faloutsos, C.: Polonium: Tera-scale graph mining and inference for malware detection, in Proceedings of SIAM International Conference on Data Mining, Vol. 2, 2011. [12] Gao, H., Yegneswaran, V., Chen, Y., Porras, P., Ghosh, S., Jiang, J. and Duan, H.: An empirical reexamination of global DNS behavior, in Proceedings of SIGCOMM 13, ACM, 2013, pp. 267 278. [13] Hao, S., Feamster, N. and Pandrangi, R.: Monitoring the initial DNS behavior of malicious domains, in Proceedings of IMC 11, ACM, 2011, pp. 269 278. [14] Ishibashi, K. and Sato, K.: Classifying DNS Heavy User Traffic by using Hierarchical Aggregate Entropy, in Proceedings of World Telecommunications Congress (WTC 12), 2012, pp. 1 6. [15] Ishibashi, K., Toyono, T., Hasegawa, H. and Yoshino, H.: Extending black domain name list by using co-occurrence relation between DNS queries, IEICE Transactions on Communications, Vol. 95, No. 3(2012), pp. 794 802. [16] Jiang, N., Cao, J., Jin, Y., Li, L. E. and Zhang, Z.-L.: Identifying suspicious activities through DNS failure graph analysis, in Proceedings of ICNP 10, IEEE, 2010, pp. 144 153. [17] Kazato, Y., Fukuda, K. and Sugawara, T.: To-
28 wards classification of DNS erroneous queries, in Proceedings of AINTEC 13, ACM, 2013, pp. 25 32. [18] Malware Domain Blocklist: DNS-BH Malware Domain Blocklist, http://www.malwaredomains. com/. [19] Manadhata, P. K., Yadav, S., Rao, P. and Horne, W.: Detecting Malicious Domains via Graph Inference, in Proceedings of ESORICS 14, Springer, 2014, pp. 1 18. [20] Osterweil, E., McPherson, D., DiBenedetto, S., Papadopoulos, C. and Massey, D.: Behavior of DNS Top Talkers, a. com/. net View, in Proceedings of PAM 12, Springer, 2012, pp. 211 220. [21] P. Vixie: Traltime URI Blacklist, http://uribl. com/. [22] Perdisci, R., Corona, I., Dagon, D. and Lee, W.: Detecting malicious flux service networks through passive analysis of recursive DNS traces, in Proceedings of ACSAC 09, IEEE, 2009, pp. 311 320. [23] Schiavoni, S., Maggi, F., Cavallaro, L. and Zanero, S.: Phoenix: DGA-based botnet tracking and intelligence, in Proceedings of DIMVA 14, Springer, 2014, pp. 192 211. [24] Wessels, D. and Fomenkov, M.: Wow, that s a lot of packets, in Proceedings of PAM 03, 2003. [25] Xuebiao, Y., Xin, W., Xiaodong, L. and Baoping, Y.: DNS measurements at the.cn TLD servers, in Proceedings of FSKD 09, Vol. 7, 2009, pp. 540 545. [26] Yadav, S., Reddy, A., Reddy, A. and Ranjan, S.: Detecting algorithmically generated malicious domain names, in Proceedings of IMC 10, ACM, 2010, pp. 48 61. [27] Yadav, S. and Reddy, A. N.: Winning with DNS failures: Strategies for faster botnet detection, in Proceedings of SecureCom 12, Springer, 2012, pp. 446 459. [28],,, : DNS,. IOT, No. 21(2009), pp. 19 24. 2015 1999 ( ( )) 1999 2005 2006 (2002) (2008 2012; ) / (2014 2015) 1982.,. 1992 1993,.,.,,,,. ( ).,,,, ISOC IEEE, ACM, AAAI.