Drive-by-Download JavaScript


JAIST Reposi https://dspace.j
Title: A Study on the Detection of Obfuscated JavaScript for Predicting Drive-by-Download Attacks (Drive-by-Download 攻撃予測のための難読化 JavaScript の検知に関する研究)
Author(s): 本田, 仁
Citation:
Issue Date: 2016-03
Type: Thesis or Dissertation
Text version: author
URL: http://hdl.handle.net/10119/13608
Rights:
Description: Supervisor: 面和成, School of Information Science, Master's
Japan Advanced Institute of Science and

Drive-by-Download JavaScript 1410039 28 2

Drive-by-Download (DbD ) DbD web web DbD JavaScript DbD JavaScript JavaScript JavaScript Support Vector Machine JavaScript JavaScript JavaScript 3 D3M dataset bigram

1
1.1
1.2
1.3
2
2.1 Drive-by-Download
2.2 JavaScript
2.3 D3M dataset
2.4 K-Fold Cross Validation
2.5
2.5.1 Support Vector Machine (SVM)
2.5.2 Naive Bayes
2.5.3 k-nearest Neighbor
2.5.4 Decision Tree
2.5.5 Random Forest
2.6
2.6.1 Python[10]
2.6.2 scikit-learn[11]
2.6.3 HTMLParser[4]
2.7
2.8 unigram, bigram
3
3.1
3.2
3.2.1 XZZ 13[12]
3.2.2 JCB 14[7]
3.3
3.3.1 LJJ 09[8]
3.3.2 CCVK 11[3]
3.3.3 NHKEIN 14[9]
3.4
3.4.1 AO 15[1]
3.5
4 [9]
4.1 [9]
4.2 [9]
4.3
4.3.1 JavaScript
4.3.2 SVM
4.3.3
4.4
5
5.1
5.2 JavaScript
5.2.1 JavaScript
5.2.2 JavaScript
5.2.3 JavaScript
5.3
5.3.1
5.3.2 SVM
5.4
5.5
6 bigram
6.1 bigram
6.2
7
8

1 1.1 Drive-by- Download (DbD ) DbD web web Bot IBM 2014 21.9% 11.3% DbD [5] IPA 10 2015[6] DbD DbD DbD web JavaScript JavaScript JavaScript DbD 1.2 DbD JavaScript DbD 2 JavaScript [9] JavaScript JavaScript JavaScript Support Vector Machine 1

JavaScript JavaScript 3 D3M dataset [9] bigram 1.3 2 3 DbD 4 [9] 5 6 bigram 7 2

2 2.1 Drive-by-Download Drive-by-Download (DbD ) web web DbD DbD 2.1 1. web 2. 3. 4. DbD web JavaScript 2.2 JavaScript DbD JavaScript JavaScript JavaScript unescape() String.replace() String.charAt() eval() 2.2 JavaScript JavaScript unescape() eval() 3
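As a minimal sketch of why functions such as unescape() and eval() matter for detection, the Python snippet below percent-encodes a short script string and decodes it again. It is an illustration under stated assumptions, not code from the thesis: `percent_encode` is a hypothetical helper, and `urllib.parse.unquote` only stands in for JavaScript's unescape() (it decodes `%XX` sequences but not the `%uXXXX` form).

```python
from urllib.parse import unquote


def percent_encode(text: str) -> str:
    """Hypothetical helper: encode every character as %XX (ASCII input only)."""
    return "".join(f"%{ord(ch):02x}" for ch in text)


plain = "document.write('hello');"
hidden = percent_encode(plain)

# The encoded form uses only '%' and hex digits, so its character
# distribution differs sharply from ordinary JavaScript source.
print(hidden)
print(sorted(set(hidden)))

# Decoding restores the original text, which obfuscated code would
# typically hand to eval().
print(unquote(hidden) == plain)
```

The point of the sketch is the second print: an encoded payload collapses onto a very small set of characters, which is the kind of signal a character-frequency feature can pick up.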

2.1: DbD 2.3 D3M dataset 2.2: JavaScript D3M dataset MWS MWS datasets DbD D3M dataset DbD 3 1. URL DbD pcap 2. 3. pcap 4

2.3: 5 2.4 K K-Fold Cross Validation K K K 1 K-1 K 1 K 2.3 5 (K=5) 2.5 5
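The K-fold procedure with K=5 can be sketched in plain Python as follows. This is a simplified illustration: `k_fold_indices` is a hypothetical helper, the samples are not shuffled, and a real experiment would more likely rely on a library routine such as scikit-learn's `KFold`.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold and in the training set of the other K-1 folds, so the K evaluation results can be averaged into a single score.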

2.4: Support Vector Machine 2.5.1 Support Vector Machine (SVM) SVM ( 2.4) SVM 2 2.5.2 Naive Bayes Naive Bayes Naive Bayes Gaussian Naive Bayes Bernoulli Naive Bayes Multinomial Naive Bayes Gaussian Naive Bayes 6

2.5.3 k-nearest Neighbor 2.5: k-nearest Neighbor k-nearest Neighbor k 2.5 k=3 Class2 k=5 Class1 k=7 Class2 2 2 2.5.4 Decision Tree C4.5 CART C4.5 CART 2.5.5 Random Forest Decision Tree 1 7
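The dependence of the k-nearest Neighbor result on k (Class2 for k=3, Class1 for k=5, Class2 for k=7) can be reproduced with a small sketch. The one-dimensional toy data and the helper `knn_predict` are assumptions made for illustration; they are not the data behind the figure.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest 1-D neighbors.

    `train` is a list of (value, label) pairs; distance is |value - query|.
    """
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (an assumption for illustration), ordered by distance from 0.
train = [(1, "Class2"), (2, "Class2"), (3, "Class1"), (4, "Class1"),
         (5, "Class1"), (6, "Class2"), (7, "Class2")]

for k in (3, 5, 7):
    print(k, knn_predict(train, 0, k))
```

With two classes, an odd k avoids tied votes, which is why only odd values are tried here.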

2.6 2.6.1 Python[10] Python Python 2.6.2 scikit-learn[11] scikit-learn Python SVM Decision Tree Random Forest k-nearest Neighbor Naive Bayes 2 scikit-learn 2.6.3 HTMLParser[4] HTMLParser Python HTML HTMLParser HTMLParser HTML JavaScript HTMLParser 8
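Extracting JavaScript from HTML with HTMLParser might look like the sketch below. It is an assumption-laden illustration: it uses Python 3's `html.parser` module rather than the Python 2.7 `HTMLParser` module cited in [4], and the class name `ScriptExtractor` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Collect inline <script> bodies and external src URLs from HTML."""

    def __init__(self):
        super().__init__()
        self.inline_scripts = []
        self.src_urls = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            src = dict(attrs).get("src")
            if src:
                self.src_urls.append(src)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Keep only non-empty text that appears inside a <script> element.
        if self._in_script and data.strip():
            self.inline_scripts.append(data)


html = """<html><head><script src="https://example.com/lib.js"></script></head>
<body><p>text</p><script>var x = 1; alert(x);</script></body></html>"""

extractor = ScriptExtractor()
extractor.feed(html)
print(extractor.src_urls)
print(extractor.inline_scripts)
```

Inline scripts are available directly, while the collected `src` URLs would still have to be downloaded separately to obtain the external JavaScript.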

2.7

JavaScript classification has 4 possible outcomes: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). The metrics below are defined from these 4 counts.

Accuracy

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2.1}$$

Precision

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2.2}$$

Recall

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2.3}$$

False Negative Rate (FNR)

$$\mathrm{FNR} = \frac{FN}{TP + FN} \tag{2.4}$$

From the definition of Recall in (2.3), $\mathrm{FNR} = 1 - \mathrm{Recall}$.

False Positive Rate (FPR)

$$\mathrm{FPR} = \frac{FP}{TN + FP} \tag{2.5}$$

Equations (2.1) to (2.5) are expressed in terms of TP, TN, FP, and FN. The evaluation uses Accuracy, Precision, FNR, and FPR.

2.8 unigram, bigram unigram 1 unigram 1 1 bigram 2 bigram 2 1 N 1 N-gram N-gram N N-gram N 1 m N-gram m N
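The evaluation metrics of equations (2.1) to (2.5) can be computed from the four counts as in the following sketch. The function name `detection_metrics` and the counts are made up for illustration; they are not results from the study.

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the metrics of equations (2.1) to (2.5) from raw counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fnr": fn / (tp + fn),
        "fpr": fp / (tn + fp),
    }


# Made-up counts for illustration only; they are not results from the study.
m = detection_metrics(tp=90, tn=880, fp=20, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Because FNR and Recall share the denominator TP + FN, the two always sum to 1, so reporting one of them is sufficient.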

3 3.1 DbD 2 1. 2. ( ) 3.2 3.2.1 XZZ 13[12] [12] JavaScript 11

3 1. unescape() 2. JavaScript 3. JavaScript unescape() eval () 3 1 3.2.2 JCB 14[7] [7] JavaScript JavaScript N-gram Support Vector Machine 3.1 1. web JavaScript JavaScript 2. 3. N-gram N N 12

4. N-gram σ σ N 5. SVM 3.3 3.3.1 LJJ 09[8] [8] JavaScript JavaScript JavaScript eval 50 3.1 15 65 3.3.2 CCVK 11[3] [3] FNR [3] 3 DbD 1. HTML iframe HTML HTML URL 19 2. JavaScript eval() settimeout() setinterval() 25 3. URL URL URL IP DNS A DNS NS 33 3 3 1 FPR FNR 13

3.1: [7] 14

Table 3.1: [8]

| Feature | Description |
|---|---|
| Length in characters | The length of the script in characters. |
| Avg. characters per line | The average number of characters on each line. |
| Num. of lines | The number of newline characters in the script. |
| Num. of strings | The number of strings in the script. |
| Num. of unicode symbols | The number of unicode characters in the script. |
| hex or octal numbers | A count of the numbers represented in hex or octal. |
| % human readable | We judge a word to be readable if it is > 70% alphabetical, has 20% < vowels < 60%, is less than 15 characters long, and does not contain > 2 repetitions of the same character in a row. |
| % whitespace | The percentage of the script that is whitespace. |
| Num. of methods called | The number of methods invoked by the script. |
| Avg. string length | The average number of characters per string in the script. |
| Avg. argument length | The average length of the arguments to a method, in characters. |
| Num. of comments | The number of comments in the script. |
| Avg. comments per line | The number of comments over the total number of lines in the script. |
| Num. of words | The number of words in the script where words are delineated by whitespace and JavaScript symbols (for example, arithmetic operators). |
| % word not in comments | The percentage of words in the script that are not commented out. |

3.2: [1] 16

3.3.3 NHKEIN 14[9] [9] JavaScript 4 3.4 3.4.1 AO 15[1] [1] URL [7] JavaScript N-gram 2 DbD 3.2 3.5 3.2 JavaScript [9] JavaScript [9] 3 17

/ 3.2: LJJ 09[8] JavaScript CCVK 11[3] HTML JavaScript URL 3 NHKEIN 14[9] JavaScript XZZ 13[12] JCB 14[7] JavaScript N-gram JavaScript AO 15[1] JavaScript 18

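4 [9] 4.1 [9] 2.2 JavaScript JavaScript JavaScript [9] JavaScript [9] JavaScript

The features are based on the 94 ASCII characters from 0x21 to 0x7e. Let $m_i$ be the number of occurrences of character $i$ in a JavaScript and $N$ the total number of these characters in that JavaScript:

$$N = \sum_i m_i \tag{4.1}$$

$$F(i) = \frac{m_i}{N} \tag{4.2}$$

so that $0 \le F(i) \le 1$. The values $F(i)$ are the input to the Support Vector Machine (SVM): 94 features, one per single character, that is, a unigram.

[9] JavaScript UTF-8 4.2 [9] [9] JavaScript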
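The character-frequency feature F(i) of equations (4.1) and (4.2) can be sketched as below. The handling of characters outside 0x21 to 0x7e (they are simply skipped) and the helper name `char_frequencies` are assumptions; the source text does not say how [9] treats such characters.

```python
# The 94 printable ASCII characters from 0x21 ('!') to 0x7e ('~').
ALPHABET = [chr(code) for code in range(0x21, 0x7F)]


def char_frequencies(script: str) -> list:
    """Return F(i) = m_i / N for each of the 94 characters, in code order."""
    counts = {ch: 0 for ch in ALPHABET}
    for ch in script:
        if ch in counts:          # assumption: other characters are ignored
            counts[ch] += 1
    total = sum(counts.values())  # N in equation (4.1)
    if total == 0:
        return [0.0] * len(ALPHABET)
    return [counts[ch] / total for ch in ALPHABET]


features = char_frequencies("eval(unescape('%61%6c%65%72%74'))")
print(len(features))
print(round(sum(features), 6))
```

The resulting 94-element list is the kind of fixed-length vector a classifier such as an SVM expects, regardless of how long the script is.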

JavaScript JavaScript JavaScript [9] JavaScript JavaScript JavaScript JavaScript [9] JavaScript JavaScript JavaScript JavaScript JavaScript JavaScript Document Object Model (DOM) HTML HTML DOM JavaScript JavaScript 4.3 4.3.1 JavaScript [9] JavaScript JavaScript JavaScript JavaScript Alexa[2] 500 <script> JavaScript <script> 20

4.1: [9] JavaScript 2013/11/18 2011/2/8-2013/2/26 2786 330 src URL URL JavaScript 4.2 JavaScript 1KB JavaScript [9] JavaScript JavaScript JavaScript D3M dataset 2011 2013 3 [9] JavaScript jquery JavaScript JavaScript 1KB [9] JavaScript JavaScript 4.1 4.3.2 SVM SVM [9] SVM libsvm SVM 1. 4.2 F (i) 94 SVM 94 JavaScript JavaScript JavaScript +1 JavaScript 1 21

4.2: [9] Result Accuracy 98.84% Precision 97.72% Recall 94.35% 2. SVM SVM RBF SVM C γ [9] libsvm grid.py 2 5 4.3.3 [9] C = 25.22 γ = 55.72 Accuracy 4.2 4.4 [9] MWS MWS datasets D3M dataset 3 1 SVM 1 SVM 2 [9] 22

5 5.1 4.4 [9] [9] 5.1 5.2 JavaScript JavaScript [9] JavaScript JavaScript JavaScript 5.2.1 JavaScript Alexa Top500[2] web URL <script> JavaScript <script> src URL JavaScript JavaScript Python JavaScript URL JavaScript Python 5.2 [9] 1KB 5.2.2 JavaScript 2011-2014 D3M dataset JavaScript JavaScript 1KB 23

5.1: 5.1: JavaScript 2015/6/9 2011/2/14-2014/4/11 3344 906 5.2.3 JavaScript JavaScript JavaScript JavaScript JavaScript 5.1 5.3 [9] Support Vector Machine (SVM) Naive Bayes (NB) k-nearest Neighbor (knn) Decision Tree (DT) Random Forest (RF) 5 knn k 1 3 5 7 9 5 5 24

5.2: JavaScript [9] 2 5.3.1 JavaScript 2011 2012 2014 5.3.2 SVM SVM [9] RBF C γ 2 25
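For reference, the RBF kernel whose parameter γ is tuned together with the SVM penalty C has the standard form K(x, y) = exp(-γ‖x - y‖²). The sketch below only evaluates that kernel on two hypothetical feature vectors; it is not the training code used in the study, and C plays no role in it.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical frequency vectors, for illustration only.
a = [0.5, 0.5, 0.0]
b = [0.4, 0.4, 0.2]
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(a, b, gamma))
```

A larger γ makes the kernel value fall off faster with distance, so the decision boundary follows individual training samples more closely; this is why γ and C are usually searched jointly.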

Table 5.2

| Classifier | Accuracy | Precision | FNR | FPR |
|---|---|---|---|---|
| NB | 0.9247 | 0.8369 | 0.1954 | 0.04276 |
| SVM (C = 329.9, γ = 61.7) | 0.9638 | 0.9226 | 0.09384 | 0.02064 |
| DT | 0.9548 | 0.8711 | 0.07394 | 0.03738 |
| RF | 0.9769 | 0.9822 | 0.09162 | 0.004485 |
| knn (k=1) | 0.9616 | 0.9159 | 0.09715 | 0.02243 |
| knn (k=3) | 0.9546 | 0.9424 | 0.1612 | 0.01405 |
| knn (k=5) | 0.9576 | 0.9602 | 0.1634 | 0.009569 |
| knn (k=7) | 0.9572 | 0.9585 | 0.1645 | 0.009868 |
| knn (k=9) | 0.9544 | 0.9578 | 0.1777 | 0.009868 |

Table 5.3

| Classifier | Accuracy | Precision | FNR | FPR |
|---|---|---|---|---|
| NB | 0.8960 | 0.8988 | 0.2618 | 0.03589 |
| SVM (C = 501.0, γ = 5.92) | 0.8480 | 0.9453 | 0.4737 | 0.01316 |
| DT | 0.7916 | 0.8608 | 0.6316 | 0.02572 |
| RF | 0.7414 | 0.9558 | 0.8504 | 0.002990 |
| knn (k=1) | 0.7385 | 0.8582 | 0.8407 | 0.01136 |
| knn (k=3) | 0.7410 | 0.9180 | 0.8449 | 0.005981 |
| knn (k=5) | 0.7544 | 0.9408 | 0.8019 | 0.005383 |
| knn (k=7) | 0.7410 | 0.8984 | 0.8407 | 0.007775 |
| knn (k=9) | 0.7410 | 0.8592 | 0.8310 | 0.01196 |

FNR FNR FNR 5.4 5.2 5.3

5.5 5.2 FNR DT RF SVM Accuracy RF SVM DT SVM RF RF SVM Accuracy DT 3 knn FNR NB FNR Accuracy ( 5.3) FNR RF knn 0.8 (80%) FNR DT FNR 0.6316 SVM FNR 0.4737 Naive Bayes FNR 0.2618 74% Accuracy Naive Bayes 0.8960 Naive Bayes FNR FPR 3.6% Naive Bayes [9] SVM 4.4 SVM SVM SVM 27

6 bigram 6.1 bigram bigram SVM 5.3 bigram 94 2 = 8836 1767 2 1767 4418 5 4418 2 1767 6.1 4418 6.2 6.2 1767 6.1 unigram 5.3 NB SVM RF FNR knn FNR k 0.69 (69%) DT FNR 1767 FNR NB 0.4571 5.3 NB 0.2618 4418 6.2 unigram 5.3 NB SVM RF FNR knn FNR k 0.72 (72%) 1767 DT FNR 4418 FNR NB 0.4460 1767 5.3 NB bigram JavaScript 28
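Extending the character features from unigrams to bigrams gives 94 × 94 = 8836 possible features. The sketch below counts overlapping character bigrams and normalizes them by the total; the normalization, the skipping of pairs that contain a character outside 0x21 to 0x7e, and the name `bigram_frequencies` are assumptions. The reduction to 1767 or 4418 features is not shown, because the surviving text does not say how those features were chosen.

```python
from collections import Counter

# Same 94 printable ASCII characters (0x21 to 0x7e) as the unigram features.
ALPHABET = [chr(code) for code in range(0x21, 0x7F)]
BIGRAMS = [a + b for a in ALPHABET for b in ALPHABET]   # 94 * 94 = 8836


def bigram_frequencies(script: str) -> list:
    """Relative frequency of each of the 8836 character bigrams."""
    allowed = set(ALPHABET)
    pairs = Counter(
        script[i:i + 2]
        for i in range(len(script) - 1)
        if script[i] in allowed and script[i + 1] in allowed
    )
    total = sum(pairs.values())
    if total == 0:
        return [0.0] * len(BIGRAMS)
    return [pairs[bg] / total for bg in BIGRAMS]


vector = bigram_frequencies("eval(x);eval(y);")
print(len(vector))
print(vector[BIGRAMS.index("ev")])
```

Most entries of such a vector are zero for any single script, which is one reason a reduced feature set is attractive.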

Table 6.1: bigram (1767)

| Classifier | Accuracy | Precision | FNR | FPR |
|---|---|---|---|---|
| NB | 0.8438 | 0.8991 | 0.4571 | 0.02632 |
| SVM (C = 60.0, γ = 5.0) | 0.8250 | 0.9749 | 0.5693 | 0.004785 |
| DT | 0.8396 | 0.9183 | 0.4861 | 0.01974 |
| RF | 0.7339 | 0.9670 | 0.8781 | 0.001794 |
| knn (k=1) | 0.7715 | 0.9187 | 0.7341 | 0.01017 |
| knn (k=3) | 0.7853 | 0.9444 | 0.6939 | 0.007775 |
| knn (k=5) | 0.7857 | 0.9563 | 0.6967 | 0.005981 |
| knn (k=7) | 0.7853 | 0.9602 | 0.6994 | 0.005383 |
| knn (k=9) | 0.7861 | 0.9605 | 0.6967 | 0.005383 |

Table 6.2: bigram (4418)

| Classifier | Accuracy | Precision | FNR | FPR |
|---|---|---|---|---|
| NB | 0.8404 | 0.8696 | 0.4460 | 0.03589 |
| SVM (C = 216, γ = 2.3) | 0.7682 | 0.9563 | 0.7576 | 0.004785 |
| DT | 0.8371 | 0.9171 | 0.4945 | 0.01974 |
| RF | 0.7410 | 0.9636 | 0.8532 | 0.002392 |
| knn (k=1) | 0.7623 | 0.8964 | 0.7604 | 0.01196 |
| knn (k=3) | 0.7740 | 0.9209 | 0.7258 | 0.01017 |
| knn (k=5) | 0.7749 | 0.9463 | 0.7313 | 0.006579 |
| knn (k=7) | 0.7753 | 0.9466 | 0.7299 | 0.006579 |
| knn (k=9) | 0.7757 | 0.9426 | 0.7271 | 0.007177 |

1767 4418 NB RF FNR $94^2 = 8836$

7 DbD JavaScript [9] [9] D3M dataset SVM Naive Bayes Naive Bayes bigram Naive Bayes 74% 30

8,, Drive-by-Download JavaScript, The 33rd Symposium on Cryptography and Information Security (SCIS 2016), 2016. 31


[1] Takashi Adachi and Kazumasa Omote, An Approach to Predict Drive-by-Download Attacks by Vulnerability Evaluation and Opcode, The 10th Asia Joint Conference on Information Security (AsiaJCIS 2015), 2015.
[2] Alexa Top 500 Global Sites, www.alexa.com/topsites
[3] Davide Canali, Marco Cova, Giovanni Vigna, Christopher Kruegel, Prophiler: a fast filter for the large-scale detection of malicious web pages, The 20th international conference on World wide web, 2011.
[4] HTMLParser, https://docs.python.org/2.7/library/htmlparser.html
[5] IBM, 2014 Tokyo SOC, www-935.ibm.com/services/jp/ja/it-services/soc-report/
[6] , 10 2015, www.ipa.go.jp/security/vuln/10threats2015.html
[7] G.K. Jayasinghe, J.S. Culpepper, P. Bertok, Efficient and effective realtime prediction of drive-by download attacks, Journal of Network and Computer Applications, Volume 38, February 2014.
[8] P. Likarish, E. Jung, I. Jo, Obfuscated malicious javascript detection using classification techniques, MALWARE 09, 2009.
[9] ,,,,,, JavaScript, 2014-CSEC-64, vol.21, pp. 1-7, 2014.
[10] Python, https://www.python.org/
[11] scikit-learn, http://scikit-learn.org/stable/
[12] W. Xu, F. Zhang, S. Zhu, Jstill: Mostly static detection of obfuscated malicious javascript code, CODASPY 13, 2013.