
Feature Extraction with Randomness for an Application to Machine Learning from Text Data

Natsuki Fujimori (1175087), Heisei 26 (2014)

Abstract (Japanese, translated)

Random Forest constructs its decision trees by randomly selecting features, normally with a pseudo-random sequence. When the tree depth, the number of trees, or the number of selected features is small, the non-uniformity of pseudo-random numbers can bias the forest. This thesis applies a quasi-random (low-discrepancy) sequence to the feature selection instead. In SPAM discrimination experiments on a Japanese e-mail dataset (600 SPAM, 1000 non-SPAM mails), the proposed method improves classification accuracy by 2 to 3 percent when the tree depth is 2 and at most 10 features are used.

Keywords: Random Forest, pseudo-random sequence, Mersenne Twister, low-discrepancy sequence, quasi-random sequence

Abstract

Feature Extraction with Randomness for an Application to Machine Learning from Text Data

Natsuki Fujimori

In text processing, a word vector converted from a text document is usually used as the feature vector. A word vector is a histogram of the word frequencies or occurrences of a document. Although it is numerical data, it is not quantitative data on a ratio scale; it is essentially qualitative data on an ordinal or nominal scale. Many data mining methods based on machine learning rely on numeric computations that require a distance or inner product between feature vectors. Random Forest, an ensemble of decision trees, is instead well suited to qualitative text data, because it constructs trees of various shapes by choosing predictor variables randomly. This random choice is performed with a pseudo-random sequence. However, when the depth of the trees, the number of decision trees used for discrimination, or the number of predictor variables per tree is small, a bias may appear because of the non-uniformity of pseudo-random numbers. To solve this problem, this study proposes the application of a quasi-random number generator: a quasi-random sequence generates a uniformly distributed set of points. When the tree depth, the number of trees, or the number of features per tree is small, the proposed method achieves higher performance than the pseudo-random baseline. Using an original Japanese SPAM e-mail dataset (600 SPAM,

1000 non-SPAM), we compare the conventional and proposed methods experimentally. As a result, when the maximum tree depth is 2 and the number of features is at most 10, the proposed method improves precision by 2 to 3 percent.

Keywords: Random Forest, Pseudo-Random Sequence, Mersenne Twister, Low-Discrepancy Sequence, Quasi-Random Sequence

Contents

1 Introduction
2 Decision Trees, Bagging, and Random Forest
  2.1 Decision Trees
  2.2 Ensemble Learning
    2.2.1 Training Data
    2.2.2 Ensemble Methods
    2.2.3 Bagging
    2.2.4 Random Forest
3 Random Number Sequences
  3.1 Pseudo-Random Sequences
  3.2 Quasi-Random Sequences
4 Applying Quasi-Random Sequences to Random Forest
  4.1 Randomness in Random Forest
  4.2 Replacing the Pseudo-Random Sequence
  4.3 Implementation in Python
    4.3.1 Modifying forest.py
    4.3.2 Modifying the Cython Tree Module
5 Experiments
  5.1 Experimental Setup
    5.1.1 Environment and Dataset
    5.1.2 Parameters
    5.1.3 Procedure
  5.2 Results
    5.2.1 Overall Comparison
    5.2.2 D_max = 1
    5.2.3 D_max = 2
    5.2.4 D_max = 3 and D_max = 4
    5.2.5 D_max = 5
    5.2.6 Extended Parameter Ranges at D_max = 2
    5.2.7 Mixed Parameter Ranges at D_max = 2
  5.3 Comparison of Feature Selection
6 Conclusion
References
A Accuracy Tables for D_max = 1 and 2
B Results for Tree Depth 6


1 Introduction

Since the spread of the Web, enormous volumes of text data, such as weblogs and Twitter posts, have been produced daily, and machine learning is widely used to classify such text. Random Forest, an ensemble method based on decision trees, suits qualitative text data because it builds a diverse forest by randomly selecting the features examined at each node.

This random selection is normally driven by a pseudo-random sequence. When the depth of the trees, the number of trees, or the number of selected features is small, the non-uniformity of pseudo-random numbers can bias the forest. In this thesis we replace the pseudo-random sequence with a quasi-random, low-discrepancy sequence and evaluate the effect on SPAM discrimination, continuing our earlier comparisons of machine learning algorithms for SPAM filtering [1],[2].

The remainder of the thesis is organized as follows. Chapter 2 reviews decision trees, Bagging, and Random Forest. Chapter 3 describes pseudo-random and quasi-random sequences. Chapter 4 presents the application of a quasi-random sequence to Random Forest and its implementation in scikit-learn. Chapter 5 reports discrimination experiments on a Japanese SPAM e-mail dataset, and Chapter 6 concludes.

2 Decision Trees, Bagging, and Random Forest

This chapter reviews decision trees and the ensemble methods Bagging and Random Forest, on which the proposed method is built.

2.1 Decision Trees

A decision tree classifies an input by testing one feature at each internal node and following the matching branch until it reaches a leaf, which assigns a class label [4]. Figure 2.1 shows an example for SPAM discrimination.

[Figure 2.1: An example decision tree for SPAM discrimination. Each node tests whether a word such as 無料 (free), 出会い (dating), 女性 (woman), or 秘密 (secret) occurs in a mail (1) or not (0); each leaf labels the mail SPAM or HAM.]

At each node a feature f is chosen and the data are split in two according to its value; the choice of f determines the shape of the tree and its discrimination ability.
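As a minimal illustration of such a tree, the following sketch trains scikit-learn's DecisionTreeClassifier on made-up word-occurrence vectors mirroring Figure 2.1 (the four mails and their labels are invented for this example):

    # Columns: occurrence (1/0) of the words 無料 (free), 出会い (dating),
    # 女性 (woman), 秘密 (secret); rows are four hypothetical mails.
    from sklearn.tree import DecisionTreeClassifier

    X = [[1, 0, 1, 0],
         [0, 0, 0, 0],
         [1, 1, 1, 1],
         [0, 1, 0, 1]]
    y = ["SPAM", "HAM", "SPAM", "HAM"]

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(tree.predict([[1, 0, 0, 0]]))  # classify a new mail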

2.2 Ensemble Learning

2.2.1 Training Data

Let {x_i | 1 <= i <= N} denote the N training samples drawn from the input space X.

2.2.2 Ensemble Methods

An ensemble method combines many weak learners into one strong classifier. Two such methods are relevant here: Bagging and Random Forest.

2.2.3 Bagging

Bagging, short for Bootstrap Aggregating, was proposed by Leo Breiman in 1996 [6]. From the training set, bootstrap samples are drawn with replacement, a weak learner is trained on each sample, and the final label is decided by majority vote over the weak learners.

Given bootstrap samples 1 through m, the weak learners T_1, ..., T_m produce decisions f_1(x), ..., f_m(x), and the majority vote over these decides the classification, as illustrated in Figure 2.2.

[Figure 2.2: Flow of Bagging: training data → bootstrap sampling → bootstrap samples 1..m → weak learners T_1..T_m → weak decisions f_1(x)..f_m(x) → majority vote → classification result.]
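A minimal sketch of Bagging with scikit-learn's BaggingClassifier follows; the data are synthetic and the parameter values illustrative (in scikit-learn versions before 1.2 the estimator argument is named base_estimator):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)
    # m = 10 bootstrap samples, one decision tree per sample, majority vote.
    clf = BaggingClassifier(estimator=DecisionTreeClassifier(),
                            n_estimators=10, bootstrap=True, random_state=0)
    clf.fit(X, y)
    print(clf.predict(X[:5]))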

2.2.4 Random Forest

Random Forest was proposed by Leo Breiman in 2001 [7] as an extension of Bagging. In addition to bootstrap sampling, whenever a tree node is split, only d features chosen by random sampling from all features are considered as split candidates [5]. This extra randomness makes the trees of the forest more diverse than in plain Bagging, as Figure 2.3 illustrates, and the diversity improves the accuracy of the majority vote [5].

[Figure 2.3: Flow of Random Forest: training data → bootstrap sampling → bootstrap samples 1..m → a forest of diverse decision trees T_1..T_m, where the d candidate features at each node are chosen by random sampling → weak decisions f_1(x)..f_m(x) → majority vote → classification result.]
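In scikit-learn, the number d of randomly drawn candidate features corresponds to the max_features parameter. A minimal sketch with synthetic data and illustrative parameter values:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=200, n_features=52, random_state=0)
    # 10 trees of depth at most 2; d = 5 candidate features per node split.
    forest = RandomForestClassifier(n_estimators=10, max_features=5, max_depth=2)
    forest.fit(X, y)
    print(forest.score(X, y))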

3 Random Number Sequences

Machine learning methods such as k-means clustering and Random Forest rely on random sampling, so the quality of the random number sequence can affect their results. This chapter describes the two kinds of sequences compared in this thesis.

3.1 Pseudo-Random Sequences

A pseudo-random sequence is a deterministic sequence that imitates true randomness [9]. A classical generator is the linear congruential generator (LCG), defined by the recurrence

    X_{n+1} = (A X_n + B) mod M    (3.1)

with constants A, B, and M; the rand() function of the C standard library is commonly implemented this way.
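A minimal Python sketch of the recurrence (3.1); the constants below are the classic Numerical Recipes values and are chosen here only for illustration:

    def lcg(seed, a=1664525, b=1013904223, m=2**32):
        # Yield pseudo-random numbers in [0, 1) via X_{n+1} = (A*X_n + B) mod M.
        x = seed
        while True:
            x = (a * x + b) % m
            yield x / float(m)

    gen = lcg(seed=42)
    print([next(gen) for _ in range(3)])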

The period of an LCG is at most M, and its statistical quality is limited. The Mersenne Twister, proposed in 1998 [8], greatly improves on this: its period is 2^19937 - 1 and it is equidistributed in up to 623 dimensions [10]. Figures 3.1 and 3.2 plot two-dimensional points generated with the Mersenne Twister. In 2003, Marsaglia proposed Xorshift, a fast generator with period 2^128 - 1 built from XOR and bit-shift operations [12]; scikit-learn uses Xorshift internally for its decision trees.

3.2 Quasi-Random Sequences

Quasi-random sequences originate in the quasi-Monte Carlo method for numerical integration. For the unit interval I = {0 <= x < 1} and points x_i (1 <= i <= N) in the n-dimensional cube I^n, an integral S(f) is approximated by the sample mean

    S_N(f) = (1/N) * sum_{1 <= i <= N} f(x_i)    (3.2)

For a point set P_N = {x_i | 1 <= i <= N} whose discrepancy satisfies D(P_N) = O(N^{-1} (log N)^n), the integration error for a function f of bounded variation decreases essentially like 1/N, much faster than the O(N^{-1/2}) of plain Monte Carlo.
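To make Eq. (3.2) concrete, the following sketch estimates the integral of f(x, y) = xy over the unit square (exact value 0.25) with SOBOL points and with pseudo-random points; it assumes the i4_sobol_generate(dim, n) interface of the sobol_seq package used in Chapter 4, returning an (n, dim) array of points in [0, 1)^2:

    import numpy as np
    from sobol_seq import i4_sobol_generate

    N, dim = 1024, 2
    f = lambda p: p[:, 0] * p[:, 1]        # integrand; exact integral is 0.25

    sobol_pts = i4_sobol_generate(dim, N)  # low-discrepancy point set
    pseudo_pts = np.random.RandomState(0).rand(N, dim)  # Mersenne Twister points

    # S_N(f) from Eq. (3.2): the SOBOL estimate is typically closer to 0.25.
    print(abs(f(sobol_pts).mean() - 0.25))
    print(abs(f(pseudo_pts).mean() - 0.25))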

A sequence whose points have small discrepancy of this order is called a low-discrepancy sequence, or a quasi-random sequence [9]. A representative example is the SOBOL sequence, proposed by I. M. Sobol in 1967 [11]. Figures 3.3 and 3.4 plot two-dimensional SOBOL points; compared with the Mersenne Twister plots, the points cover the square more uniformly.

[Figure 3.1: Mersenne Twister, 1000*1000 two-dimensional points.]

[Figure 3.2: Mersenne Twister, 10000*10000 two-dimensional points.]

[Figure 3.3: SOBOL sequence, 1000*1000, 2 dimensions.]

[Figure 3.4: SOBOL sequence, 10000*10000, 2 dimensions.]

4 Applying Quasi-Random Sequences to Random Forest

This chapter describes the proposed method, which introduces a quasi-random sequence into Random Forest, and its implementation on top of Python's scikit-learn.

4.1 Randomness in Random Forest

Random Forest depends on random sampling in constructing its trees: bootstrap samples are drawn from the training data, and at every node a subset of candidate features is drawn at random. The diversity of the resulting trees, and therefore the accuracy of the forest, rests on these random choices.

4.2 Replacing the Pseudo-Random Sequence

scikit-learn's Random Forest draws these samples with the Xorshift pseudo-random generator. When only a few draws are made, as with shallow trees, small forests, or few candidate features, a pseudo-random sequence may cover the feature space unevenly. The proposed method therefore replaces the pseudo-random sequence used for feature selection with a quasi-random sequence, so that the selected features are spread uniformly even for small numbers of draws.

4.3 Implementation in Python

4.3.1 Modifying forest.py

scikit-learn implements Random Forest in forest.py. Its fit() method builds the forest, seeding one decision tree per iteration; the loop is modified as follows:

    for i in range(self.n_estimators):
        # (1) draw an initial seed with the pseudo-random generator
        seed = np.random.randint(MAX_INT)
        tree = self._make_estimator(append=False)
        # (2) generate a quasi-random integer j from the seed
        [j, seed] = i4_uniform(0, MAX_INT, seed)
        # (3) set j as the random state of the tree
        tree.set_params(random_state=j)
        # original (pseudo-random) version of the line above:
        # tree.set_params(random_state=random_state.randint(MAX_INT))
        trees.append(tree)

The function i4_uniform() is provided by the sobol_seq package, so forest.py additionally imports it at the top of the file:

    from sobol_seq import *

Whether line (3) sets the quasi-random value j or the original pseudo-random value determines which method is used. Figure 4.1 summarizes the relationship between the driver program and the modified fit(). The driver program builds and evaluates the forest:

    rf = RandomForestClassifier(n_estimators=num, max_features=feat, max_depth=2)
    rf.fit(traindata, trainlabel)    # grow the forest of decision trees
    result = rf.predict(testdata)    # classify the test data

[Figure 4.1: The driver program passes n_estimators, max_features, and max_depth to RandomForestClassifier; inside the modified fit() of forest.py, either the quasi-random (SOBOL) value or the Mersenne Twister pseudo-random value is set as each tree's random_state.]

4.3 Python Random Forest Random Forest fit Mersenne Twister SOBOL result predict fit 4.3.2 scikit-learn Random Forest sklearn/tree Python C Cython tree.so UNIX tree.pyd Windows tree.pyx tree.pxd Cython C tree.c setup.py install.so.pyd tree.pyx rand int rand int xorshift [12] SOBOL SOBOL 19

5 Experiments

To evaluate the quasi-random Random Forest, we perform SPAM discrimination experiments on a set of 1600 Japanese e-mails (600 SPAM and 1000 non-SPAM) and compare the conventional pseudo-random forest with the proposed one.

5.1 Experimental Setup

5.1.1 Environment and Dataset

The experiments use Python 2.7.6 with NumPy and scikit-learn; the SOBOL sequence is generated with the sobol_seq.py module. The mails were collected between 2004 and 2007, and each mail is represented by a vector of 52 word features.

Table 5.1 summarizes the environment.

    Table 5.1: Experimental environment
    OS:       Windows 7 Enterprise
    Memory:   4.00 GB
    CPU:      Intel(R) Core(TM) i5-2400S @ 2.50 GHz
    Software: Python 2.7.6, scikit-learn, NumPy, sobol_seq

Of the 600 SPAM and 1000 non-SPAM mails, 50 are used for training and the remaining 1550 for testing.

5.1.2 Parameters

Three Random Forest parameters are varied in the experiments (Table 5.2): the maximum tree depth D_max, the maximum number of candidate features F_max, and the number of decision trees N_max.

    Table 5.2: Random Forest parameters
    Maximum tree depth D_max:          1, 2, 3, 4, 5
    Maximum number of features F_max:  1, 2, 3, ..., 10
    Number of decision trees N_max:    1, 2, 3, ..., 10

Figure 5.1 illustrates decision trees of depth 1 through 5.

[Figure 5.1: Decision trees of maximum depth 1 to 5.]

5.1.3 Procedure

For every combination of D_max (5 values), F_max (10 values), and N_max (10 values), a forest is trained and evaluated 10 times with each random sequence, and the resulting classification accuracies are compared pairwise. This yields 1000 paired trials per depth, 5000 in total.
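Schematically, the procedure corresponds to the following loop (run_forest and record are hypothetical helpers; the real driver uses the modified scikit-learn of Chapter 4):

    for d_max in range(1, 6):               # maximum tree depth
        for f_max in range(1, 11):          # maximum number of features
            for n_max in range(1, 11):      # number of decision trees
                for trial in range(10):     # 10 repetitions per setting
                    acc_mt = run_forest(d_max, f_max, n_max, rng="xorshift")
                    acc_qr = run_forest(d_max, f_max, n_max, rng="sobol")
                    record(acc_mt, acc_qr)  # count which sequence won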

5.2 Results

5.2.1 Overall Comparison

Table 5.3 counts, for each maximum depth, how many of the 1000 paired trials each sequence won, i.e., achieved the higher classification accuracy.

    Table 5.3: Trials won by each sequence, out of 1000 per depth
    D_max   Quasi-random   Pseudo-random (MT)
      1          260              740
      2          695              305
      3          535              465
      4          546              454
      5          190              810

The quasi-random sequence wins clearly at D_max = 2, the two are nearly even at D_max = 3 and 4, and the pseudo-random sequence wins clearly at D_max = 1 and 5.

[Figure 5.2: Number of trials in which each random sequence (MT vs. Quasi) achieved the higher accuracy, by maximum tree depth.]

The following subsections examine the depths individually, in particular the advantage of the quasi-random sequence at D_max = 2 and its disadvantage at D_max = 5.

[Figure 5.3: Classification accuracy (%) vs. number of features, tree depth 1 (MT vs. Quasi).]

[Figure 5.4: Classification accuracy (%) vs. number of decision trees, tree depth 1 (MT vs. Quasi).]

5.2.2 D_max = 1

Figures 5.3 and 5.4 show the results for depth-1 trees with N_max varied from 1 to 10. The pseudo-random forest is better throughout: by 2.9% on average and up to 4.6% (at F_max = 8) in Figure 5.3, and by 3.1% on average and up to 4.6% in Figure 5.4. A depth-1 tree tests only a single word, which by itself rarely suffices to discriminate SPAM, so the uniform feature coverage of the quasi-random sequence brings no benefit here.

5.2.3 D_max = 2

Figures 5.5 and 5.6 show the results for depth-2 trees with N_max varied from 1 to 10. In contrast to D_max = 1, the quasi-random forest is better: by 1.8% at F_max = 9 and 2.0% at F_max = 3 in Figure 5.5, and from N_max = 2 onward in Figure 5.6, by about 2.0% at N_max = 9 and by a maximum of 3.3% at N_max = 10.

[Figure 5.5: Classification accuracy (%) vs. number of features, tree depth 2 (MT vs. Quasi).]

[Figure 5.6: Classification accuracy (%) vs. number of decision trees, tree depth 2 (MT vs. Quasi).]

5.2.4 D_max = 3 and D_max = 4

[Figure 5.7: Classification accuracy (%) vs. number of features, tree depth 3 (MT vs. Quasi).]

Figures 5.7 and 5.8 show the results for D_max = 3, and Figures 5.9 and 5.10 those for D_max = 4, with N_max again varied from 1 to 10. Compared with D_max = 2 the differences between the sequences shrink: from 0.7% to 1.5% in Figure 5.7, from 0.7% to 3.6% in Figure 5.8, from 0.8% to 1.2% in Figure 5.9, and from 0.6% to 1.0% in Figure 5.10. At these depths neither sequence is consistently better.

[Figure 5.8: Classification accuracy (%) vs. number of decision trees, tree depth 3 (MT vs. Quasi).]

[Figure 5.9: Classification accuracy (%) vs. number of features, tree depth 4 (MT vs. Quasi).]

[Figure 5.10: Classification accuracy (%) vs. number of decision trees, tree depth 4 (MT vs. Quasi).]

5.2.5 D_max = 5

[Figure 5.11: Classification accuracy (%) vs. number of features, tree depth 5 (MT vs. Quasi).]

[Figure 5.12: Classification accuracy (%) vs. number of decision trees, tree depth 5 (MT vs. Quasi).]

Figures 5.11 and 5.12 show the results for D_max = 5 with N_max varied from 1 to 10. Here the pseudo-random forest is better: by 4.6% to 5.7% in Figure 5.11 and by 4.6% to 8.2% in Figure 5.12. As at D_max = 1, the quasi-random sequence is at a disadvantage, most visibly around F_max = 10 in Figure 5.11. With depth-5 trees each tree already combines many of the 52 word features, so the forest discriminates SPAM well regardless of how uniformly the candidate features are drawn. The accuracy tables for D_max = 1 and 2 are given in Appendix A.

5.2.6 Extended Parameter Ranges at D_max = 2

Figures 5.13 and 5.14 show the results at D_max = 2 when both F_max and N_max are extended to 11-20. In these ranges the two sequences perform almost identically.

[Figure 5.13: Classification accuracy (%) vs. number of features 11-20, tree depth 2, number of trees 11-20 (MT vs. Quasi).]

[Figure 5.14: Classification accuracy (%) vs. number of decision trees 11-20, tree depth 2, number of features 11-20 (MT vs. Quasi).]

5.2.7 Mixed Parameter Ranges at D_max = 2

[Figure 5.15: Classification accuracy (%) vs. number of features 1-10, tree depth 2, number of trees 11-20 (MT vs. Quasi).]

[Figure 5.16: Classification accuracy (%) vs. number of decision trees 11-20, tree depth 2, number of features 1-10 (MT vs. Quasi).]

Figures 5.15 to 5.18 show the mixed ranges at D_max = 2, where one of F_max and N_max is varied over 1-10 and the other over 11-20. In these settings, too, the gap between the two sequences is small.

[Figure 5.17: Classification accuracy (%) vs. number of decision trees 1-10, tree depth 2, number of features 11-20 (MT vs. Quasi).]

[Figure 5.18: Classification accuracy (%) vs. number of features 11-20, tree depth 2, number of trees 1-10 (MT vs. Quasi).]

5.3 Comparison of Feature Selection

[Figure 5.19: Frequency with which each of the 52 features is selected by the rand_int function.]

[Figure 5.20: Frequency with which each of the 52 features is selected using the SOBOL sequence.]

Figures 5.19 and 5.20 are histograms of how often each feature (indices 0 to 51) is selected while building the forest with F_max = 5, N_max = 5, and D_max = 2. The SOBOL sequence selects the features almost uniformly, whereas rand_int selects some features far more often than others. This uniform coverage of the feature set is the likely reason why the quasi-random Random Forest performs better under small parameter settings.
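The histograms can be reproduced schematically as below; NumPy's Mersenne Twister stands in here for scikit-learn's internal Xorshift, and the choice of 1000 draws over the 52 features is illustrative:

    import numpy as np
    from sobol_seq import i4_sobol

    n_features, n_draws = 52, 1000

    rng = np.random.RandomState(0)  # pseudo-random draws
    pseudo = np.bincount(rng.randint(0, n_features, n_draws), minlength=n_features)

    seed = 1
    sobol = np.zeros(n_features, dtype=int)
    for _ in range(n_draws):        # quasi-random draws
        point, seed = i4_sobol(1, seed)
        sobol[int(point[0] * n_features)] += 1

    print(pseudo.std(), sobol.std())  # the SOBOL counts vary less across features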

6 Conclusion

This thesis proposed replacing the pseudo-random sequence used for feature selection in Random Forest with a quasi-random (SOBOL) sequence. Experiments on a Japanese SPAM e-mail dataset (600 SPAM and 1000 non-SPAM mails; 50 mails for training, 1550 for testing) varied the maximum tree depth D_max from 1 to 5, the maximum number of features F_max from 1 to 10, and the number of trees N_max from 1 to 10. The proposed method was most effective at D_max = 2: for F_max = 1-9 and N_max = 2-10 it improved classification accuracy by about 1.9% on average.

When F_max and N_max were extended up to 20 at D_max = 2, the difference between the two sequences disappeared. To confirm the result on other data, the same experiment was run on the Spambase dataset of the UCI Machine Learning Repository (4601 mails), with 3% of the data (138 mails) used for training and the remaining 4463 for testing; at D_max = 2 with F_max = 1-10 and N_max = 1-10, the proposed method again improved accuracy, by 1.9% to 3.2%.

These results indicate that the uniform coverage of the SOBOL sequence is beneficial when the number of random draws is small. Future work includes applying quasi-random sequences to the other sources of randomness in Random Forest, such as bootstrap sampling, and evaluating the method on a wider range of datasets.


References

[1] Natsuki Fujimori and Shinichi Yoshida, "Comparison of Machine Learning Algorithms for English and Japanese SPAM mail discrimination," The 3rd International Workshop on Advanced Computational Intelligence and Intelligent Informatics (IWACIII 2013), Shanghai, China, October 18-21, ss1-10, 2013.
[2] Natsuki Fujimori and Shinichi Yoshida, "Machine Learning Algorithms applied to SPAM Filtering for English and Japanese E-Mail," The 14th International Symposium on Advanced Intelligent Systems (ISIS 2013), Daejeon, Korea, November 13-16, F3a-1, 2013.
[3] (Japanese-language reference), 2013.
[4] (Japanese-language reference), 2010.
[5] (Japanese-language reference on R), p.266, 2007.
[6] Leo Breiman, "Bagging Predictors," Machine Learning, 24, pp.123-140, 1996.
[7] Leo Breiman, "Random Forests," Machine Learning, 45, pp.5-23, 2001.
[8] Makoto Matsumoto and Takuji Nishimura, "Mersenne Twister: a 623-dimensionally equidistributed uniform pseudo-random number generator," ACM Transactions on Modeling and Computer Simulation (TOMACS), 8(1), pp.3-30, 1998.
[9] (Japanese-language reference), 4th ed., p.1560, 2007.
[10] http://random.ism.ac.jp/info01/random_number_generation/node6.html
[11] I. M. Sobol, "On the distribution of points in a cube and the approximate evaluation of integrals," USSR Computational Mathematics and Mathematical Physics, 7, pp.86-112, 1967.
[12] George Marsaglia, "Xorshift RNGs," Journal of Statistical Software, 8(14), pp.1-6, 2003.

A Accuracy Tables for D_max = 1 and 2

Tables A.1 and A.2 list the classification accuracy (%) for each value of F_max (left half) and N_max (right half) under the pseudo-random (MT) and quasi-random (Quasi) sequences.

    Table A.1: Accuracy (%) at D_max = 1
    F_max   MT    Quasi  |  N_max   MT    Quasi
      1    63.5   63.3   |    1    64.0   64.7
      2    65.6   63.7   |    2    65.4   64.8
      3    66.1   64.1   |    3    67.9   65.4
      4    67.2   64.3   |    4    67.9   64.7
      5    67.2   64.8   |    5    67.9   65.1
      6    68.6   65.1   |    6    68.7   65.2
      7    68.7   65.5   |    7    68.9   65.7
      8    70.3   65.7   |    8    68.7   65.1
      9    70.0   66.5   |    9    69.0   64.3
     10    66.5   71.2   |   10    69.9   64.9

    Table A.2: Accuracy (%) at D_max = 2
    F_max   MT    Quasi  |  N_max   MT    Quasi
      1    63.5   65.9   |    1    66.5   65.6
      2    65.3   67.0   |    2    66.6   67.1
      3    66.3   68.3   |    3    66.4   68.4
      4    66.9   68.8   |    4    67.2   69.3
      5    67.5   69.0   |    5    68.1   68.7
      6    67.8   69.4   |    6    67.2   69.2
      7    68.0   70.0   |    7    67.0   69.3
      8    68.6   70.0   |    8    67.6   70.3
      9    68.4   70.2   |    9    67.2   70.1
     10    70.2   68.8   |   10    67.1   70.5

B Results for Tree Depth 6

[Figure B.1: Number of trials in which each random sequence (MT vs. Quasi) achieved the higher accuracy, by maximum tree depth, including depth 6.]

[Figure B.2: Classification accuracy (%) vs. number of features, tree depth 6 (MT vs. Quasi).]

[Figure B.3: Classification accuracy (%) vs. number of decision trees, tree depth 6 (MT vs. Quasi).]