Fiscal year Heisei 21 (2009)

Stock price forecast using text mining

Student ID 1100323

March 1, 2010
Abstract

Stock price forecast using text mining

Koji Nakaya

Stock price forecasting has conventionally relied on numerical data. Fund managers, however, trade stocks using not only numerical data but also qualitative information such as news articles. This research therefore uses text mining to forecast stock prices from both quantitative and qualitative data. For the qualitative data, words that may influence stock prices were extracted from 170 economy-related articles published on NIKKEI NET, Infoseek, and MSN Sankei News between October 1 and December 22, and the articles were represented by word vectors over 121 words. For the quantitative data, we used the time series of Nikkei Stock Average closing prices from September 15 to December 22, which begins ten days before the article publication period. The forecast is made by a neural network trained with backpropagation. The training data consist of 85 sets of word vectors and stock price time series, and the teacher signal is binary: whether the next day's stock price rises or falls. The remaining 85 sets are used for verification. We compared the recognition rates obtained with three kinds of input: the stock price time series alone, the word vector alone, and the word vector combined with the stock price time series. Using the word vector and stock prices together gave the highest recognition rate.

Key words: neural network, stock price forecast, word vector
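A minimal sketch of turning one article into a word vector, as the abstract describes. It is not the author's code: it assumes the article has already been segmented into words (the thesis mentions the morphological analyzer MeCab), assumes a binary presence encoding, and uses a hypothetical five-word vocabulary in place of the 121 extracted words.

```python
from typing import Iterable, List, Sequence

# Hypothetical vocabulary standing in for the 121 extracted words.
VOCABULARY: List[str] = ["yen_rise", "rate_cut", "profit_up", "profit_down", "selloff"]


def word_vector(tokens: Iterable[str], vocabulary: Sequence[str] = VOCABULARY) -> List[int]:
    """Binary word vector: element i is 1 if vocabulary[i] occurs in the article.

    Assumption: presence/absence encoding; word counts or TF-IDF weights
    would be drop-in alternatives.
    """
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


if __name__ == "__main__":
    article = ["bank", "rate_cut", "expected", "yen_rise", "rate_cut"]
    print(word_vector(article))  # [1, 1, 0, 0, 0]
```

Presence encoding ignores how often a word repeats, so the two occurrences of "rate_cut" above still produce a single 1.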
1

[Text not recoverable; cites [1] and [2] and mentions ICT.]

[Figure 1.1]
2

[Text not recoverable; mentions the morphological analyzer MeCab.]

[Figure 2.1]

2.1

[Text not recoverable; cites [3] and [4].]

[Figure 2.2]

[Figure 2.3]
2.2

[Figure 2.4]

2.2.1
2.3

[Text not recoverable; cites [5].]

[Figure 2.5]
[Figure 2.6]

2.3.1

$$\mathrm{purelin}(n) = n$$

(Figure 2.9)

2.3.2

$$\mathrm{logsig}(n) = \frac{1}{1 + \exp(-n)}$$

(Figure 2.10)

[Figure 2.7]

2.3.3

$$\mathrm{tansig}(n) = \frac{2}{1 + \exp(-2n)} - 1$$

(Figure 2.11)
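The three transfer functions translate directly into code. This sketch uses the names of MATLAB's Neural Network Toolbox functions of the same name, which these formulas match; the derivative helpers are an illustrative addition, not from the thesis, showing the quantities backpropagation needs. The tansig formula is algebraically identical to tanh(n).

```python
import math


def purelin(n: float) -> float:
    """Linear transfer function: purelin(n) = n."""
    return n


def logsig(n: float) -> float:
    """Log-sigmoid: logsig(n) = 1 / (1 + exp(-n)); output in (0, 1).

    math.exp overflows for very negative n (below about -709); clip n first
    if such inputs are possible.
    """
    return 1.0 / (1.0 + math.exp(-n))


def tansig(n: float) -> float:
    """Tan-sigmoid: tansig(n) = 2 / (1 + exp(-2n)) - 1; equals tanh(n), output in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


# Derivatives written in terms of the function's output a = f(n),
# which is the form backpropagation uses.
def purelin_deriv(a: float) -> float:
    return 1.0


def logsig_deriv(a: float) -> float:
    return a * (1.0 - a)


def tansig_deriv(a: float) -> float:
    return 1.0 - a * a


if __name__ == "__main__":
    for n in (-2.0, 0.0, 2.0):
        print(f"n={n:+.1f} purelin={purelin(n):+.4f} "
              f"logsig={logsig(n):.4f} tansig={tansig(n):+.4f}")
```

Expressing the derivatives through the output value avoids recomputing the exponential during the backward pass.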
[Figure 2.8]

[Figure 2.9]

[Figure 2.10]

[Figure 2.11]
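3

3.1

Articles from NIKKEI NET, Infoseek, and MSN published from October 1 to December 22 (170 articles; word vector of 121 words). Stock prices from September 15 to December 22.

3.2

[Text not recoverable; mentions NIKKEI NET, Infoseek, and MSN articles from October 1 to November 4 and the figures 100, 20, and 65.]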
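One way to pair each article's word vector with the Nikkei closing prices and a next-day rise/fall label is sketched below. It is a hypothetical illustration, not the author's preprocessing: the ten-price window echoes the "ten days before" wording of the abstract, while the normalization by the current close, treating an unchanged price as "not a rise", and the chronological half split into 85 training and 85 verification sets are assumptions.

```python
from typing import List, Sequence, Tuple

WINDOW = 10  # assumed number of past closing prices per sample


def make_sample(
    word_vec: Sequence[int],
    closes: Sequence[float],
    day: int,
    window: int = WINDOW,
) -> Tuple[List[float], int]:
    """Build one (input, label) pair for trading-day index `day`.

    Input: the word vector followed by the `window` closes ending at `day`,
    each divided by the close on `day` (assumed normalization).
    Label: 1 if the next close is higher than the close on `day`, else 0
    (an unchanged price counts as "not a rise").
    """
    start = day - window + 1
    if start < 0 or day + 1 >= len(closes):
        raise ValueError("not enough price history around this day")
    base = closes[day]
    history = [p / base for p in closes[start : day + 1]]
    features = [float(v) for v in word_vec] + history
    label = 1 if closes[day + 1] > base else 0
    return features, label


def split_half(samples: Sequence) -> Tuple[list, list]:
    """Chronological split: first half for training, second half for verification."""
    mid = len(samples) // 2
    return list(samples[:mid]), list(samples[mid:])


if __name__ == "__main__":
    closes = [10000.0 + 10 * i for i in range(12)]  # toy rising series
    x, t = make_sample([1, 0, 1], closes, day=9)
    print(len(x), t)  # 13 1
```

Dropping the word-vector part of `features`, or the price part, gives the two single-source inputs that the abstract compares against the combined one.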
3.3

[Text not recoverable; it refers to the 85 training and 85 verification sets and cites [1].]

Table 3.1 (caption and row labels not recoverable): 500, 0.01, 1e-10
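A minimal backpropagation sketch for the rise/fall classifier follows. It assumes that the Table 3.1 values 500, 0.01, and 1e-10 are the number of epochs, the learning rate, and the error goal (their row labels are not recoverable, so this reading is an assumption). It also assumes one tansig hidden layer and a single logsig output trained by per-sample gradient descent on squared error; the hidden-layer size is hypothetical. It is not the author's implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]


def tansig(n: float) -> float:
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def logsig(n: float) -> float:
    return 1.0 / (1.0 + math.exp(-n))


class TwoLayerNet:
    """tansig hidden layer and one logsig output unit, trained by backpropagation."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        hidden = [
            tansig(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        out = logsig(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def train(self, data: Sequence[Sample], lr: float = 0.01,
              epochs: int = 500, goal: float = 1e-10) -> float:
        """Per-sample gradient descent; stops early once the epoch MSE reaches `goal`."""
        mse = float("inf")
        for _ in range(epochs):
            sq_err = 0.0
            for x, target in data:
                hidden, out = self.forward(x)
                err = out - target
                sq_err += err * err
                # Delta of E = 0.5 * err^2 at the logsig output's net input.
                d_out = err * out * (1.0 - out)
                # Hidden deltas use the output weights before they are updated.
                d_hid = [d_out * w * (1.0 - h * h) for w, h in zip(self.w2, hidden)]
                for j, h in enumerate(hidden):
                    self.w2[j] -= lr * d_out * h
                self.b2 -= lr * d_out
                for j, d in enumerate(d_hid):
                    row = self.w1[j]
                    for i, xi in enumerate(x):
                        row[i] -= lr * d * xi
                    self.b1[j] -= lr * d
            mse = sq_err / len(data)  # error accumulated while this epoch's updates ran
            if mse <= goal:
                break
        return mse

    def predict(self, x: Sequence[float]) -> int:
        """1 = next-day rise, 0 = fall, thresholding the output at 0.5."""
        return 1 if self.forward(x)[1] >= 0.5 else 0


if __name__ == "__main__":
    toy = [([-1.0], 0), ([1.0], 1)]  # toy data, not from the thesis
    net = TwoLayerNet(n_in=1, n_hidden=3)
    print(net.train(toy, lr=0.5, epochs=1000))
    print([net.predict(x) for x, _ in toy])
```

The toy run uses a larger learning rate than 0.01 only so that the tiny example converges quickly; with real inputs, `n_in` would be the length of the assembled feature vector.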
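Tables 3.2 to 3.7 (captions and row labels not recoverable). In each row the last value is the mean of the five numbered columns, truncated to an integer.

Table 3.2
           1    2    3    4    5    Mean
Row 1     65   44   75   76   49    61
Row 2     50   55   52   52   55    52

Table 3.3
           1    2    3    4    5    Mean
Row 1     60   50   54   50   58    54
Row 2     50   47   51   45   58    50

Table 3.4
           1    2    3    4    5    Mean
Row 1     50   45   55   41   51    48
Row 2     55   47   48   57   62    53

Table 3.5
           1    2    3    4    5    Mean
Row 1     70   76   63   63   51    64
Row 2     54   41   55   54   58    52

Table 3.6
           1    2    3    4    5    Mean
Row 1     55   54   49   52   54    52
Row 2     54   50   47   51   56    51

Table 3.7
           1    2    3    4    5    Mean
Row 1     45   43   71   64   48    54
Row 2     60   62   49   45   67    56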
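In Tables 3.2 to 3.7 the last value of each row equals the average of the five preceding values truncated to an integer; for Table 3.2, (65 + 44 + 75 + 76 + 49) / 5 = 61.8 is listed as 61. The sketch below reproduces that check and computes a recognition rate, assumed here (the surviving text does not define it) to be the percentage of verification samples whose rise/fall label is predicted correctly.

```python
from typing import Sequence


def recognition_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Percentage of samples whose predicted rise/fall label equals the actual label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label sequences of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * hits / len(actual)


def truncated_mean(values: Sequence[int]) -> int:
    """Mean of non-negative integer values, truncated (floored) to an integer."""
    return sum(values) // len(values)


if __name__ == "__main__":
    table_3_2_row_1 = [65, 44, 75, 76, 49]
    print(sum(table_3_2_row_1) / 5, truncated_mean(table_3_2_row_1))  # 61.8 61
```

With 85 verification samples, a rate such as 45 correct out of 85 is about 52.94%, so integer table entries would also be consistent with truncation of the per-trial rates.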
4

4.1
4.2

[Text not recoverable; it repeats the values 500, 0.01, and 1e-10 that appear in Table 3.1.]
References

[1] (not recoverable)
[2] UFJ …, 23, 2009.
[3] George Chang, Marcus J. Healey, James A. M. McHugh, and Jason T. L. Wang, Web …, 2005.
[4] …, 2006.
[5] …, 2007.
Appendix A

[Figures A.1 to A.18; captions not recoverable]