28 9


D3, Vol. 68, No. 5, pp. 773-780 (2012)
HASEGAWA Hironobu, FUJII Masaru, ARIMURA Mikiharu, TAMURA Tohru: A Basic Study on Traffic Accident Data Analysis Using Support Vector Machine, Journal of the Eastern Asia Society of Transportation Studies, Vol. 7, pp. 2873-2880 (2007)
Vol. 50, pp. 219-228 (2007)


I 1. IC GPS 1 1: 1) p.8 1.2 2),3),4),5),6),7) 8),9),10),11) R 1

Web 2

2. R II Excel R III IV V R R R PDF 1 p. 1 *1 III 5.(2)a) 2) R R 2 p. 5 2) R

R code 1: R
getwd()    # show the current working directory
q()        # quit R
# a line that starts with "#" is a comment
if (0) {
  # code inside if (0) { } is never executed
  if (0) { }
}

R R R R $$ #

data <- iris    # use R's built-in iris data set
# show the first 3 rows
head(data, 3)
$$   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
$$ 1          5.1         3.5          1.4         0.2  setosa
$$ 2          4.9         3.0          1.4         0.2  setosa
$$ 3          4.7         3.2          1.3         0.2  setosa

II 4 (a) (b) (c) (d) R R TB R RODBC *2 R R *3 Web RjpWiki R-Tips seekr 2 1GB 3 12) 13) 4

3. Excel IC R R

(1) read.fwf()
Windows Mac Linux fileEncoding cp932 *4 widths help(read.fwf) R 2 3 1, 3, 5 *5 input.txt R

R code 2: read.fwf()
read.fwf(file = "input.txt", fileEncoding = "cp932", widths = c(1, 3, 5))

R R *6 2),5) 108.4MB *7 read.fwf() 3 readr read_fwf() 2 read_fwf()
(a)
(b) factor character
(c) *8 *9
readr 2015 8 20 CRAN R 3 read_fwf() R

R code 3: read_fwf()
install.packages("readr")  # install the readr package
library(readr)
read_fwf(file = "input.txt", fwf_widths(c(1, 3, 5)))

*4 Shift-JIS CP932 http://qiita.com/kasei-san/items/cfb993786153231e5413
*5 9 = 1 + 3 + 5
*6 read.table() read.csv()
*7 18 4 1 162 233177 108.4MB
*8 Mac Linux UTF-8
*9 nkf, Vim (Linux, Mac, Windows), Notepad++ (Windows), CotEditor (Mac)

(2) CSV
CSV comma-separated values,

R CSV read.csv() readr read_csv() jyoukou_20141111.csv 1 p. 8 read.csv() *10

# read the CSV file
sagamihara <- read.csv(file = "jyoukou_20141111.csv", header = TRUE, fileEncoding = "UTF-8")
# set the column names
colnames(sagamihara) <- c("", "", "1975 ", "1980 ", "1985 ", "1989 ", "1993 ", "1998 ", "2003 ", "2008 ", "2009 ", "2010 ", "2011 ", "2012 ", "2013 ")
# show the data
sagamihara
$$      1975   1980   1985   1989   1993   1998
$$ 1   31954  35144  44758  57466  88988  94680
$$ 2   29144  31732  39784  48172  52430  52812
$$ 3    6706   9516  11562  16974  20532  20384
$$ 4   27228  32378  39840  52544  59198  58794
$$ 5      NA     NA     NA  14802  30840  37590
$$ 6    6488   7812  11278  16058  20530  22174
$$ 7   54506  79366 119620 171478 195652 197638
$$ 8    2536   2006   2316   4244   9536  10176
$$ 9    4526   4562   3292   4210   8686   9910
$$ 10   1218   2334   1928   2766   4558   5490
$$ 11   1874   2930   3126   4164   5834   6616
$$ 12    836    856   1082   1184   1572   1652
$$ 13    700    794    808   1114   1532   1798
$$ 14     NA     NA     NA     NA     NA     NA
$$ 15     NA     NA     NA     NA     NA     NA
$$ 16     NA     NA     NA     NA     NA     NA
$$ 17     NA     NA     NA     NA     NA     NA
$$ 18  83267  84879  87864  95594 106006 109717
$$ 19  48952  50881  54001  61564  63542  57429
$$ 20  19818  20921  22554  25316  27219  24346
$$ 21 154831 197393 236259 281813 297703 279498
$$ 22  32405  36235  39880  44960  47728  44228
$$ 23     NA     NA     NA   2944   9190  12845
$$ 24     NA     NA     NA  33755  55492  67917
$$ 25     NA     NA     NA     NA   2947   4464
$$      2003   2008   2009   2010   2011   2012   2013
$$ 1  104522 118162 118098 120244 120482 122254 125510
$$ 2   53448  56370  55774  56158  55716  56566  57552
$$ 3   21220  22648  22348  22356  22646  22960  23594
$$ 4   72368  77154  78518  79600  78700  80870  74276

*10 4.0

$$ 5   39480  42410  42404  42796  42626  43370  44614
$$ 6   21334  20330  20194  20422  20198  20392  20842
$$ 7  211864 216428 215598 218154 218084 221086 221880
$$ 8    9836  10886  10622  10622  10378  10426  10820
$$ 9    9798  11456  11206  11060  10702  11204  11600
$$ 10   5484   5980   6076   6388   6384   6630   7076
$$ 11   6890   8282   8266   8328   8266   8834   9030
$$ 12   1694   2002   1994   1968   2016   2128   2252
$$ 13   1856   2510   2404   2184   2070   2282   2406
$$ 14   6744   6914   6904   6746   6586   5118   5144
$$ 15   5898   5892   5742   5550   5382   5310   5352
$$ 16  65108  63334  61724  61034  59936  59766  60568
$$ 17 162546 164788 160546 160438 162948 165042 170382
$$ 18 108602 121338 119240 119166 120113 122453 128006
$$ 19  55944  55754  55392  55034  54366  55530  56767
$$ 20  22883  22176  21796  21422  21152  21420  21584
$$ 21 282772 291952 289622 290621 288884 291678 292779
$$ 22  41987  39977  39301  39160  37931  38110  38869
$$ 23  16037  19994  20539  21228  21096  21330  21719
$$ 24  78072  88320  88427  88065  87242  88377  91060
$$ 25  10727  16526  16678  17183  17184  17582  18471

sagamihara sagamihara R R V 10. p. 61 1 sagamihara 1975 1985 *11 NA 5. p. 16

*11 1988

Table 1: sagamihara

     1975   1980   1985   1989   1993   1998   2003   2008   2009   2010   2011   2012   2013
 1  31954  35144  44758  57466  88988  94680 104522 118162 118098 120244 120482 122254 125510
 2  29144  31732  39784  48172  52430  52812  53448  56370  55774  56158  55716  56566  57552
 3   6706   9516  11562  16974  20532  20384  21220  22648  22348  22356  22646  22960  23594
 4  27228  32378  39840  52544  59198  58794  72368  77154  78518  79600  78700  80870  74276
 5     NA     NA     NA  14802  30840  37590  39480  42410  42404  42796  42626  43370  44614
 6   6488   7812  11278  16058  20530  22174  21334  20330  20194  20422  20198  20392  20842
 7  54506  79366 119620 171478 195652 197638 211864 216428 215598 218154 218084 221086 221880
 8   2536   2006   2316   4244   9536  10176   9836  10886  10622  10622  10378  10426  10820
 9   4526   4562   3292   4210   8686   9910   9798  11456  11206  11060  10702  11204  11600
10   1218   2334   1928   2766   4558   5490   5484   5980   6076   6388   6384   6630   7076
11   1874   2930   3126   4164   5834   6616   6890   8282   8266   8328   8266   8834   9030
12    836    856   1082   1184   1572   1652   1694   2002   1994   1968   2016   2128   2252
13    700    794    808   1114   1532   1798   1856   2510   2404   2184   2070   2282   2406
14     NA     NA     NA     NA     NA     NA   6744   6914   6904   6746   6586   5118   5144
15     NA     NA     NA     NA     NA     NA   5898   5892   5742   5550   5382   5310   5352
16     NA     NA     NA     NA     NA     NA  65108  63334  61724  61034  59936  59766  60568
17     NA     NA     NA     NA     NA     NA 162546 164788 160546 160438 162948 165042 170382
18  83267  84879  87864  95594 106006 109717 108602 121338 119240 119166 120113 122453 128006
19  48952  50881  54001  61564  63542  57429  55944  55754  55392  55034  54366  55530  56767
20  19818  20921  22554  25316  27219  24346  22883  22176  21796  21422  21152  21420  21584
21 154831 197393 236259 281813 297703 279498 282772 291952 289622 290621 288884 291678 292779
22  32405  36235  39880  44960  47728  44228  41987  39977  39301  39160  37931  38110  38869
23     NA     NA     NA   2944   9190  12845  16037  19994  20539  21228  21096  21330  21719
24     NA     NA     NA  33755  55492  67917  78072  88320  88427  88065  87242  88377  91060
25     NA     NA     NA     NA   2947   4464  10727  16526  16678  17183  17184  17582  18471
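The two reading steps of this section can be tried without the original data files. The following is a minimal sketch that assumes nothing beyond base R: the temporary files, their contents, and the column names (`id`, `code`, `value`, `y1975`, `y1980`) are made up for illustration and are not part of the tutorial's data.

```r
# Fixed-width file: each row is 9 characters, split into widths 1, 3, 5
fwf_path <- tempfile(fileext = ".txt")
writeLines(c("1abc12345", "2def67890"), fwf_path)

fwf_data <- read.fwf(file = fwf_path, widths = c(1, 3, 5),
                     col.names = c("id", "code", "value"))
str(fwf_data)

# CSV file with one empty cell; the empty cell is read as NA
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,y1975,y1980", "A,31954,35144", "B,,14802"), csv_path)

csv_data <- read.csv(file = csv_path, header = TRUE, fileEncoding = "UTF-8")
csv_data
```

Note that the argument is spelled `fileEncoding` (capital E); R is case sensitive, so `fileencoding` is not accepted by `read.csv()`.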

4. Excel
Excel R xlsx read.xlsx() XLConnect readWorksheetFromFile() 22 zkntrf05.xls 2

Table 2: 22

readWorksheetFromFile()

# install the package (first time only)
#install.packages("XLConnect")
library(XLConnect)
#
# read the worksheet
AkitaPT <- readWorksheetFromFile(file = "zkntrf05.xls",  # file name
                                 sheet = 1,       # sheet number
                                 header = TRUE,   #
                                 startCol = 1,    # first column to read
                                 startRow = 7,    # first row to read
                                 endCol = 33      # last column to read
                                 )
# column names
colnames(AkitaPT)
$$ [1] "Col1" "Col2" "Col3" "Col4" "Col5" "Col6" "Col7"
$$ [8] "Col8" "Col9" "Col10" "Col11" "Col12" "Col13" "Col14"
$$ [15] "Col15" "Col16" "Col17" "Col18" "Col19" "Col20" "Col21"
$$ [22] "Col22" "Col23" "Col24" "Col25" "Col26" "Col27" "Col28"
$$ [29] "Col29" "Col30" "Col31" "X.." "X...1"
# set the column names
colnames(AkitaPT) <- c("", "", "", "1224 ", "", "", "", "7 ", "8 ", "9 ", "10 ", "11 ", "12 ", "13 ", "14 ", "15 ", "16 ", "17 ", "18 ", "19 ", "20 ", "21 ", "22 ", "23 ", "0 ", "1 ", "2 ", "3 ", "4 ", "5 ", "6 ", " 12 ", "24 ")

str()
#
str(AkitaPT)
$$ 'data.frame': 2272 obs. of 33 variables:
$$ $ : num 10 10 10 10 20 20 20 20 30 30...
$$ $ : num 1 1 1 1 1 1 1 1 1 1...
$$ $ : num 1040 1040 1040 1040 1040 1040 1040 1040 1040 1040...
$$ $ 1224 : num 2 2 2 2 2 2 2 2 2 2...
$$ $ : num 1 1 1 1 2 2 2 2 2 2...
$$ $ : num 1 1 2 2 1 1 2 2 1 1...
$$ $ : num 1 2 1 2 1 2 1 2 1 2...
$$ $ 7 : num 89 43 102 118 88 49 79 112 73 39...

$$ $ 8 : num 135 60 160 99 119 57 132 106 96 39...
$$ $ 9 : num 136 63 155 75 134 61 154 79 111 52...
$$ $ 10 : num 138 70 200 56 121 64 172 50 110 61...
$$ $ 11 : num 135 76 185 76 138 81 157 59 114 66...
$$ $ 12 : num 147 83 133 72 108 78 137 59 93 68...
$$ $ 13 : num 130 81 139 60 125 69 123 70 119 58...
$$ $ 14 : num 165 68 148 45 162 75 127 43 142 59...
$$ $ 15 : num 198 67 142 38 167 68 147 40 151 57...
$$ $ 16 : num 206 75 150 50 208 65 132 40 183 40...
$$ $ 17 : num 196 55 162 47 168 50 136 45 135 43...
$$ $ 18 : num 146 54 128 49 134 48 135 41 130 39...
$$ $ 19 : num 111 64 104 53 94 66 95 38 78 51...
$$ $ 20 : num 47 81 70 50 41 79 73 50 42 68...
$$ $ 21 : num 25 52 40 28 29 36 35 30 32 26...
$$ $ 22 : num 31 56 45 70 30 68 37 54 27 71...
$$ $ 23 : num 22 71 22 74 14 57 24 81 15 52...
$$ $ 0 : num 12 35 15 82 12 29 18 73 13 25...
$$ $ 1 : num 9 34 5 86 7 46 9 79 6 39...
$$ $ 2 : num 5 32 8 76 8 21 8 74 7 20...
$$ $ 3 : num 6 28 7 82 5 29 7 85 5 26...
$$ $ 4 : num 13 22 17 106 8 25 13 88 6 21...
$$ $ 5 : num 22 36 22 137 22 36 19 134 24 33...
$$ $ 6 : num 65 45 56 114 57 35 51 120 51 28...
$$ $ 12 : num 1821 795 1804 785 1672...
$$ $ 24 : num 2189 1351 2215 1743 1999...

num numeric 7 24 as.factor() as.ordered()

#
# convert columns 1 to 7 to factors
for (i in 1:7) {
  AkitaPT[, i] <- as.factor(AkitaPT[, i])  #
}
#
str(AkitaPT)
$$ 'data.frame': 2272 obs. of 33 variables:
$$ $ : Factor w/ 568 levels "10","20","30",..: 1 1 1 1 2 2 2 2 3 3...
$$ $ : Factor w/ 4 levels "1","3","4","6": 1 1 1 1 1 1 1 1 1 1...
$$ $ : Factor w/ 200 levels "2","3","4","7",..: 197 197 197 197 197 197 197 197
$$ $ 1224 : Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2...
$$ $ : Factor w/ 4 levels "1","2","3","6": 1 1 1 1 2 2 2 2 2 2...
$$ $ : Factor w/ 2 levels "1","2": 1 1 2 2 1 1 2 2 1 1...
$$ $ : Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 2...
$$ $ 7 : num 89 43 102 118 88 49 79 112 73 39...
$$ $ 8 : num 135 60 160 99 119 57 132 106 96 39...

$$ $ 9 : num 136 63 155 75 134 61 154 79 111 52...
$$ $ 10 : num 138 70 200 56 121 64 172 50 110 61...
$$ $ 11 : num 135 76 185 76 138 81 157 59 114 66...
$$ $ 12 : num 147 83 133 72 108 78 137 59 93 68...
$$ $ 13 : num 130 81 139 60 125 69 123 70 119 58...
$$ $ 14 : num 165 68 148 45 162 75 127 43 142 59...
$$ $ 15 : num 198 67 142 38 167 68 147 40 151 57...
$$ $ 16 : num 206 75 150 50 208 65 132 40 183 40...
$$ $ 17 : num 196 55 162 47 168 50 136 45 135 43...
$$ $ 18 : num 146 54 128 49 134 48 135 41 130 39...
$$ $ 19 : num 111 64 104 53 94 66 95 38 78 51...
$$ $ 20 : num 47 81 70 50 41 79 73 50 42 68...
$$ $ 21 : num 25 52 40 28 29 36 35 30 32 26...
$$ $ 22 : num 31 56 45 70 30 68 37 54 27 71...
$$ $ 23 : num 22 71 22 74 14 57 24 81 15 52...
$$ $ 0 : num 12 35 15 82 12 29 18 73 13 25...
$$ $ 1 : num 9 34 5 86 7 46 9 79 6 39...
$$ $ 2 : num 5 32 8 76 8 21 8 74 7 20...
$$ $ 3 : num 6 28 7 82 5 29 7 85 5 26...
$$ $ 4 : num 13 22 17 106 8 25 13 88 6 21...
$$ $ 5 : num 22 36 22 137 22 36 19 134 24 33...
$$ $ 6 : num 65 45 56 114 57 35 51 120 51 28...
$$ $ 12 : num 1821 795 1804 785 1672...
$$ $ 24 : num 2189 1351 2215 1743 1999...

AkitaPT AkitaPT sagamihara AkitaPT 2000 10

# number of rows of AkitaPT
nrow(AkitaPT)
$$ [1] 2272
# show the first 10 rows
head(AkitaPT, 10)
$$ 1224
$$ 1 10 1 1040 2 1
$$ 2 10 1 1040 2 1
$$ 3 10 1 1040 2 1
$$ 4 10 1 1040 2 1
$$ 5 20 1 1040 2 2
$$ 6 20 1 1040 2 2
$$ 7 20 1 1040 2 2
$$ 8 20 1 1040 2 2
$$ 9 30 1 1040 2 2
$$ 10 30 1 1040 2 2
$$ 7 8 9 10 11 12 13
$$ 1 1 1 89 135 136 138 135 147 130
$$ 2 1 2 43 60 63 70 76 83 81
$$ 3 2 1 102 160 155 200 185 133 139
$$ 4 2 2 118 99 75 56 76 72 60
$$ 5 1 1 88 119 134 121 138 108 125
$$ 6 1 2 49 57 61 64 81 78 69
$$ 7 2 1 79 132 154 172 157 137 123
$$ 8 2 2 112 106 79 50 59 59 70
$$ 9 1 1 73 96 111 110 114 93 119
$$ 10 1 2 39 39 52 61 66 68 58
$$ 14 15 16 17 18 19 20 21 22 23
$$ 1 165 198 206 196 146 111 47 25 31 22
$$ 2 68 67 75 55 54 64 81 52 56 71
$$ 3 148 142 150 162 128 104 70 40 45 22
$$ 4 45 38 50 47 49 53 50 28 70 74
$$ 5 162 167 208 168 134 94 41 29 30 14
$$ 6 75 68 65 50 48 66 79 36 68 57
$$ 7 127 147 132 136 135 95 73 35 37 24
$$ 8 43 40 40 45 41 38 50 30 54 81
$$ 9 142 151 183 135 130 78 42 32 27 15
$$ 10 59 57 40 43 39 51 68 26 71 52
$$ 0 1 2 3 4 5 6 12
$$ 1 12 9 5 6 13 22 65 1821
$$ 2 35 34 32 28 22 36 45 795
$$ 3 15 5 8 7 17 22 56 1804
$$ 4 82 86 76 82 106 137 114 785
$$ 5 12 7 8 5 8 22 57 1672
$$ 6 29 46 21 29 25 36 35 765
$$ 7 18 9 8 7 13 19 51 1631
$$ 8 73 79 74 85 88 134 120 744
$$ 9 13 6 7 5 6 24 51 1457
$$ 10 25 39 20 26 21 33 28 621
$$ 24
$$ 1 2189
$$ 2 1351
$$ 3 2215
$$ 4 1743
$$ 5 1999
$$ 6 1292
$$ 7 2020
$$ 8 1650
$$ 9 1763
$$ 10 1081
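The conversion of code columns to factors shown in this section can also be written without a loop. This is a small sketch on a hypothetical data frame (`survey`, with made-up columns `zone`, `sex`, `trips`); it only uses base R and is not the tutorial's own data.

```r
# Hypothetical data: the first two columns hold category codes, not quantities
survey <- data.frame(zone  = c(10, 10, 20, 30),
                     sex   = c(1, 2, 1, 2),
                     trips = c(89, 43, 102, 118))

# Convert several code columns to factors in one step
code_cols <- c("zone", "sex")
survey[code_cols] <- lapply(survey[code_cols], as.factor)
str(survey)

# Use an ordered factor when the codes have a natural order
survey$zone_ord <- as.ordered(survey$zone)

# To get the original numbers back, go through character first
as.numeric(as.character(survey$zone))   # 10 10 20 30
as.numeric(survey$zone)                 # 1 1 2 3 (level indices, not the codes)
```

The last two lines show a common pitfall: applying `as.numeric()` directly to a factor returns the internal level indices.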

III
data preprocessing, data cleaning, data cleansing R

5. missing value

(1) complete.cases()
complete.cases() FALSE TRUE sagamihara

#
complete.cases(sagamihara)
$$ [1] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
$$ [12] TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
$$ [23] FALSE FALSE FALSE

sagamihara FALSE 5 14 17 23 25 is.na()

#
is.na(sagamihara)
$$ 1975 1980 1985 1989 1993 1998 2003 2008
$$ [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [5,] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
$$ [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [11,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [12,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [13,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [14,] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
$$ [15,] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
$$ [16,] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
$$ [17,] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
$$ [18,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [19,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [20,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [21,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [22,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [23,] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
$$ [24,] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
$$ [25,] FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
$$ 2009 2010 2011 2012 2013
$$ [1,] FALSE FALSE FALSE FALSE FALSE
$$ [2,] FALSE FALSE FALSE FALSE FALSE
$$ [3,] FALSE FALSE FALSE FALSE FALSE
$$ [4,] FALSE FALSE FALSE FALSE FALSE
$$ [5,] FALSE FALSE FALSE FALSE FALSE
$$ [6,] FALSE FALSE FALSE FALSE FALSE
$$ [7,] FALSE FALSE FALSE FALSE FALSE
$$ [8,] FALSE FALSE FALSE FALSE FALSE
$$ [9,] FALSE FALSE FALSE FALSE FALSE
$$ [10,] FALSE FALSE FALSE FALSE FALSE
$$ [11,] FALSE FALSE FALSE FALSE FALSE
$$ [12,] FALSE FALSE FALSE FALSE FALSE
$$ [13,] FALSE FALSE FALSE FALSE FALSE
$$ [14,] FALSE FALSE FALSE FALSE FALSE
$$ [15,] FALSE FALSE FALSE FALSE FALSE
$$ [16,] FALSE FALSE FALSE FALSE FALSE
$$ [17,] FALSE FALSE FALSE FALSE FALSE
$$ [18,] FALSE FALSE FALSE FALSE FALSE
$$ [19,] FALSE FALSE FALSE FALSE FALSE
$$ [20,] FALSE FALSE FALSE FALSE FALSE
$$ [21,] FALSE FALSE FALSE FALSE FALSE
$$ [22,] FALSE FALSE FALSE FALSE FALSE
$$ [23,] FALSE FALSE FALSE FALSE FALSE
$$ [24,] FALSE FALSE FALSE FALSE FALSE
$$ [25,] FALSE FALSE FALSE FALSE FALSE

mice md.pattern()

# install the package
install.packages('mice')
# load the package
library(mice)
md.pattern(sagamihara)
$$ 2003 2008 2009 2010 2011 2012 2013 1993
$$ 17 1 1 1 1 1 1 1 1 1 1
$$ 3 1 1 1 1 1 1 1 1 1 1
$$ 1 1 1 1 1 1 1 1 1 1 1
$$ 4 1 1 1 1 1 1 1 1 1 0
$$ 0 0 0 0 0 0 0 0 0 4
$$ 1998 1989 1975 1980 1985
$$ 17 1 1 1 1 1 0
$$ 3 1 1 0 0 0 3
$$ 1 1 0 0 0 0 4
$$ 4 0 0 0 0 0 6
$$ 4 5 8 8 8 37

1 0 1 2 0 17 3 1975 1980 1985 3 3 4 1989 1975 1980 1985 4 1 5 1993 1998 1989 1975 1980 1985 6 4 6 1993 4 1998 4 1989 5 1975 8 1980 8 1985 8 sagamihara 37

(2)
1) 1) p.28 a b 2 a 0.6 b 0.4 15) 15) p.45
(a) 1
(b)
(c) 1 1

(a)

(b)
(c)
(d) stochastic regression imputation, SRI
(e) full information maximum likelihood, FIML
(f) multiple imputation, MI
5. (3)

a)
sagamihara sagamihara R mice mice() method "norm.nob" 2003 1998

library(mice)
# impute with stochastic regression
imp <- mice(sagamihara[, c(9, 8)], method = "norm.nob", m = 1, maxit = 100, printFlag = FALSE)
# copy '1998 ' to a new column '1998.sri'
sagamihara$"1998.sri" <- sagamihara$"1998 "
## fill the missing values of '1998.sri' with the imputed values
sagamihara$"1998.sri"[is.na(sagamihara$"1998 ")] <- unlist(imp$imp$"1998 ")
# 1998 2003
subset(sagamihara, select = c("1998 ", "1998.sri", "2003 "))
$$ 1998 1998.sri 2003
$$ 1 94680 94680.000 104522
$$ 2 52812 52812.000 53448
$$ 3 20384 20384.000 21220
$$ 4 58794 58794.000 72368
$$ 5 37590 37590.000 39480
$$ 6 22174 22174.000 21334
$$ 7 197638 197638.000 211864
$$ 8 10176 10176.000 9836
$$ 9 9910 9910.000 9798
$$ 10 5490 5490.000 5484
$$ 11 6616 6616.000 6890
$$ 12 1652 1652.000 1694
$$ 13 1798 1798.000 1856
$$ 14 NA -1454.228 6744
$$ 15 NA 2127.083 5898
$$ 16 NA 60985.515 65108
$$ 17 NA 159289.434 162546
$$ 18 109717 109717.000 108602
$$ 19 57429 57429.000 55944
$$ 20 24346 24346.000 22883
$$ 21 279498 279498.000 282772
$$ 22 44228 44228.000 41987
$$ 23 12845 12845.000 16037
$$ 24 67917 67917.000 78072
$$ 25 4464 4464.000 10727

b) FIML
lavaan lavaan
cfa(): Confirmatory Factor Analysis, CFA
growth(): *12 Growth Curve model
lavaan(): latent variable model
lavCor(): polychoric correlation coefficient, polyserial correlation coefficient *13, Pearson product-moment correlation coefficient
sem(): Structural Equation Modeling, SEM
missing "fiml" fiml "listwise" "pairwise" 16)

(3) na.omit()

#
na.omit(sagamihara)
$$ 1975 1980 1985 1989 1993 1998
$$ 1 31954 35144 44758 57466 88988 94680
$$ 2 29144 31732 39784 48172 52430 52812
$$ 3 6706 9516 11562 16974 20532 20384
$$ 4 27228 32378 39840 52544 59198 58794
$$ 6 6488 7812 11278 16058 20530 22174
$$ 7 54506 79366 119620 171478 195652 197638
$$ 8 2536 2006 2316 4244 9536 10176
$$ 9 4526 4562 3292 4210 8686 9910
$$ 10 1218 2334 1928 2766 4558 5490
$$ 11 1874 2930 3126 4164 5834 6616
$$ 12 836 856 1082 1184 1572 1652
$$ 13 700 794 808 1114 1532 1798
$$ 18 83267 84879 87864 95594 106006 109717
$$ 19 48952 50881 54001 61564 63542 57429
$$ 20 19818 20921 22554 25316 27219 24346
$$ 21 154831 197393 236259 281813 297703 279498
$$ 22 32405 36235 39880 44960 47728 44228
$$ 2003 2008 2009 2010 2011 2012 2013 1998.sri
$$ 1 104522 118162 118098 120244 120482 122254 125510 94680
$$ 2 53448 56370 55774 56158 55716 56566 57552 52812
$$ 3 21220 22648 22348 22356 22646 22960 23594 20384
$$ 4 72368 77154 78518 79600 78700 80870 74276 58794
$$ 6 21334 20330 20194 20422 20198 20392 20842 22174
$$ 7 211864 216428 215598 218154 218084 221086 221880 197638
$$ 8 9836 10886 10622 10622 10378 10426 10820 10176
$$ 9 9798 11456 11206 11060 10702 11204 11600 9910
$$ 10 5484 5980 6076 6388 6384 6630 7076 5490
$$ 11 6890 8282 8266 8328 8266 8834 9030 6616
$$ 12 1694 2002 1994 1968 2016 2128 2252 1652
$$ 13 1856 2510 2404 2184 2070 2282 2406 1798
$$ 18 108602 121338 119240 119166 120113 122453 128006 109717
$$ 19 55944 55754 55392 55034 54366 55530 56767 57429
$$ 20 22883 22176 21796 21422 21152 21420 21584 24346
$$ 21 282772 291952 289622 290621 288884 291678 292779 279498
$$ 22 41987 39977 39301 39160 37931 38110 38869 44228
# store the complete rows in sagamihara2
sagamihara2 <- na.omit(sagamihara)
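The idea behind stochastic regression imputation can be illustrated in base R alone. The sketch below is not how `mice()` is implemented internally; it is a hand-written illustration on a hypothetical data frame (`toy`, columns `x` and `y`): fit a regression on the complete rows, then replace each missing value by its prediction plus a random residual.

```r
set.seed(1)
# Hypothetical data: y depends roughly linearly on x; three y values are missing
toy <- data.frame(x = c(10, 20, 30, 40, 50, 60, 70, 80),
                  y = c(12, 19, NA, 41, 52, NA, 69, NA))

observed <- complete.cases(toy)            # TRUE for rows without NA
fit <- lm(y ~ x, data = toy[observed, ])   # regression on the complete rows

# Prediction plus a random residual drawn from N(0, sigma^2)
pred  <- predict(fit, newdata = toy[!observed, ])
noise <- rnorm(sum(!observed), mean = 0, sd = summary(fit)$sigma)

toy$y.sri <- toy$y
toy$y.sri[!observed] <- pred + noise
toy
```

Because a random residual is added, an imputed value can fall outside the plausible range of the variable, which is one way to read the negative imputed value in the `1998.sri` column above.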

6.
AkitaPT 2 AkitaPT 7 AkitaPT 7 / 7 0/1000/

#
summary(AkitaPT$"7 ")
$$ Min. 1st Qu. Median Mean 3rd Qu. Max.
$$ 0.0 10.0 41.0 136.1 155.2 1694.0
#
plot(AkitaPT$"7 ")

3
(a) EWD, Equal Width Discretization
(b) EFD, Equal Frequency Discretization
(c) k-means k

(1)
R infotheo discretize() disc "equalwidth" nbins N^(1/3): 2272^(1/3) = 13.14628, 13

\[ n = 1 + \log_2 N \tag{1} \]

n N 12 12 (1) EWD

#install.packages("infotheo")
library(infotheo)
# number of observations
length(AkitaPT$"7 ")
$$ [1] 2272
#
ewd.akitapt7 <- discretize(AkitaPT$"7 ", disc = "equalwidth"

Figure 2: 7

    #, nbins = trunc(length(AkitaPT$"7 ")^(1/3))     # cube root of N
    , nbins = trunc(1 + log2(length(AkitaPT$"7 ")))  # Sturges' rule, Eq. (1)
)
#
(max(AkitaPT$"7 ") - min(AkitaPT$"7 "))/length(AkitaPT$"7 ")
$$ [1] 0.7455986

3 2 AkitaPT 7 2, 3

#
plot(c(t(ewd.akitapt7)))
#
table(ewd.akitapt7)
$$ ewd.akitapt7
$$ 1 2 3 4 5 6 7 8 9 10 11 12
$$ 1664 241 148 85 60 30 15 13 4 6 2 4

(2)
R infotheo discretize() disc "equalfreq" EWD nbins N^(1/3): 2272^(1/3) = 13.14628, 13 EFD

#install.packages("infotheo")
library(infotheo)
#
efd.akitapt7 <- discretize(AkitaPT$"7 ", disc = "equalfreq"
    #, nbins = trunc(length(AkitaPT$"7 ")^(1/3))     # cube root of N
    , nbins = trunc(1 + log2(length(AkitaPT$"7 ")))  # Sturges' rule, Eq. (1)
)

4 p. 26 2 AkitaPT 7
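To see what the two discretization rules do without the `infotheo` package, the following base R sketch may help. It uses hypothetical, randomly generated skewed data of the same length (N = 2272) rather than the AkitaPT column, and the `cut()`-based binning is only an approximation of what `discretize()` does, so individual bin counts can differ.

```r
set.seed(1)
x <- rexp(2272, rate = 1 / 136)   # hypothetical right-skewed data, N = 2272

n_cuberoot <- trunc(length(x)^(1/3))       # cube-root rule: 13
n_sturges  <- trunc(1 + log2(length(x)))   # Sturges' rule, Eq. (1): 12

# Equal width: split the range of x into n_sturges intervals of equal length
ewd <- cut(x, breaks = n_sturges, labels = FALSE)

# Equal frequency: cut at the quantiles so each bin holds about N / n_sturges values
qs  <- quantile(x, probs = seq(0, 1, length.out = n_sturges + 1))
efd <- cut(x, breaks = qs, labels = FALSE, include.lowest = TRUE)

table(ewd)   # most observations fall into the first few bins
table(efd)   # bins are (almost) equally filled
```

For skewed data, equal-width bins concentrate the observations in the lowest bins, while equal-frequency bins spread them evenly, which is the same contrast as in the two tables of this section.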

Figure 3: EWD 7

Figure 4: EFD 7

2 p. 23, 4

#
plot(c(t(efd.akitapt7)))

EWD

#
table(efd.akitapt7)
$$ efd.akitapt7
$$ 1 2 3 4 5 6 7 8 9 10 11 12
$$ 219 183 189 178 184 188 183 191 189 186 189 193

EWD EFD

# combine the EWD and EFD results
ewdefd.akitapt7 <- data.frame(EWD = ewd.akitapt7, EFD = efd.akitapt7)
# set the column names
colnames(ewdefd.akitapt7) <- c("EWD", "EFD")
# cross-tabulate
table(ewdefd.akitapt7)
$$ EFD
$$ EWD 1 2 3 4 5 6 7 8 9 10 11 12
$$ 1 219 183 189 178 184 188 183 191 149 0 0 0
$$ 2 0 0 0 0 0 0 0 0 40 186 15 0
$$ 3 0 0 0 0 0 0 0 0 0 0 148 0
$$ 4 0 0 0 0 0 0 0 0 0 0 26 59
$$ 5 0 0 0 0 0 0 0 0 0 0 0 60
$$ 6 0 0 0 0 0 0 0 0 0 0 0 30
$$ 7 0 0 0 0 0 0 0 0 0 0 0 15
$$ 8 0 0 0 0 0 0 0 0 0 0 0 13
$$ 9 0 0 0 0 0 0 0 0 0 0 0 4
$$ 10 0 0 0 0 0 0 0 0 0 0 0 6
$$ 11 0 0 0 0 0 0 0 0 0 0 0 2
$$ 12 0 0 0 0 0 0 0 0 0 0 0 4

(3) k-means
k-means *14 k-means k k 2 2 k-means
(a) k
(b)
(c)
2. 5 x y n = 100 k-means k-means 3 k-means p n (x_1, ..., x_n) p R^p k-means k

*14 k-means c-means

Figure 5: (scatter plots; axes: x, y)

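Let G_i (i = 1, ..., k) be the k clusters and n the number of observations. U = (u_{il}) is the k x n membership matrix: u_{il} = 1 if x_l belongs to G_i, and u_{jl} = 0 if x_l \notin G_j. v_i is the center of G_i, V = (v_1, ..., v_k), and N_i is the number of observations in G_i *15:

\[ N_i = \sum_{l=1}^{n} u_{il} \]

With d(x, y) the distance between x and y, the dissimilarity between the center v_i of G_i and x_l is

\[ D_{il} = \lVert v_i - x_l \rVert^2 \tag{2} \]

k-means looks for the U and V that minimize Eq. (3) under the constraint of Eq. (4):

\[ J(U, V) = \sum_{i=1}^{k} \sum_{l=1}^{n} u_{il} D_{il} \tag{3} \]

\[ \sum_{i=1}^{k} u_{il} = 1 \tag{4} \]

k-means minimizes J(U, V) of Eq. (3) alternately over U and V, in the following 4 steps:
1. Choose k.
2. With V fixed, find the U that minimizes J(U, V).
3. With U fixed, find the V that minimizes J(U, V).
4. Repeat 2. and 3. until U and V no longer change.

R k-means R kmeans() 7 k-means EWD EFD 12

# k-means
km.akitapt7 <- kmeans(AkitaPT$"7 ", centers = 12)

# kmeans() km.akitapt7$cluster $ 6 k-means 7
#
plot(km.akitapt7$cluster)

EFD EWD

*15 1 1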

Figure 6: k-means 7

table(km.akitapt7$cluster)
$$
$$ 1 2 3 4 5 6 7 8 9 10 11 12
$$ 92 41 370 21 721 196 12 161 169 132 69 288

7 7 EWD EFD k-means
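The alternating minimization behind k-means can be sketched in a few lines of base R. This is an illustrative implementation for one-dimensional data, not the algorithm used inside `kmeans()` (which defaults to Hartigan-Wong); the data `x`, the choice k = 3, and the way initial centers are picked are assumptions made for the example.

```r
set.seed(1)
# Hypothetical 1-D data with three groups
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 5), rnorm(50, mean = 10))
k <- 3

# Step 1: choose k and initial centers (here: k randomly chosen observations)
v <- sample(x, k)

for (iter in 1:100) {
  # Step 2: with V fixed, assign each x_l to its nearest center (update U)
  D <- outer(x, v, function(a, b) (a - b)^2)   # D[l, i] = (v_i - x_l)^2
  cluster <- max.col(-D, ties.method = "first")

  # Step 3: with U fixed, move each center to the mean of its members (update V)
  v_new <- vapply(seq_len(k), function(i) {
    if (any(cluster == i)) mean(x[cluster == i]) else v[i]
  }, numeric(1))

  # Step 4: stop when the centers no longer change
  if (all(abs(v_new - v) < 1e-8)) break
  v <- v_new
}

J <- sum(D[cbind(seq_along(x), cluster)])   # objective J(U, V)
sort(v)
```

Each pass through the loop cannot increase J(U, V), but the result depends on the initial centers, so different seeds may end in different local minima.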

Figure 7: 7. Panels: (a), (b) EWD, (c) EFD, (d) k-means

(4) dummy variable
*16 8

Figure 8:

R GitHub makedummies makedummies() *17 GitHub makedummies devtools install_github()

#
install.packages('devtools')
library(devtools)
install_github("toshi-ara/makedummies")

4. p. 9 as.factor() as.ordered()

#
str(ewd.akitapt7)
$$ 'data.frame': 2272 obs. of 1 variable:
$$ $ X: int 1 1 1 1 1 1 1 1 1 1...
# add a factor version of the column
ewd.akitapt7$factor <- as.factor(ewd.akitapt7$X)
# show the first 30 values of the factor
head(ewd.akitapt7$factor, 30)
$$ [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1
$$ Levels: 1 2 3 4 5 6 7 8 9 10 11 12
#
summary(ewd.akitapt7)
$$ X factor
$$ Min. : 1.000 1 :1664
$$ 1st Qu.: 1.000 2 : 241
$$ Median : 1.000 3 : 148
$$ Mean : 1.666 4 : 85
$$ 3rd Qu.: 2.000 5 : 60
$$ Max. :12.000 6 : 30
$$ (Other): 44

*16 II
*17 AkitaPT$

Levels: 1 2 3 4 5 6 7 8 9 10 11 12 1 12 makedummies() 110 119

library(makedummies)
# factor
fac.ewd.akitapt7 <- makedummies(dat = ewd.akitapt7, basal_level = TRUE)
# show rows 110 to 119
fac.ewd.akitapt7[110:119, ]
$$ X factor_1 factor_2 factor_3 factor_4 factor_5 factor_6 factor_7
$$ 110 1 1 0 0 0 0 0 0
$$ 111 8 0 0 0 0 0 0 0
$$ 112 1 1 0 0 0 0 0 0
$$ 113 5 0 0 0 0 1 0 0
$$ 114 1 1 0 0 0 0 0 0
$$ 115 7 0 0 0 0 0 0 1
$$ 116 1 1 0 0 0 0 0 0
$$ 117 11 0 0 0 0 0 0 0
$$ 118 2 0 1 0 0 0 0 0
$$ 119 12 0 0 0 0 0 0 0
$$ factor_8 factor_9 factor_10 factor_11 factor_12
$$ 110 0 0 0 0 0
$$ 111 1 0 0 0 0
$$ 112 0 0 0 0 0
$$ 113 0 0 0 0 0
$$ 114 0 0 0 0 0
$$ 115 0 0 0 0 0
$$ 116 0 0 0 0 0
$$ 117 0 0 0 1 0
$$ 118 0 0 0 0 0
$$ 119 0 0 0 0 1
# factor: rows 110 to 119 without the base level
makedummies(dat = ewd.akitapt7, basal_level = FALSE)[110:119, ]
$$ X factor_2 factor_3 factor_4 factor_5 factor_6 factor_7 factor_8
$$ 110 1 0 0 0 0 0 0 0
$$ 111 8 0 0 0 0 0 0 1
$$ 112 1 0 0 0 0 0 0 0
$$ 113 5 0 0 0 1 0 0 0
$$ 114 1 0 0 0 0 0 0 0
$$ 115 7 0 0 0 0 0 1 0
$$ 116 1 0 0 0 0 0 0 0
$$ 117 11 0 0 0 0 0 0 0
$$ 118 2 1 0 0 0 0 0 0
$$ 119 12 0 0 0 0 0 0 0
$$ factor_9 factor_10 factor_11 factor_12
$$ 110 0 0 0 0
$$ 111 0 0 0 0
$$ 112 0 0 0 0
$$ 113 0 0 0 0
$$ 114 0 0 0 0
$$ 115 0 0 0 0
$$ 116 0 0 0 0
$$ 117 0 0 1 0
$$ 118 0 0 0 0
$$ 119 0 0 0 1

makedummies() https://github.com/toshi-ara/makedummies
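If installing a package from GitHub is not an option, a comparable 0/1 layout can be produced with base R's `model.matrix()`. The sketch below uses a small hypothetical factor (`bins`) and is offered as an alternative illustration, not as a description of how `makedummies()` works internally; the column names it produces (`bins1`, `bins2`, ...) differ from the `factor_1`, `factor_2`, ... names shown above.

```r
# Hypothetical factor with four levels
bins <- factor(c(1, 8, 1, 5, 2), levels = c(1, 2, 5, 8))

# One 0/1 column per level (similar in spirit to basal_level = TRUE)
full <- model.matrix(~ bins - 1)

# First level treated as the reference and dropped (similar to basal_level = FALSE)
reduced <- model.matrix(~ bins)[, -1, drop = FALSE]

full
reduced
```

In `reduced`, a row of all zeros stands for the reference level, which is why one column fewer is enough to encode the same information.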

7.

(1)
PCA, principal component analysis R prcomp()

# remove the rows of AkitaPT with missing values and store the result in AkitaPT2
AkitaPT2 <- na.omit(AkitaPT)
# principal component analysis of AkitaPT2
pca.akitapt <- prcomp(AkitaPT2[, 8:31], scale = TRUE)
#
str(pca.akitapt)
$$ List of 5
$$ $ sdev : num [1:24] 4.37 1.682 0.901 0.676 0.428...
$$ $ rotation: num [1:24, 1:24] 0.211 0.218 0.221 0.222 0.222...
$$ ..- attr(*, "dimnames")=List of 2
$$ .. ..$ : chr [1:24] "7 " "8 " "9 " "10 "...
$$ .. ..$ : chr [1:24] "PC1" "PC2" "PC3" "PC4"...
$$ $ center : Named num [1:24] 221 195 174 174 165...
$$ ..- attr(*, "names")= chr [1:24] "7 " "8 " "9 " "10 "...
$$ $ scale : Named num [1:24] 278 233 192 194 195...
$$ ..- attr(*, "names")= chr [1:24] "7 " "8 " "9 " "10 "...
$$ $ x : num [1:420, 1:24] -0.897 -0.547 -0.72 2.761 -1.152...
$$ ..- attr(*, "dimnames")=List of 2
$$ .. ..$ : chr [1:420] "1" "2" "3" "4"...
$$ .. ..$ : chr [1:24] "PC1" "PC2" "PC3" "PC4"...
$$ - attr(*, "class")= chr "prcomp"

sdev rotation center scale FALSE TRUE x 5 prcomp

#
summary(pca.akitapt)
$$ Importance of components:
$$ PC1 PC2 PC3 PC4 PC5 PC6
$$ Standard deviation 4.3705 1.6819 0.90055 0.67650 0.42823 0.31333
$$ Proportion of Variance 0.7959 0.1179 0.03379 0.01907 0.00764 0.00409
$$ Cumulative Proportion 0.7959 0.9137 0.94753 0.96660 0.97424 0.97833
$$ PC7 PC8 PC9 PC10 PC11 PC12
$$ Standard deviation 0.27947 0.25837 0.25060 0.23827 0.22384 0.20429
$$ Proportion of Variance 0.00325 0.00278 0.00262 0.00237 0.00209 0.00174
$$ Cumulative Proportion 0.98159 0.98437 0.98698 0.98935 0.99144 0.99318
$$ PC13 PC14 PC15 PC16 PC17 PC18
$$ Standard deviation 0.19152 0.15674 0.15377 0.13131 0.11470 0.1094
$$ Proportion of Variance 0.00153 0.00102 0.00099 0.00072 0.00055 0.0005
$$ Cumulative Proportion 0.99470 0.99573 0.99671 0.99743 0.99798 0.9985
$$ PC19 PC20 PC21 PC22 PC23 PC24
$$ Standard deviation 0.09935 0.08752 0.07695 0.07360 0.06336 0.06033
$$ Proportion of Variance 0.00041 0.00032 0.00025 0.00023 0.00017 0.00015
$$ Cumulative Proportion 0.99889 0.99921 0.99946 0.99968 0.99985 1.00000

summary 3 Standard deviation, Proportion of Variance, Cumulative Proportion 1 95% 2 1.8% 2 96.8%

# eigenvalues
(pca.akitapt$sdev)^2
$$ [1] 19.101099436 2.828649144 0.810999031 0.457649833 0.183383696
$$ [6] 0.098175966 0.078105105 0.066752971 0.062801020 0.056771129
$$ [11] 0.050102239 0.041734821 0.036681798 0.024566264 0.023646747
$$ [16] 0.017242774 0.013155494 0.011959651 0.009869630 0.007659118
$$ [21] 0.005921891 0.005417667 0.004014909 0.003639666
# sum of the eigenvalues
sum((pca.akitapt$sdev)^2)
$$ [1] 24
# cumulative sum of the eigenvalues
cumsum((pca.akitapt$sdev)^2)
$$ [1] 19.10110 21.92975 22.74075 23.19840 23.38178 23.47996 23.55806
$$ [8] 23.62482 23.68762 23.74439 23.79449 23.83622 23.87291 23.89747
$$ [15] 23.92112 23.93836 23.95152 23.96348 23.97335 23.98101 23.98693
$$ [22] 23.99235 23.99636 24.00000

1 1 1 2
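The relation between `sdev`, the eigenvalues, and the rows printed by `summary()` can be checked on any data set. The sketch below uses R's built-in `USArrests` data (an assumption made for the example; it is unrelated to AkitaPT) and recomputes the proportion of variance by hand.

```r
pca <- prcomp(USArrests, scale. = TRUE)

eig  <- pca$sdev^2        # eigenvalues of the correlation matrix
prop <- eig / sum(eig)    # "Proportion of Variance" row of summary()
cum  <- cumsum(prop)      # "Cumulative Proportion" row of summary()

sum(eig)                  # equals the number of variables when scale. = TRUE
which(eig >= 1)           # components kept by the "eigenvalue >= 1" rule

# Loadings: correlations between the original variables and the components
loadings <- t(t(pca$rotation) * pca$sdev)
round(loadings, 3)
```

With standardized variables the eigenvalues sum to the number of variables, which matches the sum of 24 reported for the 24 columns analysed above.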

9: 1 # screeplot(pca.akitapt) # 3 head(pca.akitapt$x, 3) $$ PC1 PC2 PC3 PC4 PC5 PC6 $$ 1-0.8966084-0.6547198 0.05743677-0.0174372-0.2183721 0.2391551 $$ 2-0.5468799 2.6091366 1.13015683-0.6090933 0.2261090-0.2051744 37

$$ 3-0.7197651-0.4899901 0.08891200-0.1057261-0.2314164 0.3796144 $$ PC7 PC8 PC9 PC10 PC11 PC12 $$ 1 0.02919507-0.01837612 0.11056736 0.1686095 0.27322132-0.30191131 $$ 2 0.37002232 0.20797614-0.21530557-0.3591910-0.03321528 0.06718958 $$ 3 0.15955751-0.08782887 0.05531007 0.3740363-0.20427001 0.02009382 $$ PC13 PC14 PC15 PC16 PC17 PC18 $$ 1-0.15506406-0.01827609 0.23677385 0.10385840 0.091388820-0.01410299 $$ 2-0.07191406 0.03693712 0.11790583-0.02559034 0.138812885 0.11304198 $$ 3 0.02422596-0.07121822 0.06977967-0.19250375-0.004846051 0.17942204 $$ PC19 PC20 PC21 PC22 PC23 PC24 $$ 1 0.07094017 0.083867042-0.09411155 0.001067141 0.04419933-0.03642677 $$ 2 0.01395086 0.007923403 0.02436290 0.080554288-0.03820526 0.02551922 $$ 3-0.01097022 0.118381850-0.01293863 0.068963209 0.02090427-0.07970435 # t(t(pca.akitapt$rotation) * pca.akitapt$sdev) $$ PC1 PC2 PC3 PC4 PC5 $$ 7 0.9241495-0.1960454635-0.144346384 0.194555584 0.141577637 $$ 8 0.9536301-0.1854083807-0.089439064 0.122275410 0.001696196 $$ 9 0.9670638-0.1757331765-0.073353030 0.071735302-0.064721086 $$ 10 0.9702068-0.1768741832-0.063440634 0.022058732-0.093624873 $$ 11 0.9715654-0.1695413126-0.026456589-0.012229649-0.101098550 $$ 12 0.9694408-0.1931473544-0.004144421-0.017991616-0.095501753 $$ 13 0.9723268-0.1812707138-0.042268216-0.029105751-0.082672384 $$ 14 0.9455260-0.1962067250-0.021324698-0.004518688-0.160224937 $$ 15 0.9735339-0.1927443081-0.024408684-0.009589823-0.057247979 $$ 16 0.9687567-0.2113314434-0.003152721 0.007747677-0.044682458 $$ 17 0.9562966-0.2280255548-0.056365780-0.077137133 0.029855283 $$ 18 0.9612941-0.2140655060-0.010050376-0.096441301 0.070619688 $$ 19 0.9696697-0.1729669976 0.041017042-0.112237985 0.031683233 $$ 20 0.9637721-0.1214733214 0.097408638-0.154937563 0.080820575 $$ 21 0.9564662-0.0883184960 0.223458496-0.071530393 0.056475884 $$ 22 0.9504136 0.0003986879 0.138102301-0.180495147 0.112227033 $$ 23 0.5963415 0.1482393380 0.686391228 0.384950432-0.030878635 $$ 0 0.8792584 
0.3645605270 0.135407468-0.162068967 0.105837948 $$ 1 0.7784329 0.5554567926 0.152230838-0.111329361 0.053398382 $$ 2 0.6616318 0.7056021429 0.009892979-0.105000577-0.066052826 $$ 3 0.5496065 0.8012893841-0.091515835-0.038951837-0.059835560 $$ 4 0.6457438 0.7062381219-0.156629651 0.114363655-0.029440415 $$ 5 0.8049360 0.4709182566-0.259333128 0.141244146-0.025067695 $$ 6 0.8868176 0.0165690314-0.269884171 0.275233392 0.199038687 $$ PC6 PC7 PC8 PC9 PC10 $$ 7-0.0961927336-0.051736068 0.026288913-0.043253602 8.425672e-04 $$ 8-0.1253743254-0.038520813-0.010581651 0.039428382 1.403108e-02 $$ 9-0.0788174215 0.006059348-0.010351833-0.018943227 2.429620e-02 $$ 10-0.0111557208 0.033942006 0.022319294-0.016234284 1.611889e-02 38

$$ 11 0.0296450746 0.027465729 0.027432112 0.033572972 1.906933e-03 $$ 12 0.0317247989 0.041091336 0.024255253-0.026955764-6.151542e-03 $$ 13 0.0274537859 0.043950775 0.013440695-0.010676107 1.227752e-02 $$ 14 0.0216178429-0.043856448 0.113429133-0.035986260 1.891624e-02 $$ 15 0.0195369657 0.007421182-0.037335038-0.024276893 4.607617e-05 $$ 16 0.0001456152-0.017760623-0.049792001-0.011106275-9.520908e-03 $$ 17-0.0339522795-0.063358167-0.073067655 0.039393919-2.891853e-02 $$ 18-0.0143822788-0.057025509-0.062588202 0.015352436-1.347647e-02 $$ 19 0.0181132290-0.013828369-0.048222846 0.054483600-2.235576e-02 $$ 20 0.0422578161-0.019547145-0.016721187 0.004211554-2.021312e-02 $$ 21 0.0594896419 0.001879863-0.005278696-0.001800945-1.905121e-02 $$ 22 0.1199504139 0.013612033 0.049849988-0.033332494 1.865650e-02 $$ 23 0.0151918585-0.003066817-0.024810476 0.017030028-1.555048e-02 $$ 0-0.0589441490 0.075531282 0.010056598 0.056895879 1.037201e-01 $$ 1-0.1233116062-0.027005642 0.129746384-0.005137344-8.607876e-03 $$ 2-0.0805091551 0.114382489-0.097361421-0.114257801-5.533497e-02 $$ 3 0.0451648711-0.133645481 0.020038827 0.006013304-1.084799e-01 $$ 4 0.0825976579-0.088475521-0.064011205-0.029388735 1.534202e-01 $$ 5 0.0373498372 0.093160225 0.012687203 0.162385241-3.601254e-02 $$ 6 0.0795553170 0.064530758 0.031932163-0.070798199-4.131831e-02 $$ PC11 PC12 PC13 PC14 PC15 $$ 7-0.029668128 0.060895797-0.014633119 0.078187950-4.252147e-02 $$ 8-0.052067913 0.041677072 0.040944101-0.009695153 5.974190e-02 $$ 9-0.053449064-0.003134736 0.020918543-0.069101639 2.143447e-02 $$ 10-0.068744133-0.023017748-0.006695491-0.033572414-3.358737e-02 $$ 11-0.041223965-0.031577975-0.011192395 0.024341509-3.307091e-02 $$ 12-0.013736543-0.040567450-0.024895448 0.023316592-1.755904e-02 $$ 13-0.033965966-0.011874797-0.020301352 0.034242542-1.474736e-02 $$ 14 0.128792833 0.084299888 0.012175695-0.011212398 3.135463e-03 $$ 15 0.008660774-0.026624528-0.030184975 0.010605861 2.441432e-02 $$ 16 
0.036197875-0.028058670-0.037380491-0.005186285 6.046065e-02 $$ 17 0.070490366-0.022973850-0.022209144 0.005495098 2.927336e-03 $$ 18 0.037292475-0.040452110-0.011638385 0.010186167-1.122956e-02 $$ 19 0.017547680-0.002448083 0.027324969 0.005571767-3.869801e-02 $$ 20-0.023090659 0.009923427 0.064256144-0.036234392-1.773451e-02 $$ 21-0.010854978 0.060919019 0.049374374-0.032385613-3.950093e-02 $$ 22-0.052926363 0.017616036 0.014453828 0.047300366 8.220952e-02 $$ 23 0.007882956-0.007748713-0.012542409 0.007010468 2.347157e-03 $$ 0 0.012365031 0.048780811-0.105813650-0.030758934-9.352040e-03 $$ 1 0.028008270-0.106267553 0.041388917 0.004805893-7.247093e-04 $$ 2 0.024184925 0.036750230 0.029815920 0.021644006-5.866862e-05 $$ 3-0.062412319 0.031245150-0.069725137-0.021915996-5.105420e-04 $$ 4 0.006231347-0.025107467 0.040076452 0.008107548-1.459926e-02 $$ 5 0.024960497 0.015197511 0.040688992 0.023473173 1.511964e-02 $$ 6 0.040649958-0.033951395-0.021559853-0.050348897-4.574580e-04 $$ PC16 PC17 PC18 PC19 PC20 $$ 7 0.0323337509-0.0152913788 0.009275663-0.008694579 0.0039219127 39

$$ 8-0.0349228035 0.0467664087-0.021746241 0.001584649 0.0027608971 $$ 9 0.0099875493-0.0354503385 0.008613688 0.020109174 0.0036377206 $$ 10-0.0007980633-0.0375647244 0.035363854 0.003924354 0.0030550792 $$ 11-0.0655996671 0.0063553737-0.008655771-0.055571725 0.0165636226 $$ 12 0.0358281950 0.0276296057-0.005801025 0.025441192 0.0368633464 $$ 13-0.0121656214 0.0155421997-0.018774982 0.029803396-0.0633781160 $$ 14-0.0105378809 0.0049418837 0.016269334 0.001775064 0.0008215279 $$ 15 0.0504578599 0.0160171012-0.015127408-0.018522838-0.0154730636 $$ 16 0.0381886889-0.0099483761-0.019986608-0.028175065 0.0116187881 $$ 17-0.0307830596-0.0331446703 0.005685850 0.013184166-0.0140158735 $$ 18-0.0209889718-0.0106856304 0.024170587-0.006697383-0.0013299273 $$ 19-0.0057194201 0.0249952975-0.012070549 0.046848586 0.0299497248 $$ 20 0.0332347080 0.0456172018 0.050036854-0.026821141-0.0177154814 $$ 21 0.0124118012-0.0356997440-0.066134996-0.013709609-0.0033435098 $$ 22-0.0205170922-0.0250475471 0.016240486 0.013681439 0.0097153446 $$ 23-0.0077074418 0.0024418574 0.017171954 0.005752346-0.0032117124 $$ 0 0.0015067776 0.0116816156 0.002294234-0.002694144 0.0021439571 $$ 1 0.0095887853-0.0035900345-0.015650927-0.001317819-0.0052950690 $$ 2-0.0184878255 0.0008714117 0.007375078-0.003746323 0.0040738432 $$ 3-0.0002909459 0.0062314212-0.001288677 0.005085579-0.0002721481 $$ 4-0.0005197089 0.0018042313-0.005474756 0.000402060 0.0018511762 $$ 5 0.0271548244-0.0189351490 0.011387093-0.002491434-0.0032739115 $$ 6-0.0263317525 0.0162621140-0.007585764 0.003409218-0.0002129954 $$ PC21 PC22 PC23 PC24 $$ 7-6.412538e-03 0.0053109398-4.935116e-03 0.0039398467 $$ 8 1.511748e-02-0.0032533036 8.957361e-03-0.0140438643 $$ 9-1.474365e-02-0.0059641658-4.825945e-03 0.0349526336 $$ 10 1.947855e-03 0.0028486069 4.157497e-03-0.0395912924 $$ 11-1.384967e-02 0.0068101318-1.893822e-03 0.0140880317 $$ 12 4.085690e-02 0.0047195596 1.180257e-02 0.0120626382 $$ 13 4.803169e-03-0.0008223606-1.860300e-02 
0.0029127120 $$ 14-3.553636e-03-0.0055657205-7.109662e-04-0.0008497650 $$ 15-3.445102e-02-0.0148707181 3.543219e-02-0.0039230696 $$ 16 3.580342e-03 0.0131242278-3.762483e-02-0.0105197161 $$ 17 1.026650e-02 0.0393305446 2.095312e-02 0.0044498933 $$ 18 1.946200e-02-0.0540731114-4.861186e-03 0.0016934025 $$ 19-3.862909e-02 0.0006521608-1.334569e-02-0.0084686361 $$ 20 5.887385e-03 0.0168743245-4.291161e-03 0.0062942234 $$ 21 1.563989e-02-0.0056617208 5.917071e-03-0.0026055797 $$ 22-5.176816e-03 0.0006008597 3.653455e-03 0.0004943456 $$ 23-1.446328e-03 0.0001261953 5.895745e-04 0.0008894518 $$ 0 9.792239e-04 0.0008472397 3.907307e-04 0.0011149770 $$ 1-4.162962e-03 0.0016406836-4.853588e-04-0.0025941376 $$ 2-9.636108e-05 0.0001562017-5.366575e-04-0.0010190503 $$ 3-9.434845e-04-0.0023319911-5.427992e-04 0.0013611359 $$ 4 2.950254e-03 0.0038059502-3.358662e-05-0.0005964224 $$ 5 4.247722e-03-0.0053227072 4.346012e-04 0.0017683258 40

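$$ 6 -2.025179e-03 0.0011915063 3.991208e-04 -0.0015298926

Figure 10 shows a biplot of the first and second principal components.

# Draw a biplot
biplot(pca.akitapt
  , choices = c(1, 2) # plot the 1st and 2nd principal components
  , cex = 0.5 # draw text at 50% size
)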

Figure 10: Biplot of the first and second principal components
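The expression t(t(pca.akitapt$rotation) * pca.akitapt$sdev) used above multiplies each eigenvector by the standard deviation of its component. With standardized variables, the result equals the correlation between each original variable and each component score. A small check of that identity, again assuming the built-in USArrests data rather than AkitaPT:

```r
# Sketch on the built-in USArrests data (not AkitaPT)
pca <- prcomp(USArrests, scale. = TRUE)

# Scale column k of the rotation matrix by the k-th standard deviation
loadings <- t(t(pca$rotation) * pca$sdev)

# Correlation between each original variable and each component score
correlations <- cor(USArrests, pca$x)

print(round(loadings, 3))
print(all.equal(loadings, correlations, check.attributes = FALSE))
```

Because these values are correlations, each lies between -1 and 1, which makes them easier to read than the raw eigenvectors.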

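(2) k-means

k-means clustering was introduced in section 6. (3) (p. 27). Here it is applied to the 24 hourly columns, from hour 7 to hour 6.

# Keep only the complete rows of the 24 hourly columns
km.akitapt <- na.omit(AkitaPT[, 8:31])
# Run k-means with 12 clusters
cluster <- kmeans(km.akitapt, centers = 12)
# Count the rows assigned to each cluster
table(cluster$cluster)
$$
$$  1  2  3  4  5  6  7  8  9 10 11 12
$$ 32  8 17 28 50 42 71 16 50 37 46 23

Figures 11 and 12 (p. 45) plot the rows of each cluster.

# Attach the cluster number to each row
km.akitapt$cluster <- cluster$cluster
for (i in 1:12) { # loop over clusters 1 to 12
  # Draw an empty frame for cluster i
  plot(x = 0, y = 0
    , xlim = c(1, 24) # x axis: the 24 hours from hour 7 to hour 6
    , ylim = c(0, max(km.akitapt[, 1:24], na.rm = TRUE)) # y axis starts at 0
    , type = "n" # draw nothing yet
    , main = paste("", i, ", N=", nrow(subset(km.akitapt, cluster == i)), sep = "")
    , xlab = "" # x-axis label
    , ylab = "/" # y-axis label
    , xaxt = "n" # suppress the default x axis
  )
  # Draw the x axis with the column names as labels
  axis(side = 1, at = 1:24, labels = colnames(km.akitapt[, 1:24]))
  # Draw one line for each row in cluster i
  for (j in 1:nrow(subset(km.akitapt, cluster == i))) {
    lines(x = 1:24, subset(km.akitapt, cluster == i)[j, 1:24], col = j)
  }
}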

Figure 11: Clusters 1 to 6. Panels: (a) cluster 1, (b) cluster 2, (c) cluster 3, (d) cluster 4, (e) cluster 5, (f) cluster 6.

Figure 12: Clusters 7 to 12. Panels: (a) cluster 7, (b) cluster 8, (c) cluster 9, (d) cluster 10, (e) cluster 11, (f) cluster 12.
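kmeans() starts from randomly chosen centers, so cluster numbers and sizes can change from run to run. The sketch below shows one common way to make a run repeatable and less sensitive to the starting point. It assumes the built-in iris measurements instead of AkitaPT, and the seed, the number of clusters, and nstart = 20 are arbitrary illustrative choices.

```r
# Sketch on the built-in iris measurements (not AkitaPT)
x <- scale(iris[, 1:4])

set.seed(1)  # fix the random starting centers so the run is repeatable
fit <- kmeans(x, centers = 3, nstart = 20)  # keep the best of 20 random starts

# Cluster sizes, as in table(cluster$cluster)
sizes <- table(fit$cluster)
print(sizes)

# Total within-cluster sum of squares for k = 2..6, to help choose k
wss <- sapply(2:6, function(k) {
  set.seed(1)
  kmeans(x, centers = k, nstart = 20)$tot.withinss
})
print(round(wss, 1))
```

Plotting wss against k and looking for the point where the curve flattens is one informal way to judge whether a value such as 12 is reasonable.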

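8. Normalization

Normalization (normalize) *18 usually takes one of two forms: (1) rescaling to mean 0 and variance 1 *19, or (2) rescaling to minimum 0 and maximum 1. In R both can be done with scale(), shown here on sagamihara2 *20.

(1) Mean 0, variance 1

# Rescale to mean 0 and variance 1
scale(sagamihara2$"1975 ")
$$              [,1]
$$  [1,]  0.05356309
$$  [2,] -0.01706289
$$  [3,] -0.58101515
$$  [4,] -0.06521926
$$  [5,] -0.58649431
$$  [6,]  0.62038059
$$  [7,] -0.68582310
$$  [8,] -0.63580683
$$  [9,] -0.71894945
$$ [10,] -0.70246168
$$ [11,] -0.72855056
$$ [12,] -0.73196876
$$ [13,]  1.34325387
$$ [14,]  0.48078746
$$ [15,] -0.25146073
$$ [16,]  3.14192928
$$ [17,]  0.06489843
$$ attr(,"scaled:center")
$$ [1] 29822.88
$$ attr(,"scaled:scale")
$$ [1] 39787.06

Centering and scaling are the defaults of scale(); the same call with both arguments written out is:

# Rescale to mean 0 and variance 1 (arguments written out)
scale(sagamihara2$"1975 ", center = TRUE, scale = TRUE)
$$              [,1]

*18 standardization
*19
*20 sagamihara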

$$ [1,] 0.05356309 $$ [2,] -0.01706289 $$ [3,] -0.58101515 $$ [4,] -0.06521926 $$ [5,] -0.58649431 $$ [6,] 0.62038059 $$ [7,] -0.68582310 $$ [8,] -0.63580683 $$ [9,] -0.71894945 $$ [10,] -0.70246168 $$ [11,] -0.72855056 $$ [12,] -0.73196876 $$ [13,] 1.34325387 $$ [14,] 0.48078746 $$ [15,] -0.25146073 $$ [16,] 3.14192928 $$ [17,] 0.06489843 $$ attr(,"scaled:center") $$ [1] 29822.88 $$ attr(,"scaled:scale") $$ [1] 39787.06 0 1 # 0 1 scaled1.sagamihara2 <- scale(sagamihara2[, -c(1:2)]) # scaled1.sagamihara2 $$ 1975 1980 1985 1989 1993 $$ 1 0.05356309-0.002725056 0.04004867 0.069700992 0.372650342 $$ 2-0.01706289-0.071718560-0.04268049-0.056444382-0.089112135 $$ 3-0.58101515-0.520944733-0.51207777-0.479887826-0.492014359 $$ 4-0.06521926-0.058655898-0.04174908 0.002895792-0.003625827 $$ 6-0.58649431-0.555401043-0.51680135-0.492320488-0.492039621 $$ 7 0.62038059 0.891480913 1.28517726 1.617160248 1.719918598 $$ 8-0.68582310-0.672803238-0.66586018-0.652669255-0.630904348 $$ 9-0.63580683-0.621118772-0.64962704-0.653130729-0.641640660 $$ 10-0.71894945-0.666170803-0.67231352-0.672729817-0.693781245 $$ 11-0.70246168-0.654119182-0.65238800-0.653755077-0.677664146 $$ 12-0.72855056-0.696057204-0.68638446-0.694201947-0.731497279 $$ 13-0.73196876-0.697310896-0.69094172-0.695152042-0.732002517 $$ 18 1.34325387 1.002958402 0.75700137 0.587203768 0.587603946 $$ 19 0.48078746 0.315490297 0.19378118 0.125322225 0.051243044 $$ 20-0.25146073-0.290326057-0.32925533-0.366663734-0.407551159 $$ 21 3.14192928 3.278085949 3.22515422 3.114712410 3.008920255 $$ 22 0.06489843 0.019335881-0.04108378-0.100040136-0.148502889 $$ 1998 2003 2008 2009 2010 47

$$ 1 0.472483742 0.55511657 0.66457321 0.67301684 0.68979587 $$ 2-0.076819183-0.09238818-0.08985591-0.09271953-0.09199637 $$ 3-0.502270489-0.50096755-0.50157359-0.50340408-0.50435073 $$ 4 0.001663909 0.14747535 0.16389951 0.18672191 0.19397520 $$ 6-0.478785913-0.49952229-0.52987445-0.52986895-0.52794381 $$ 7 1.823279829 1.91597445 1.86431965 1.87093894 1.88421085 $$ 8-0.636198173-0.64529136-0.64517786-0.64747418-0.64749510 $$ 9-0.639688059-0.64577312-0.63821863-0.64029894-0.64215189 $$ 10-0.697677907-0.70046504-0.70507606-0.70332807-0.69914613 $$ 11-0.682904927-0.68264009-0.67697054-0.67642089-0.67547986 $$ 12-0.748031987-0.74851382-0.75364414-0.75348107-0.75306621 $$ 13-0.746116485-0.74646002-0.74744188-0.74844366-0.75043120 $$ 18 0.669767302 0.60684190 0.70334954 0.68704789 0.67664523 $$ 19-0.016244720-0.06074445-0.09737676-0.09741293-0.10570817 $$ 20-0.450289544-0.47988441-0.50733632-0.51018617-0.51574470 $$ 21 2.897272795 2.81493022 2.78640511 2.78042597 2.76824385 $$ 22-0.189440190-0.23768817-0.29000086-0.29511308-0.29935686 $$ 2011 2012 2013 1998.sri $$ 1 0.69784823 0.6995403 0.72758641 0.472483742 $$ 2-0.09417476-0.0947432-0.08943180-0.076819183 $$ 3-0.49858758-0.5010988-0.49768843-0.502270489 $$ 4 0.18689643 0.1991349 0.11163081 0.001663909 $$ 6-0.52852415-0.5321505-0.53077408-0.478785913 $$ 7 1.89142246 1.8945930 1.88618503 1.823279829 $$ 8-0.64861287-0.6526569-0.65126256-0.636198173 $$ 9-0.64465067-0.6432496-0.64188509-0.639688059 $$ 10-0.69745547-0.6985573-0.69627442-0.697677907 $$ 11-0.67444050-0.6719070-0.67278266-0.682904927 $$ 12-0.75087171-0.7529944-0.75427048-0.748031987 $$ 13-0.75021135-0.7511322-0.75241903-0.746116485 $$ 18 0.69333574 0.7019465 0.75759432 0.669767302 $$ 19-0.11068390-0.1072703-0.09886938-0.016244720 $$ 20-0.51685769-0.5197202-0.52185346-0.450289544 $$ 21 2.75723520 2.7481744 2.73856111 2.897272795 $$ 22-0.31166741-0.3179087-0.31404628-0.189440190 $$ attr(,"scaled:center") $$ 1975 1980 1985 1989 1993 1998 $$ 29822.88 35278.76 
42350.12 52330.65 59485.06 58667.18 $$ 2003 2008 2009 2010 2011 2012 $$ 60735.41 63729.71 63320.53 63699.24 63416.94 64401.35 $$ 2013 1998.sri $$ 64990.76 58667.18 $$ attr(,"scaled:scale") $$ 1975 1980 1985 1989 1993 1998 $$ 39787.06 49453.93 60123.91 73676.90 79170.57 76220.24 $$ 2003 2008 2009 2010 2011 2012 $$ 78878.19 81905.64 81390.94 81973.18 81772.88 82700.96 $$ 2013 1998.sri 48

$$ 83178.07 76220.24

The variance and mean of each column can be checked with colVars() from the matrixStats package and with colMeans().

# install.packages('matrixStats')
library(matrixStats)
# Variance of each column
colVars(scaled1.sagamihara2)
$$ [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# Mean of each column
colMeans(scaled1.sagamihara2)
$$          1975          1980          1985          1989          1993
$$  5.653158e-17  6.525622e-17 -1.346962e-17 -2.178609e-17  1.201551e-17
$$          1998          2003          2008          2009          2010
$$ -6.870015e-17  4.081702e-17 -4.571507e-17 -9.796086e-18 -2.285753e-17
$$          2011          2012          2013      1998.sri
$$ -3.265362e-18  0.000000e+00  3.102094e-17 -6.870015e-17

The column means are 0 up to floating-point rounding error.

(2) Minimum 0, maximum 1

# Rescale to minimum 0 and maximum 1
scale(sagamihara2$"1975 ", center = min(sagamihara2$"1975 "), scale = max(sagamihara2$"1975 "))
$$               [,1]
$$  [1,] 0.2018588009
$$  [2,] 0.1837099806
$$  [3,] 0.0387906815
$$  [4,] 0.1713351977
$$  [5,] 0.0373826947
$$  [6,] 0.3475143867
$$  [7,] 0.0118580904
$$  [8,] 0.0247108137
$$  [9,] 0.0033455832
$$ [10,] 0.0075824609
$$ [11,] 0.0008783771
$$ [12,] 0.0000000000
$$ [13,] 0.5332717608
$$ [14,] 0.3116430172
$$ [15,] 0.1234765648
$$ [16,] 0.9954789416
$$ [17,] 0.2047716543

$$ attr(,"scaled:center")
$$ [1] 700
$$ attr(,"scaled:scale")
$$ [1] 154831

To rescale every column to minimum 0 and maximum 1 at once, use colMins() and colMaxs() from matrixStats. Both expect a matrix, so the data frame is converted with as.matrix().

# Rescale every column to minimum 0 and maximum 1
scaled2.sagamihara2 <- scale(sagamihara2[, -c(1:2)],
  center = colMins(as.matrix(sagamihara2[, -c(1:2)])),
  scale = colMaxs(as.matrix(sagamihara2[, -c(1:2)])))
# Show the result
scaled2.sagamihara2
$$ 1975 1980 1985 1989 1993
$$ 1 0.2018588009 0.1740183289 0.186024659 0.1999623864 0.2937692936
$$ 2 0.1837099806 0.1567330148 0.164971493 0.1669830703 0.1709690530
$$ 3 0.0387906815 0.0441859640 0.045517843 0.0562784542 0.0638219971
$$ 4 0.1713351977 0.1600056740 0.165208521 0.1824969040 0.1937031202
$$ 6 0.0373826947 0.0355534391 0.044315772 0.0530280718 0.0638152790
$$ 7 0.3475143867 0.3980485630 0.502888779 0.6045285349 0.6520592671
$$ 8 0.0118580904 0.0061400354 0.006382826 0.0111066558 0.0268858560
$$ 9 0.0247108137 0.0190888228 0.010513885 0.0109860085 0.0240306614
$$ 10 0.0033455832 0.0078016951 0.004740560 0.0058620433 0.0101644928
$$ 11 0.0075824609 0.0108210524 0.009811266 0.0108227796 0.0144506438
$$ 12 0.0008783771 0.0003140942 0.001159744 0.0002483917 0.0001343621
$$ 13 0.0000000000 0.0000000000 0.000000000 0.0000000000 0.0000000000
$$ 18 0.5332717608 0.4259776183 0.368476968 0.3352577773 0.3509336486
$$ 19 0.3116430172 0.2537425339 0.225146979 0.2145039441 0.2082948442
$$ 20 0.1234765648 0.1019641021 0.092043054 0.0858796436 0.0862839810
$$ 21 0.9954789416 0.9959775676 0.996580024 0.9960470241 0.9948539316
$$ 22 0.2047716543 0.1795453739 0.165377827 0.1555854414 0.1551747883
$$ 1998 2003 2008 2009 2010
$$ 1 0.3328395910 0.3636427935 0.397873623 0.400881149 0.406976784
$$ 2 0.1830424547 0.1830237789 0.186222393 0.185690314 0.186462781
$$ 3 0.0670201576 0.0690520985 0.070717104 0.070277810 0.070153224
$$ 4 0.2044451123 0.2499328081 0.257412177 0.264220260 0.267124537
$$ 6 0.0734244968 0.0694552502 0.062777443 0.062840530 0.063498508
$$ 7 0.7012071643 0.7432489780 0.734456349 0.737526845 0.743876045
$$ 8 0.0304975349 0.0287935156 0.030429660
0.029790555 0.029777614 $$ 9 0.0295458286 0.0286591317 0.032382035 0.031806976 0.031284732 $$ 10 0.0137317619 0.0134030243 0.013625527 0.014094233 0.015208811 $$ 11 0.0177604133 0.0183752281 0.021510385 0.021655813 0.021884172 $$ 12 0.0000000000 0.0000000000 0.000000000 0.000000000 0.000000000 $$ 13 0.0005223651 0.0005728997 0.001740012 0.001415638 0.000743236 $$ 18 0.3866396182 0.3780713791 0.408752124 0.404824219 0.403267486 50

$$ 19 0.1995613564 0.1918506783 0.184112457 0.184371353 0.182595201
$$ 20 0.0811955721 0.0749331617 0.069100400 0.068371878 0.066939416
$$ 21 0.9940894031 0.9940093079 0.993142708 0.993115164 0.993228294
$$ 22 0.1523302492 0.1424928918 0.130072752 0.128812728 0.127974234
$$ 2011 2012 2013 1998.sri
$$ 1 0.4100815552 0.4118445683 0.420993309 0.3328395910
$$ 2 0.1858877612 0.1866373192 0.188879667 0.1830424547
$$ 3 0.0714127470 0.0714212248 0.072894572 0.0670201576
$$ 4 0.2654491076 0.2699620815 0.246001250 0.2044451123
$$ 6 0.0629387574 0.0626169955 0.063494991 0.0734244968
$$ 7 0.7479403498 0.7506839734 0.750149430 0.7012071643
$$ 8 0.0289458745 0.0284491803 0.029264394 0.0304975349
$$ 9 0.0300674319 0.0311165052 0.031928519 0.0295458286
$$ 10 0.0151202559 0.0154348288 0.016476592 0.0137317619
$$ 11 0.0216349815 0.0229911066 0.023150567 0.0177604133
$$ 12 0.0000000000 0.0000000000 0.000000000 0.0000000000
$$ 13 0.0001869262 0.0005279795 0.000525994 0.0005223651
$$ 18 0.4088042259 0.4125268275 0.429518511 0.3866396182
$$ 19 0.1812146052 0.1830854573 0.186198464 0.1995613564
$$ 20 0.0662411210 0.0661414299 0.066029326 0.0811955721
$$ 21 0.9930214204 0.9927042835 0.992308192 0.9940894031
$$ 22 0.1243232578 0.1233620636 0.125067030 0.1523302492
$$ attr(,"scaled:center")
$$  [1]  700  794  808 1114 1532 1652 1694 2002 1994 1968 2016 2128 2252 1652
$$ attr(,"scaled:scale")
$$  [1] 154831 197393 236259 281813 297703 279498 282772 291952 289622 290621
$$ [11] 288884 291678 292779 279498
# Minimum of each column
colMins(scaled2.sagamihara2)
$$ [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# Maximum of each column
colMaxs(scaled2.sagamihara2)
$$ [1] 0.9954789 0.9959776 0.9965800 0.9960470 0.9948539 0.9940894 0.9940093
$$ [8] 0.9931427 0.9931152 0.9932283 0.9930214 0.9927043 0.9923082 0.9940894

Every column minimum is 0, and every column maximum is close to 1. The maxima are not exactly 1 because scale() divides by the column maximum rather than by the range (maximum minus minimum).
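scale() computes (x - center) / scale column by column, so passing the column maximum as scale leaves the largest value just under 1. Passing the range instead maps the maximum to exactly 1. The following is a minimal base R sketch of that variant; the small data frame is hypothetical, with three rows borrowed from the 1975 and 1980 columns shown above.

```r
# Hypothetical three-row data frame, values borrowed from the output above
df <- data.frame(a = c(700, 31954, 154831), b = c(794, 35144, 197393))

col_min <- apply(df, 2, min)
col_max <- apply(df, 2, max)

# Subtract the column minimum, then divide by the column range
rescaled <- scale(df, center = col_min, scale = col_max - col_min)

print(apply(rescaled, 2, min))
print(apply(rescaled, 2, max))
```

Columns that are constant have a range of 0 and would produce division by zero, so they need to be handled separately.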

9. Sampling

This section covers extracting part of a data set in R (sampling), starting with random sampling.

(1) Random sampling

Random sampling in R uses sample().

# Number of rows in AkitaPT
nrow(AkitaPT)
$$ [1] 2272
# Randomly choose 5 rows of AkitaPT
# without replacement
sampling.target <- sample(nrow(AkitaPT), 5, replace = FALSE)
## Show the sampled rows
AkitaPT[sampling.target, ]
$$ 1224
$$ 1425 41410 4 58 <NA> <NA>
$$ 1663 60210 6 121 <NA> <NA>
$$ 50 130 1 1420 2 2
$$ 845 11970 3 342 2 2
$$ 1996 61090 6 258 1 1
$$ 7 8 9 10 11 12 13
$$ 1425 1 1 33 24 16 23 24 12 13
$$ 1663 2 1 5 2 8 1 8 5 5
$$ 50 1 2 37 52 61 34 37 41 37
$$ 845 1 1 3 15 32 40 69 100 121
$$ 1996 2 2 5 2 4 6 5 2 7
$$ 14 15 16 17 18 19 20 21 22 23
$$ 1425 19 25 19 26 18 NA NA NA NA NA
$$ 1663 5 6 8 8 8 NA NA NA NA NA
$$ 50 39 53 41 48 46 38 45 17 13 11
$$ 845 97 56 26 6 3 3 4 2 0 0
$$ 1996 3 12 13 5 3 NA NA NA NA NA
$$ 0 1 2 3 4 5 6 12
$$ 1425 NA NA NA NA NA NA NA 252

$$ 1663 NA NA NA NA NA NA NA 69
$$ 50 17 10 16 11 6 13 28 526
$$ 845 0 0 0 0 0 0 6 568
$$ 1996 NA NA NA NA NA NA NA 67
$$ 24
$$ 1425 NA
$$ 1663 NA
$$ 50 751
$$ 845 583
$$ 1996 NA

With replace = TRUE, the same row can be drawn two or more times.

# Randomly choose 5 rows of AkitaPT, with replacement
sampling.target.rep <- sample(nrow(AkitaPT), 5, replace = TRUE)
## Show the sampled rows
AkitaPT[sampling.target.rep, ]
$$ 1224
$$ 602 11320 3 105 <NA> <NA>
$$ 79 10010 3 7 2 6
$$ 424 10870 3 101 <NA> <NA>
$$ 1778 60500 6 176 <NA> <NA>
$$ 1335 41170 4 50 <NA> <NA>
$$ 7 8 9 10 11 12 13
$$ 602 1 2 28 68 47 65 54 22 40
$$ 79 2 1 155 146 168 159 164 155 185
$$ 424 2 2 11 5 13 16 15 7 10
$$ 1778 1 2 13 14 19 8 15 12 12
$$ 1335 2 1 235 124 123 115 109 116 116
$$ 14 15 16 17 18 19 20 21 22 23
$$ 602 42 24 38 28 4 NA NA NA NA NA
$$ 79 200 202 187 192 144 95 78 72 44 27
$$ 424 17 18 14 9 4 NA NA NA NA NA
$$ 1778 7 11 5 10 1 NA NA NA NA NA
$$ 1335 124 112 104 138 125 NA NA NA NA NA
$$ 0 1 2 3 4 5 6 12
$$ 602 NA NA NA NA NA NA NA 460
$$ 79 20 11 12 13 13 27 63 2057
$$ 424 NA NA NA NA NA NA NA 139
$$ 1778 NA NA NA NA NA NA NA 127
$$ 1335 NA NA NA NA NA NA NA 1541
$$ 24
$$ 602 NA
$$ 79 2532
$$ 424 NA
$$ 1778 NA
$$ 1335 NA
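sample() draws different rows on every run, so the output above cannot be reproduced exactly. Fixing the random seed first makes a draw repeatable. The sketch below assumes the built-in mtcars data in place of AkitaPT, and the seed value 123 is arbitrary.

```r
# Sketch on the built-in mtcars data (not AkitaPT)
set.seed(123)  # any fixed integer makes the draw repeatable
n <- nrow(mtcars)

idx_without <- sample(n, 5, replace = FALSE)  # no row can appear twice
idx_with <- sample(n, 5, replace = TRUE)      # the same row may repeat

print(mtcars[idx_without, 1:3])

# Re-seeding reproduces exactly the same draw
set.seed(123)
idx_again <- sample(n, 5, replace = FALSE)
print(identical(idx_without, idx_again))
```

Recording the seed together with the analysis script lets a reader regenerate the same sample later.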

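(2) Extraction by condition

Rows of AkitaPT can also be extracted by condition, either with a logical vector or with R's subset(). Both rely on expressions that evaluate to the two values TRUE and FALSE. Table 3 lists the operators and functions that R provides for this.

Table 3: Operators and functions that return TRUE or FALSE

Logical                 Comparison                      Test functions
                        ==  equal to                    is.null         is NULL
!     NOT               !=  not equal to                is.na           is NA
&     AND               >=  greater than or equal to    is.nan          is NaN
|     OR                <=  less than or equal to       is.finite       is finite
&&    AND (scalar)      >   greater than                is.infinite     is infinite
||    OR (scalar)       <   less than                   complete.cases  row has no NA
xor() exclusive OR

Table 4 refers to the H22 data.

Table 4: H22 1 2 3 4 5 6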

a) Using a logical vector

# Remove rows with missing values from AkitaPT and store the result as AkitaPT2
AkitaPT2 <- na.omit(AkitaPT)
# Convert column '' of AkitaPT2 to a factor
AkitaPT2$ <- as.factor(AkitaPT2$)
# Count the rows in each category
table(AkitaPT2$)
$$
$$   1   2   3   6
$$ 142 142   4 132
# Build a vector that is TRUE where '' == 3 and FALSE elsewhere
list.rain <- AkitaPT2$ == 3
## Show list.rain
list.rain
$$   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$  [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$  [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$  [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$  [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$  [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$  [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$  [78] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$  [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [100] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [111] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [122] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [144] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [155] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [166] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [188] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [199] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [210] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [221] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
$$ [232]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [243] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [254] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [265] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [276] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [287] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

$$ [298] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [309] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [320] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [331] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [342] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [364] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [375] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [386] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [397] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [408] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$$ [419] FALSE FALSE
## Store the rows where '' == 3 as AkitaPT2rain
AkitaPT2rain <- AkitaPT2[list.rain, ]
# AkitaPT2rain <- AkitaPT2[AkitaPT2$'' == 3, ] # equivalent single-line form
## Show AkitaPT2rain
AkitaPT2rain
$$ 1224
$$ 353 10700 3 13 2 3
$$ 354 10700 3 13 2 3
$$ 355 10700 3 13 2 3
$$ 356 10700 3 13 2 3
$$ 7 8 9 10 11 12 13
$$ 353 1 1 317 294 263 275 248 233 263
$$ 354 1 2 55 59 80 82 73 62 75
$$ 355 2 1 432 362 240 265 247 220 258
$$ 356 2 2 67 53 63 50 53 36 48
$$ 14 15 16 17 18 19 20 21 22 23
$$ 353 292 265 288 328 281 199 163 122 59 51
$$ 354 52 50 45 50 30 31 24 19 15 31
$$ 355 263 281 311 366 286 181 97 77 70 27
$$ 356 51 49 52 52 27 27 26 24 29 35
$$ 0 1 2 3 4 5 6 12
$$ 353 22 13 6 11 13 25 60 3347
$$ 354 18 18 9 15 23 32 29 713
$$ 355 27 10 5 13 12 27 100 3531
$$ 356 33 30 27 27 37 59 56 601
$$ 24
$$ 353 4091
$$ 354 977
$$ 355 4177
$$ 356 1011

Selecting rows and columns with square brackets is described in Part V, section 10 (p. 61).

b) Using subset()

The same extraction can be written with subset(), which takes the data frame and a condition.

# Extract the rows where '' == 3
subset(AkitaPT2 # data frame
  , == 3 # condition
)
$$ 1224
$$ 353 10700 3 13 2 3
$$ 354 10700 3 13 2 3
$$ 355 10700 3 13 2 3
$$ 356 10700 3 13 2 3
$$ 7 8 9 10 11 12 13
$$ 353 1 1 317 294 263 275 248 233 263
$$ 354 1 2 55 59 80 82 73 62 75
$$ 355 2 1 432 362 240 265 247 220 258
$$ 356 2 2 67 53 63 50 53 36 48
$$ 14 15 16 17 18 19 20 21 22 23
$$ 353 292 265 288 328 281 199 163 122 59 51
$$ 354 52 50 45 50 30 31 24 19 15 31
$$ 355 263 281 311 366 286 181 97 77 70 27
$$ 356 51 49 52 52 27 27 26 24 29 35
$$ 0 1 2 3 4 5 6 12
$$ 353 22 13 6 11 13 25 60 3347
$$ 354 18 18 9 15 23 32 29 713
$$ 355 27 10 5 13 12 27 100 3531
$$ 356 33 30 27 27 37 59 56 601
$$ 24
$$ 353 4091
$$ 354 977
$$ 355 4177
$$ 356 1011
# Store the rows where '' != 3 as AkitaPT2norain
AkitaPT2norain <- subset(AkitaPT2 # data frame
  , != 3 # condition
)
# Number of rows in AkitaPT2norain
nrow(AkitaPT2norain)
$$ [1] 416
# Summary of the first 7 columns of AkitaPT2norain
summary(AkitaPT2norain)[, 1:7]

$$ 1224 $$ 10 : 4 1: 76 7 : 84 1: 0 1:142 $$ 20 : 4 3:280 13 : 76 2:416 2:142 $$ 30 : 4 4: 44 1420 : 40 3: 0 $$ 40 : 4 6: 16 46 : 32 6:132 $$ 50 : 4 1040 : 16 $$ 60 : 4 101 : 12 $$ (Other):392 (Other):156 $$ $$ 1:208 1:208 $$ 2:208 2:208 $$ $$ $$ $$ $$ 59
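Logical indexing and subset() return the same rows for AkitaPT2 because na.omit() has already removed every NA. On data that still contains NA they differ: a bracket index returns an all-NA row wherever the condition is NA, while subset() drops those rows. The sketch below illustrates this on the built-in airquality data, an assumption chosen because its Ozone column contains NA; the threshold 100 is arbitrary.

```r
# Sketch on the built-in airquality data, whose Ozone column contains NA
cond <- airquality$Ozone > 100

by_index <- airquality[cond, ]                # NA in cond yields all-NA rows
by_subset <- subset(airquality, Ozone > 100)  # rows with an NA condition are dropped
by_which <- airquality[which(cond), ]         # which() also drops NA

print(nrow(by_index))
print(nrow(by_subset))
```

Wrapping the condition in which(), or using subset(), avoids the unexpected NA rows when the data has not been passed through na.omit().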

IV (a) (b) (c) (d) R Web 60

V 10. sagamihara # 2 sagamihara[2, ] $$ 1975 1980 1985 1989 1993 1998 2003 2008 $$ 2 29144 31732 39784 48172 52430 52812 53448 56370 $$ 2009 2010 2011 2012 2013 1998.sri $$ 2 55774 56158 55716 56566 57552 52812 # 1 3 sagamihara[c(1, 3), ] $$ 1975 1980 1985 1989 1993 1998 2003 2008 $$ 1 31954 35144 44758 57466 88988 94680 104522 118162 $$ 3 6706 9516 11562 16974 20532 20384 21220 22648 $$ 2009 2010 2011 2012 2013 1998.sri $$ 1 118098 120244 120482 122254 125510 94680 $$ 3 22348 22356 22646 22960 23594 20384 # 4 6 sagamihara[4:6, ] $$ 1975 1980 1985 1989 1993 1998 2003 $$ 4 27228 32378 39840 52544 59198 58794 72368 $$ 5 NA NA NA 14802 30840 37590 39480 $$ 6 6488 7812 11278 16058 20530 22174 21334 $$ 2008 2009 2010 2011 2012 2013 1998.sri $$ 4 77154 78518 79600 78700 80870 74276 58794 $$ 5 42410 42404 42796 42626 43370 44614 37590 $$ 6 20330 20194 20422 20198 20392 20842 22174 # sagamihara[nrow(sagamihara), ] $$ 1975 1980 1985 1989 1993 1998 $$ 25 NA NA NA NA 2947 4464 $$ 2003 2008 2009 2010 2011 2012 2013 1998.sri $$ 25 10727 16526 16678 17183 17184 17582 18471 4464 # 8 sagamihara[, 8] $$ [1] 94680 52812 20384 58794 37590 22174 197638 10176 9910 5490 $$ [11] 6616 1652 1798 NA NA NA NA 109717 57429 24346 $$ [21] 279498 44228 12845 67917 4464 61

# 3 5 sagamihara[, 3:5] $$ 1975 1980 1985 $$ 1 31954 35144 44758 $$ 2 29144 31732 39784 $$ 3 6706 9516 11562 $$ 4 27228 32378 39840 $$ 5 NA NA NA $$ 6 6488 7812 11278 $$ 7 54506 79366 119620 $$ 8 2536 2006 2316 $$ 9 4526 4562 3292 $$ 10 1218 2334 1928 $$ 11 1874 2930 3126 $$ 12 836 856 1082 $$ 13 700 794 808 $$ 14 NA NA NA $$ 15 NA NA NA $$ 16 NA NA NA $$ 17 NA NA NA $$ 18 83267 84879 87864 $$ 19 48952 50881 54001 $$ 20 19818 20921 22554 $$ 21 154831 197393 236259 $$ 22 32405 36235 39880 $$ 23 NA NA NA $$ 24 NA NA NA $$ 25 NA NA NA # '1998 '8 sagamihara$"1998 " $$ [1] 94680 52812 20384 58794 37590 22174 197638 10176 9910 5490 $$ [11] 6616 1652 1798 NA NA NA NA 109717 57429 24346 $$ [21] 279498 44228 12845 67917 4464 # 3 5 sagamihara[3, 5] $$ [1] 11562 # 1 3 5 8 sagamihara[c(1, 3), 5:8] $$ 1985 1989 1993 1998 $$ 1 44758 57466 88988 94680 $$ 3 11562 16974 20532 20384 # sagamihara[, ncol(sagamihara)] 62

$$ [1] 94680.000 52812.000 20384.000 58794.000 37590.000 22174.000 $$ [7] 197638.000 10176.000 9910.000 5490.000 6616.000 1652.000 $$ [13] 1798.000-1454.228 2127.083 60985.515 159289.434 109717.000 $$ [19] 57429.000 24346.000 279498.000 44228.000 12845.000 67917.000 $$ [25] 4464.000 # 6 head(sagamihara) $$ 1975 1980 1985 1989 1993 1998 2003 $$ 1 31954 35144 44758 57466 88988 94680 104522 $$ 2 29144 31732 39784 48172 52430 52812 53448 $$ 3 6706 9516 11562 16974 20532 20384 21220 $$ 4 27228 32378 39840 52544 59198 58794 72368 $$ 5 NA NA NA 14802 30840 37590 39480 $$ 6 6488 7812 11278 16058 20530 22174 21334 $$ 2008 2009 2010 2011 2012 2013 1998.sri $$ 1 118162 118098 120244 120482 122254 125510 94680 $$ 2 56370 55774 56158 55716 56566 57552 52812 $$ 3 22648 22348 22356 22646 22960 23594 20384 $$ 4 77154 78518 79600 78700 80870 74276 58794 $$ 5 42410 42404 42796 42626 43370 44614 37590 $$ 6 20330 20194 20422 20198 20392 20842 22174 # 3 head(sagamihara, 3) $$ 1975 1980 1985 1989 1993 1998 2003 2008 $$ 1 31954 35144 44758 57466 88988 94680 104522 118162 $$ 2 29144 31732 39784 48172 52430 52812 53448 56370 $$ 3 6706 9516 11562 16974 20532 20384 21220 22648 $$ 2009 2010 2011 2012 2013 1998.sri $$ 1 118098 120244 120482 122254 125510 94680 $$ 2 55774 56158 55716 56566 57552 52812 $$ 3 22348 22356 22646 22960 23594 20384 # 6 tail(sagamihara) $$ 1975 1980 1985 1989 1993 1998 $$ 20 19818 20921 22554 25316 27219 24346 $$ 21 154831 197393 236259 281813 297703 279498 $$ 22 32405 36235 39880 44960 47728 44228 $$ 23 NA NA NA 2944 9190 12845 $$ 24 NA NA NA 33755 55492 67917 $$ 25 NA NA NA NA 2947 4464 $$ 2003 2008 2009 2010 2011 2012 2013 1998.sri $$ 20 22883 22176 21796 21422 21152 21420 21584 24346 $$ 21 282772 291952 289622 290621 288884 291678 292779 279498 $$ 22 41987 39977 39301 39160 37931 38110 38869 44228 63

$$ 23 16037 19994 20539 21228 21096 21330 21719 12845 $$ 24 78072 88320 88427 88065 87242 88377 91060 67917 $$ 25 10727 16526 16678 17183 17184 17582 18471 4464 # 3 tail(sagamihara, 3) $$ 1975 1980 1985 1989 1993 1998 $$ 23 NA NA NA 2944 9190 12845 $$ 24 NA NA NA 33755 55492 67917 $$ 25 NA NA NA NA 2947 4464 $$ 2003 2008 2009 2010 2011 2012 2013 1998.sri $$ 23 16037 19994 20539 21228 21096 21330 21719 12845 $$ 24 78072 88320 88427 88065 87242 88377 91060 67917 $$ 25 10727 16526 16678 17183 17184 17582 18471 4464 11. (1) sagamihara nrow() ncol() summary() *21 # nrow(sagamihara) $$ [1] 25 # ncol(sagamihara) $$ [1] 16 # summary(sagamihara) $$ 1975 1980 $$ :7 : 2 Min. : 700 Min. : 794 $$ :2 : 1 1st Qu.: 2536 1st Qu.: 2930 $$ :6 : 1 Median : 19818 Median : 20921 $$ :6 : 1 Mean : 29823 Mean : 35279 $$ :4 : 1 3rd Qu.: 32405 3rd Qu.: 36235 $$ : 1 Max. :154831 Max. :197393 $$ (Other) :18 NA's :8 NA's :8 $$ 1985 1989 1993 1998 $$ Min. : 808 Min. : 1114 Min. : 1532 Min. : 1652 $$ 1st Qu.: 3126 1st Qu.: 4198 1st Qu.: 8686 1st Qu.: 9910 $$ Median : 22554 Median : 21145 Median : 27219 Median : 24346 21 4.0 64

$$ Mean   : 42350   Mean   : 47056   Mean   : 52844   Mean   : 53341
$$ 3rd Qu.: 44758   3rd Qu.: 53774   3rd Qu.: 59198   3rd Qu.: 58794
$$ Max.   :236259   Max.   :281813   Max.   :297703   Max.   :279498
$$ NA's   :8        NA's   :5        NA's   :4        NA's   :4
$$      2003             2008             2009             2010
$$ Min.   :  1694   Min.   :  2002   Min.   :  1994   Min.   :  1968
$$ 1st Qu.:  9798   1st Qu.: 10886   1st Qu.: 10622   1st Qu.: 10622
$$ Median : 22883   Median : 22648   Median : 22348   Median : 22356
$$ Mean   : 56685   Mean   : 59663   Mean   : 59177   Mean   : 59437
$$ 3rd Qu.: 72368   3rd Qu.: 77154   3rd Qu.: 78518   3rd Qu.: 79600
$$ Max.   :282772   Max.   :291952   Max.   :289622   Max.   :290621
$$
$$      2011             2012             2013           1998.sri
$$ Min.   :  2016   Min.   :  2128   Min.   :  2252   Min.   : -1454
$$ 1st Qu.: 10378   1st Qu.: 10426   1st Qu.: 10820   1st Qu.:  6616
$$ Median : 22646   Median : 22960   Median : 23594   Median : 24346
$$ Mean   : 59244   Mean   : 60029   Mean   : 60886   Mean   : 53644
$$ 3rd Qu.: 78700   3rd Qu.: 80870   3rd Qu.: 74276   3rd Qu.: 60986
$$ Max.   :288884   Max.   :291678   Max.   :292779   Max.   :279498
$$

In the summary(sagamihara) output, the character columns are reported as counts, and the columns from 1975 to 1998 contain missing values *22. For numeric columns summary() reports Min., 1st Qu., Median, Mean, 3rd Qu., Max., and the number of NA's *23 17).

(2) Tabulation

The values of a categorical column of AkitaPT can be counted with table().

table(AkitaPT$)
$$
$$   1   3   4   6
$$  76 820 704 672

Passing two columns to table() produces a cross tabulation.

*22 (2) p. 18
*23

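table(AkitaPT$, AkitaPT$)
##
##       1   2   3   6
##   1   4  60   0  12
##   3 230 146   4 108
##   4 196  40   0   8
##   6  64   8   0   4

# Create an empty plotting area
plot(x = 0, y = 0
  , xlim = c(1, 24)                                 # x range: 24 time points
  , ylim = c(0, max(AkitaPT[, 8:31], na.rm = T))    # y range: 0 to the overall maximum
  , type = "n"                                      # draw nothing yet
  , xlab = ""                                       # x-axis label
  , ylab = "/"                                      # y-axis label
  , xaxt = "n"                                      # suppress the default x axis
)
# Draw the x axis, labelled with the column names
axis(side = 1, at = 1:24, labels = colnames(AkitaPT[, 8:31]))
# Of the 2272 rows, keep those whose '24 ' column is at least 10000
AkitaPTover10000 <- na.omit(AkitaPT)  # drop rows containing NA
AkitaPTover10000 <- AkitaPTover10000[AkitaPTover10000$"24 " >= 10000, ]
# Draw one line per remaining row
for (i in 1:nrow(AkitaPTover10000)) {
  lines(x = 1:24, AkitaPTover10000[i, 8:31], col = i)
}

For more on graphics in R, see ggplot2 18), Murrell 19), and the R Graphical Manual.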

Figure 13: /
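The plotting steps above (drop incomplete rows, keep rows above a threshold, draw one line per row) can be sketched on a small made-up data set. The data frame `counts`, its column names, and the helper `plot_large_series()` are hypothetical and are not part of the report's script; this is only a minimal sketch of the same base-graphics approach.

```r
# Hypothetical data: 4 stations x 5 years (values are invented)
counts <- data.frame(
  station = c("A", "B", "C", "D"),
  y2008 = c(12000, 800, NA, 15000),
  y2009 = c(12500, 820, 9000, 14800),
  y2010 = c(12300, 790, 9100, 14900),
  y2011 = c(12100, 760, 9050, 15200),
  y2012 = c(12700, 750, 9300, 15500)
)
year_cols <- 2:6  # positions of the yearly columns

# Keep complete rows whose latest value reaches the threshold,
# then draw one line per remaining row (hypothetical helper).
plot_large_series <- function(df, cols, threshold) {
  complete <- na.omit(df)                      # drop rows containing NA
  latest <- complete[[max(cols)]]              # values in the last yearly column
  selected <- complete[latest >= threshold, ]  # rows at or above the threshold

  n <- length(cols)
  plot(x = 0, y = 0, type = "n",               # empty plotting area
       xlim = c(1, n),
       ylim = c(0, max(selected[, cols])),
       xlab = "", ylab = "count", xaxt = "n")
  axis(side = 1, at = seq_len(n), labels = colnames(df)[cols])
  for (i in seq_len(nrow(selected))) {         # seq_len() is safe for zero rows
    lines(x = seq_len(n), y = unlist(selected[i, cols]), col = i)
  }
  invisible(selected)
}

selected <- plot_large_series(counts, year_cols, threshold = 10000)
```

Using `seq_len(nrow(...))` instead of `1:nrow(...)` avoids a spurious iteration when no row passes the filter, and `unlist()` turns the one-row data frame into a plain numeric vector before it is passed to `lines()`.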

12. R

The R scripts were checked in the following Mac and Windows environments.

Mac:
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

Windows:
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

(1) RscriptAndData.zip

Contents of RscriptAndData.zip (Figure 14, Windows):
JACIC_Report_Rscript
data
JACIC_Report_Rscript.R
jyoukou_20141111.csv
fixed_jyoukou_20141111.csv

JACIC_Report_Rscript.R: R script, UTF-8
jyoukou_20141111.csv: UTF-8, 4.0
fixed_jyoukou_20141111.csv: UTF-8; jyoukou_20141111.csv, LibreOffice Calc 4.0

Figure 14: R

JACIC_Report_Rscript.R and jyoukou_20141111.csv are encoded in UTF-8, whereas R on Windows uses CP932. zkntrf05.xls 22) *24 is placed in JACIC_Report_Rscript.

(2) R

Opening JACIC_Report_Rscript.R in R on Windows (Figure 15): JACIC_Report_Rscript.R, CP932. Mac, Linux *25. RStudio, an integrated development environment (IDE) for R: UTF-8 (Figure 16, p. 70).

*24 http://www.mlit.go.jp/road/census/h22-1/data/xls/zkntrf05.xls
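When reproducing results on another machine, it can help to record the same kind of environment information listed above, together with whether the session runs in a UTF-8 locale. The following is a minimal sketch using base R only; the list name `env` and its element names are arbitrary choices made for this illustration.

```r
# Collect basic facts about the running R session
env <- list(
  r_version = R.version.string,        # version string of the running R
  platform  = R.version$platform,      # platform string, e.g. x86_64-...
  os_type   = .Platform$OS.type,       # "windows" or "unix"
  utf8      = l10n_info()[["UTF-8"]]   # TRUE when the session locale is UTF-8
)
print(env)

# Full report, including attached packages
print(sessionInfo())
```

If `env$utf8` is `FALSE` (as is typical for the Windows setup described above), files stored as UTF-8 need their encoding declared explicitly when they are read.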

Figure 15: R (Windows)

Figure 16: RStudio (Windows)

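(3) Working directory

Example for Windows 7 with the user name hasegawa:

setwd("C:\\Users\\hasegawa\\Desktop\\JACIC_Report_Rscript")

(4) jyoukou_20141111.csv

Mac, Windows: reading jyoukou_20141111.csv produces the following warning.

incomplete final line found by readTableHeader on 'jyoukou_20141111.csv'

jyoukou_20141111.csv was therefore opened on Windows in LibreOffice Calc and saved as fixed_jyoukou_20141111.csv. On Windows, fixed_jyoukou_20141111.csv is read as follows:

sagamihara <- read.csv(file = "fixed_jyoukou_20141111.csv", header = TRUE, fileEncoding = "UTF-8")

(5) R packages

Packages are loaded with library(). A package that has not been installed yet must first be installed with install.packages(). For the mice package, the R script contains the following lines; remove the leading # from the first line to install the package before its first use.

#install.packages("mice")
library(mice)

On Windows, library(XLConnect) may fail while loading XLConnectJars:

Error : .onLoad loadNamespace() rJava : call: fun(libname, pkgname) error: JAVA_HOME cannot be determined from the Registry
Error: XLConnectJars

Check the combination of OS and Java (32bit Java on a 64bit OS, or 64bit Java on a 32bit OS) and install the appropriate Java from http://www.java.com/en/download/manual.jsp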
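As an alternative to commenting and uncommenting install.packages() by hand, a script can check whether a package is available before loading it. The helper below is a hypothetical sketch, not part of the report's script; the name `load_if_available()` and its `install_missing` argument are assumptions made for this illustration. It relies only on `requireNamespace()`, `install.packages()`, and `library()` from base R.

```r
# Load a package if it is installed; optionally install it first.
# Returns TRUE when the package was attached, FALSE otherwise.
load_if_available <- function(pkg, install_missing = FALSE) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    if (!install_missing) {
      message("Package '", pkg, "' is not installed; ",
              "run install.packages(\"", pkg, "\") first.")
      return(FALSE)
    }
    install.packages(pkg)  # needs network access and a configured CRAN mirror
  }
  library(pkg, character.only = TRUE)  # pkg is a string, hence character.only
  TRUE
}

has_stats <- load_if_available("stats")  # ships with R
has_mice  <- load_if_available("mice")   # TRUE only if mice is already installed
```

Keeping `install_missing = FALSE` as the default means that running the script never changes the installed package set unless this is requested explicitly.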

1) ,,, :, IT Text,, 2006.
2) ,,, :, D3(), Vol. 68, No. 5, pp. 773-780, 2012.
3) Murai, Y., Arimura, M., Hasegawa, H., Tamura, T. and Kajiya, Y.: Text mining analysis on methods of information provision that influence tourists' travel behavior, Journal of the Eastern Asia Society for Transportation Studies, Vol. 8, pp. 941-953, 2010.
4) Hasegawa, H., Fujii, M., Arimura, M. and Tamura, T.: A Basic Study on Traffic Accident Data Analysis Using Support Vector Machine, Journal of the Eastern Asia Society for Transportation Studies, Vol. 7, pp. 2873-2880, 2007.
5) Hasegawa, H., Arimura, M. and Tamura, T.: Hybrid Model of Random Forests and Genetic Algorithms for Commute Mode Choice Analysis, Proceedings of the Eastern Asia Society for Transportation Studies, Vol. 9, pp. 123-136, 2013.
6) Arimura, M., Naito, T., Hasegawa, H. and Tamura, T.: Application of Data Mining Techniques to Congestion Data Analysis: The Case of Sapporo Urban Area, Selected Proceedings of World Conference on Transport Research, Vol. 12, 2010.
7) Hasegawa, H., Fujii, M., Arimura, M. and Tamura, T.: A Study on Traffic Accident Analysis Using Support Vector Machines, Proceedings of the 11th World Conference on Transportation Research, Vol. 11, World Conference on Transport Research Society, 2007.
8) ,, :,, Vol. 40, 2009.
9) Li, M., Zhang, Y. and Wang, W.: Analysis of congestion points based on probe car data, 2009 12th International IEEE Conference on Intelligent Transportation Systems, pp. 1-5, 2009.
10) Lu, Y. and Kawamura, K.: Data-Mining Approach to Work Trip Mode Choice Analysis in Chicago, Illinois, Area, Transportation Research Record: Journal of the Transportation Research Board, Vol. 2156, pp. 73-80, 2010.
11) Hossain, M. and Muromachi, Y.: Understanding Crash Mechanisms and Selecting Interventions to Mitigate Real-Time Hazards on Urban Expressways, Transportation Research Record: Journal of the Transportation Research Board, Vol. 2213, pp. 53-62, 2011.
12) , : R, 2, 2007.
13) : The R Tips, 2, 2009.
14) : 22,, 2013.
15) :, Useful R,, 2015.
16) : (missing data analysis) --, 2011.
17) ,,, :, 1, 2004.
18) ,, :,, 2013.
19) Murrell, P.: R,, 2009.


Form 3-3

DATA PREPROCESSING AND CLEANING METHODS FOR UTILIZING BIG DATA IN TRANSPORTATION RESEARCH

Hasegawa, H. 1
1 National Institute of Technology, Akita College

In recent years, major improvements in information and communication technology have aided data collection in transportation research. The captured data must be converted into information and knowledge before it becomes useful for decision-making. However, as data have grown in size and complexity, traditional statistical methods alone are no longer sufficient. Searching for useful nuggets of information among huge amounts of data has become known as the field of data mining. Data mining is the entire process of applying computer-based methodology, including new techniques for knowledge discovery, to data. Under these circumstances, the purposes of this study are as follows:
1. To survey data preprocessing and cleaning methods.
2. To apply data preprocessing and cleaning methods to transportation-related data sets.
3. In particular, to compile a well-organized reference on data preprocessing and cleaning methods for transportation researchers.

The following two results were obtained:
1. GNU R was adopted for applying data preprocessing and cleaning methods to transportation-related data sets. GNU R is an open-source language and development environment for statistical analysis.
2. A reference report on data preprocessing and cleaning methods for transportation researchers is available via JACIC's website.

KEYWORDS: data preprocessing, data cleaning, big data, GNU R, transportation.

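Form 3-2

Summary of Research Results

Grant number: No. 2014-11
Research title: A Study on Data Preprocessing and Cleaning for the Utilization of Transportation-Related Big Data
Researcher: 長谷川裕修 (Hasegawa, H.)
Affiliation: National Institute of Technology, Akita College (秋田工業高等専門学校)

1. Background and purpose of the research

In recent years, advances in sensor technology and falling data storage costs have accelerated the growth, in both quantity and quality, of data that systems record and accumulate automatically (= big data). To extract knowledge that is useful for decision-making in marketing, policy planning, and similar tasks from this ever-growing body of transportation-related big data, a knowledge discovery process based on data mining is required (Figure 1).

Figure 1: KDD process

Within this process, pattern discovery, which finds frequently occurring patterns and rules in a data set, is the central step and is also called data mining in the narrow sense; a large body of research on it exists in the transportation field as well. Data preprocessing and cleaning, on the other hand, is carried out as a stage preceding pattern discovery. Outlier removal, variable transformation, splitting data by segment, and similar operations are performed as one step of the analysis, but in practice they are all done in an exploratory manner, and they have not been organized systematically in the transportation field. In addition, detailed descriptions of the preprocessing and cleaning applied in individual studies are often omitted because of space constraints, which is also a problem from the viewpoint of reproducibility in science. In view of the above, this study aims to organize the methodology for preprocessing and cleaning transportation-related data, with its application to knowledge discovery from transportation big data in mind.

2. Research procedure

Specifically, materials on data preprocessing and cleaning methods were collected from the fields of data mining, machine learning, information processing, and statistics. Based on these materials, the methods were applied to transportation-related data using the open-source data analysis environment R and its extension packages.

3. Novelty of the research and use of the results

The novelty of this study is that data preprocessing and cleaning, which had not been sufficiently organized so far, is presented in a form that is familiar to practitioners and researchers in the transportation field, because the methods are applied to transportation-related data that are published as open data or can easily be obtained via the Web. Converting transportation-related data into an easy-to-handle form while limiting the loss of the information they carry requires knowledge and skills in statistics and machine learning together with domain knowledge of the transportation phenomenon in question. The author hopes to improve the material further with feedback from readers. Data preprocessing and cleaning not only strongly affects analysis accuracy but is also important for handling ever-increasing data within a realistic computation time. It is therefore hoped that the results of this study will be put to use and contribute to the future development of practice and research in the transportation field.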