R 3 R 2017 Email: gito@eco.u-toyama.ac.jp October 23, 2017 (Toyama/NIHU) R ( 3 ) October 23, 2017 1 / 34
Agenda 1 2 3 4 R 5 RStudio
10/30 (Mon.) 12/11 (Mon.) New! 1/9 (Tue.) New!
Regression analysis; OLS (ordinary least squares); GLM (generalized linear model); inferential statistics
50% 30% Google (http://toyokeizai.net/articles/-/171160?display=b) 2017 R
datum: dictionary definitions (1) and (2) (https://kotobank.jp). Example: GDP.
Unit of observation; unit of analysis; variable (e.g., GDP); constant
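Four levels of measurement
1. Nominal scale: values only name categories; two values can only be judged the same or different.
2. Ordinal scale: values can be ranked (1st, 2nd, ...), but the distance between 1st and 2nd is not meaningful.
3. Interval scale: differences are meaningful (going from 5 to 10 is an increase of 5), but ratios are not (10 is not "twice" 5), because the zero point is arbitrary.
4. Ratio scale: zero means "none", so ratios are meaningful (100 kg is twice 50 kg).
Information content: ratio > interval > ordinal > nominal.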
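A minimal R sketch of how the four scales map onto R data types, using made-up example values (the slide's own examples are not recoverable). The Celsius conversion is an assumed illustration of an interval scale; `c_to_f` is a hypothetical helper.

```r
# Nominal: categories with no order (unordered factor)
blood <- factor(c("A", "O", "B", "A"))
blood == "A"            # equality checks make sense

# Ordinal: ordered categories (ordered factor)
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"), ordered = TRUE)
size[1] < size[2]       # TRUE: order comparisons are allowed

# Interval: differences are meaningful, ratios are not.
c_to_f <- function(celsius) celsius * 9 / 5 + 32   # hypothetical helper
10 / 5                  # 2 in Celsius ...
c_to_f(10) / c_to_f(5)  # ... but about 1.22 in Fahrenheit: "twice" depends on the zero point

# Ratio: a true zero, so ratios survive a change of unit
weight_kg <- c(50, 100)
weight_kg[2] / weight_kg[1]                          # 2
(weight_kg[2] * 2.2046) / (weight_kg[1] * 2.2046)    # still 2 in pounds
```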
Statistic
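Mean (average): for n observations $x = (x_1, x_2, \ldots, x_n)$, the mean $\bar{x}$ of $x$ is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \tag{1}$$
Median: the value $m$ of $x$ such that
$$\int_{-\infty}^{m} dF(x) \ge \frac{1}{2} \quad \text{and} \quad \int_{m}^{\infty} dF(x) \ge \frac{1}{2} \tag{2}$$
where $F$ is the distribution function of $x$. For data, sort the $n$ values; $m$ is the one in the middle (around position $n/2$).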
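A short R sketch of equations (1) and (2) on made-up data. R's `median()` returns the average of the two middle values when n is even; the last lines check an empirical version of condition (2).

```r
x_odd  <- c(3, 1, 4, 1, 5)
x_even <- c(3, 1, 4, 1, 5, 9)

# Equation (1): sum divided by n
sum(x_odd) / length(x_odd)   # 2.8
mean(x_odd)                  # 2.8

# Median: the middle of the sorted data
median(x_odd)    # 3   (sorted: 1 1 3 4 5)
median(x_even)   # 3.5 (sorted: 1 1 3 4 5 9; average of 3 and 4)

# Empirical version of condition (2): at least half of the data lie
# at or below m, and at least half lie at or above m
m <- median(x_even)
c(mean(x_even <= m), mean(x_even >= m))   # 0.5 0.5
```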
Mode: the most frequent value. Example: for x = (1, 1, 1, 1, 1, 2, 3, 4, 5, 6), the mode is 1.
Outlier: an observation far away from the bulk of the data.
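Base R has no built-in function for the statistical mode: `mode()` returns the storage mode of an object. The sketch below counts frequencies with `table()` instead; `stat_mode` is a hypothetical helper, and it returns every value tied for the highest count.

```r
x <- c(1, 1, 1, 1, 1, 2, 3, 4, 5, 6)

stat_mode <- function(v) {
  counts <- table(v)                                   # frequency of each value
  as.numeric(names(counts)[counts == max(counts)])     # value(s) with the highest count
}

stat_mode(x)  # 1
mean(x)       # 2.5
median(x)     # 1.5
mode(x)       # "numeric": storage mode, not the statistical mode
```

For this vector the three measures of center disagree (1, 1.5, 2.5), which is why it helps to report more than one.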
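Example: n = 10,000 random values with mean 500 and standard deviation 100 ($\bar{x} = 500$, $m = 500$).
[Figure: histogram of the data (y-axis: Frequency) with the Median and Mean marked]
After adding 100 outliers of $10^4$ (100/10,000 = 1/100 of the data): $\bar{x} = 594$, $m = 501$.
[Figure: histogram of the data with outliers (y-axis: Frequency)]
The mean shifts by almost 100 while the median barely moves: the median is robust to outliers.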
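A sketch that approximately reproduces this experiment, assuming normal data with mean 500 and sd 100 and assuming the 100 outliers were all equal to 10^4 and appended to the data; the seed is arbitrary, so the exact digits will differ slightly from the slide.

```r
set.seed(1)                               # arbitrary seed; the slide's is unknown
x <- rnorm(10000, mean = 500, sd = 100)   # assumed: normal, mean 500, sd 100
round(c(mean = mean(x), median = median(x)))

# Assumption: append 100 outliers equal to 10^4 (100/10,000 = 1/100 of the data)
y <- c(x, rep(1e4, 100))
round(c(mean = mean(y), median = median(y)))

# hist(x) and hist(y) would draw the corresponding histograms
```

Under these assumptions the mean rises to about (10,000 × 500 + 100 × 10^4) / 10,100 ≈ 594, while the median moves by only about 1.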
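Spread: variance and IQR
Unbiased variance: for n observations $x = (x_1, x_2, \ldots, x_n)$ with mean $\bar{x}$, the variance $\sigma_x^2$ is
$$\sigma_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \tag{3}$$
Its square root $\sigma_x$ is the standard deviation (sd).
Equation (3) divides by $n - 1$ rather than $n$ because $\bar{x}$ is itself computed from $x$; dividing by $n$ would underestimate the variance on average.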
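A sketch of equation (3) in R on made-up data, followed by a small simulation (assumed setup: samples of size 5 from N(0, 1), true variance 1) showing why the divisor is n - 1. R's `var()` and `sd()` already use n - 1.

```r
x  <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up data, mean 5
n  <- length(x)
ss <- sum((x - mean(x))^2)        # sum of squared deviations: 32

var_unbiased <- ss / (n - 1)      # equation (3): 32/7, about 4.57
var_n        <- ss / n            # dividing by n instead: 4

var(x)    # R's var() divides by n - 1, matching equation (3)
sd(x)     # sqrt(var(x))

# Average each estimator over many samples of size 5 from N(0, 1)
set.seed(2)
sims <- replicate(10000, {
  s <- rnorm(5)
  c(unbiased = var(s), divide_by_n = var(s) * 4 / 5)   # ss/4 and ss/5
})
rowMeans(sims)   # about 1 and about 0.8
```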
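Interquartile range (IQR)
Sort the n observations $x = (x_1, x_2, \ldots, x_n)$ in ascending order and cut them into 4 equal parts. The 3rd cut point $Q_{3/4}$ is the upper quartile, the 1st cut point $Q_{1/4}$ is the lower quartile, and $\mathrm{IQR} = Q_{3/4} - Q_{1/4}$.
The median is $m = Q_{2/4} = Q_{1/2}$; $Q_{0/4}$ and $Q_{4/4}$ are the minimum and the maximum.
The IQR spans the middle 50% of the data. Values outside $[Q_{1/4} - 1.5\,\mathrm{IQR},\ Q_{3/4} + 1.5\,\mathrm{IQR}]$ are treated as outliers, as in the box-and-whisker plot.
Cutting into 10 parts instead gives deciles; the q-th cut point is $Q_{q/10}$.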
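A sketch of the quartiles, the IQR, and the 1.5 IQR outlier rule in R, on made-up data. Note that R's `quantile()` offers several definitions (the default is `type = 7`), and `boxplot()` computes its hinges with `fivenum()`, so its cut points can differ slightly from the numbers below.

```r
x <- c(1:9, 100)   # made-up data with one extreme value

q <- quantile(x, probs = c(0.25, 0.5, 0.75))   # Q_{1/4}, Q_{2/4} (median), Q_{3/4}
q                        # 3.25 5.50 7.75 with the default type = 7
iqr <- IQR(x)            # Q_{3/4} - Q_{1/4} = 4.5

lower <- q[[1]] - 1.5 * iqr   # -3.5
upper <- q[[3]] + 1.5 * iqr   # 14.5
x[x < lower | x > upper]      # 100 is flagged as an outlier

quantile(x, probs = seq(0.1, 0.9, by = 0.1))   # deciles Q_{q/10}
# boxplot(x) draws the box-and-whisker plot
```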
Population (which can also be viewed as a data generating process)
Sample; sampling: drawing a sample from the population
Statistical inference: learning about the population from a sample, which always involves error
Sample size: the number of observations in one sample.
N (number of samples): how many samples there are.
Example: two surveys with 1,500 and 2,000 respondents are 2 samples, with sample sizes 1,500 and 2,000.
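A sketch of the distinction in R. The population, the sample size of 10, and the 20 samples are hypothetical choices: each column of the result is one sample, so the number of rows is the sample size and the number of columns is N.

```r
set.seed(3)
population <- rnorm(100000, mean = 50, sd = 10)   # hypothetical population

# Draw N = 20 samples, each with sample size 10
samples <- replicate(20, sample(population, size = 10))
dim(samples)           # 10 20: rows = observations, columns = samples

sample_means <- colMeans(samples)   # one mean per sample
length(sample_means)   # 20
```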
Parameter (e.g.): 1 2000 91.0% 2 5 2011 (http://www.asahi.com/edu/hiraku/hiraku2011/article01.html)
Standard error: 1 2
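Central Limit Theorem (CLT)
Let $X_1, X_2, \ldots, X_n$ be $n$ independent random variables with the same distribution as $X$, with sample mean $\bar{X}_n$, variance $\sigma_X^2$, and population mean $E[X]$. As $n$ grows, the standardized gap $Z_n$ between $\bar{X}_n$ and $E[X]$ converges in distribution to the standard normal distribution $N(0, 1)$:
$$Z_n = \frac{\sqrt{n}\,(\bar{X}_n - E[X])}{\sqrt{\sigma_X^2}} = \frac{\sqrt{n}\,(\bar{X}_n - E[X])}{\sigma_X} \xrightarrow{d} N(0, 1) \tag{4}$$
Equivalently, for large $n$, $\bar{X}_n - E[X]$ approximately follows $N(0, \sigma_X^2/n)$.
Notation: $N(\mu, \sigma^2)$ is the normal distribution with mean $\mu$ and standard deviation $\sigma$ (variance $\sigma^2$).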
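A simulation sketch of equation (4), assuming a skewed population (exponential with rate 1, so E[X] = 1 and sigma_X = 1) and n = 50. If the CLT holds, the simulated Z_n should have mean near 0, sd near 1, and fall within ±1.96 about 95% of the time.

```r
set.seed(4)
n     <- 50      # sample size
reps  <- 10000   # number of simulated samples
mu    <- 1       # E[X] for Exp(rate = 1)
sigma <- 1       # sigma_X for Exp(rate = 1)

z <- replicate(reps, {
  x_bar <- mean(rexp(n, rate = 1))
  sqrt(n) * (x_bar - mu) / sigma        # Z_n from equation (4)
})

round(c(mean = mean(z), sd = sd(z)), 2)   # close to 0 and 1
mean(abs(z) <= 1.96)                      # close to 0.95, as for N(0, 1)
# hist(z) looks approximately bell-shaped despite the skewed population
```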
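For $n$ independent random variables $X_1, X_2, \ldots, X_n$ distributed as $X$, with sample mean $\bar{X}_n$ and variance $\sigma_X^2$, the population mean $E[X]$ satisfies, for large $n$ and with probability 95%,
$$\bar{X}_n - 1.96\sqrt{\sigma_X^2/n} \le E[X] \le \bar{X}_n + 1.96\sqrt{\sigma_X^2/n} \tag{5}$$
This holds because $\bar{X}_n$ approximately follows $N(E[X], \sigma_X^2/n)$, and a normal variable lies within 1.96 standard deviations of its mean with probability 95%.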
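The interval in (5) is the 95% confidence interval (CI) for $E[X]$.
Standard error (SE): $\sqrt{\sigma_X^2/n} = \sigma_X/\sqrt{n}$, the standard deviation $\sigma_X$ divided by $\sqrt{n}$; it shrinks as $n$ grows.
95% CI: $[\bar{X}_n - 1.96\,\mathrm{SE},\ \bar{X}_n + 1.96\,\mathrm{SE}]$
The 95% relies on $(\bar{X}_n - E[X])$ being approximately normal, which requires a large $n$. When $n$ is small, use the t distribution and replace 1.96 with the corresponding t critical value.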
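A sketch of the 95% CI in R on a made-up sample of size 25. Since sigma_X is unknown in practice, this sketch estimates it with `sd(x)`; the t version uses `qt(0.975, df = n - 1)` in place of 1.96 and should agree with the interval reported by `t.test()`.

```r
set.seed(5)
x  <- rnorm(25, mean = 10, sd = 2)   # made-up sample, n = 25
n  <- length(x)
se <- sd(x) / sqrt(n)                # SE with sigma_X estimated by sd(x)

ci_normal <- mean(x) + c(-1, 1) * 1.96 * se                    # large-n formula
ci_t      <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se   # small-n (t) version

qt(0.975, df = n - 1)   # about 2.06, larger than 1.96
t.test(x)$conf.int      # R's built-in 95% CI uses the t version
```

The t interval is slightly wider, reflecting the extra uncertainty from estimating sigma_X with a small sample.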
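An α% confidence interval has confidence coefficient α. Conventional choices are 95%, 90%, and 99% (not 94%, 96%, etc.).
These correspond to the significance levels 5%, 10%, and 1%, written p < 0.05, p < 0.1, and p < 0.01 (the probability of a Type I (α) error: 5%, 10%, 1%).
Type I (α) error: rejecting the null hypothesis $H_0$ when $H_0$ is in fact true.
Interval estimation: estimating a range. If we drew 100 samples and computed a 95% confidence interval from each, about 95 of the intervals would contain the true value.
Point estimation: estimating a single value.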
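A short R sketch linking the conventional confidence levels to their two-sided normal critical values. The slides use α for the confidence coefficient, so this sketch uses the separate names `conf_level` and `sig_level` to avoid mixing the two meanings.

```r
conf_level <- c(0.90, 0.95, 0.99)   # confidence coefficients
sig_level  <- 1 - conf_level        # significance levels: 0.10, 0.05, 0.01
crit <- qnorm(1 - sig_level / 2)    # two-sided critical values of N(0, 1)
round(crit, 3)                      # 1.645 1.960 2.576

# Two-sided p-value for an estimate 1.96 standard errors away from H0
2 * pnorm(-1.96)                    # about 0.05
```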
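$\mathrm{SD} = \sigma_X$ and $\mathrm{SE} = \sigma_X/\sqrt{n}$, so SD > SE whenever $n \ge 2$.
SD describes the spread of the data and does not shrink as $n$ grows, whereas $\mathrm{SE} = \sigma_X/\sqrt{n}$ shrinks as $n$ grows.
The smaller the SE, the narrower the 95% CI $[\bar{X}_n - 1.96\,\mathrm{SE},\ \bar{X}_n + 1.96\,\mathrm{SE}]$.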
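A sketch of how SE depends on n: quadrupling n halves SE. `se_of_mean` is a hypothetical helper, sigma = 10 is an assumed value, and the last line compares SD and SE on made-up data.

```r
se_of_mean <- function(sigma, n) sigma / sqrt(n)   # hypothetical helper

sigma <- 10
sapply(c(25, 100, 400), function(n) se_of_mean(sigma, n))   # 2.0 1.0 0.5

set.seed(6)
x <- rnorm(100, mean = 0, sd = 10)             # made-up data
c(SD = sd(x), SE = sd(x) / sqrt(length(x)))    # SD about 10, SE about 1
```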
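An α% confidence interval is computed from a single sample. It does not mean that the true value lies in this particular interval with probability α%.
The α% refers to repeated sampling: if we drew 100 samples and computed a 95% confidence interval from each, about 95 of the 100 intervals would contain the true value and about 5 would not.
For any one interval, the true value is either inside it or not, so that probability is 0 or 1.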
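A simulation sketch of this repeated-sampling interpretation, assuming normal data with a known true mean of 50 (known only because we simulate it) and samples of size 30. Each interval either contains the true mean or not; only the share over many intervals is close to 95%.

```r
set.seed(7)
true_mean <- 50   # known only because we simulate
reps <- 1000      # the slide's thought experiment uses 100; more reps give a steadier rate

covered <- replicate(reps, {
  x  <- rnorm(30, mean = true_mean, sd = 10)
  ci <- t.test(x)$conf.int                  # 95% CI from this one sample
  ci[1] <= true_mean && true_mean <= ci[2]  # each interval: TRUE or FALSE
})

mean(covered)   # share of intervals containing the true mean, close to 0.95
```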
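File path
A path is the address of a file or folder on your PC, much as a URL is the address of a web page (e.g., http://cfes-project.eco.u-toyama.ac.jp/education/education_2017/r_2017/).
Example: a folder named sample on the desktop has the path /Users/Gaku/Desktop/sample; a file sample.csv on the desktop has the path /Users/Gaku/Desktop/sample.csv.
How to check a path depends on the OS (Win 10, etc.): Google it!
File extensions: the .csv in sample.csv, the .xls in sample.xls. How to show extensions also depends on the OS (Win 10, etc.): Google it!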
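A small R sketch that takes the example path apart with base R functions; the path is the slide's example and does not need to exist for these string operations.

```r
p <- file.path("/Users/Gaku/Desktop", "sample.csv")   # build the path from its parts
p                    # "/Users/Gaku/Desktop/sample.csv"
dirname(p)           # "/Users/Gaku/Desktop": the folder
basename(p)          # "sample.csv": the file name
tools::file_ext(p)   # "csv": the extension
```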
When R cannot read a file, check the path and the encoding.
Paths in R use /.
Root directory: on a Mac, Macintosh HD (/); on Windows, the C drive.
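A sketch of reading a file by its full path with an explicit encoding. To stay self-contained it first writes a small CSV into R's temporary folder; the file name and contents are made up. The CP932 remark in the comment is an assumption about typical Japanese Windows files, not something the code tests.

```r
getwd()   # current working directory: where relative paths start from

# Write a small CSV to a temporary folder, then read it back by its full path.
csv_path <- file.path(tempdir(), "sample.csv")   # file name assumed for illustration
write.csv(data.frame(id = 1:3, score = c(70, 85, 90)),
          csv_path, row.names = FALSE, fileEncoding = "UTF-8")

file.exists(csv_path)   # TRUE if the path is right
dat <- read.csv(csv_path, fileEncoding = "UTF-8")
str(dat)

# If a CSV was saved in another encoding (for example CP932 / Shift-JIS on
# Japanese Windows), pass that encoding to fileEncoding instead.
```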
R distinguishes uppercase and lowercase letters: a and A are different names.
If R reports an error, Google the message, e.g., Error: object 'x' not found
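A sketch showing case sensitivity and the resulting error. The names `score` and `Score` are made up; `tryCatch()` captures the error message instead of stopping the script.

```r
score <- 10

exists("score")   # TRUE
exists("Score")   # FALSE: R treats "score" and "Score" as different names

# Using an undefined name raises the error "object 'Score' not found".
msg <- tryCatch(Score, error = function(e) conditionMessage(e))
msg
```

In a non-English locale the message text is translated, which is worth knowing when searching for it.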
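Objects in R
R stores values and results in objects. For example, the following stores the result of 1 + 1 in the object x:
    > x <- 1 + 1
<- assigns a value (= also works).
R object types include vector, matrix, data.frame (tibble), and list.
Objects can be used in further calculations (e.g.):
    > x2 <- x/2
    > x2
    [1] 1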
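A sketch creating one object of each type listed above, with made-up contents. Creating a tibble requires the tibble package, so it appears only as a comment.

```r
v <- c(1.5, 2, 3)                                   # vector
m <- matrix(1:6, nrow = 2)                          # 2 x 3 matrix
d <- data.frame(id = 1:3, name = c("a", "b", "c"))  # data.frame: columns can differ in type
l <- list(vec = v, mat = m, df = d)                 # list: can hold any objects

class(v)   # "numeric"
class(m)   # "matrix" "array" (R >= 4.0)
class(d)   # "data.frame"
class(l)   # "list"
# tibble::as_tibble(d) would convert d to a tibble if the tibble package is installed
```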
Data types in R
R values have types: double (real numbers), integer, logical (TRUE/FALSE), character (strings), and factor (categorical data).
    > x_num <- 1 + 1
    > x_num
    [1] 2
    > x_chr <- "2"
    > x_chr
    [1] "2"
    > class(x_num)
    [1] "numeric"
    > class(x_chr)
    [1] "character"
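A sketch with one value of each listed type (made-up values). It also shows that `class()` reports "numeric" for what `typeof()` calls "double", and that a factor is stored as integer codes with labels.

```r
x_dbl <- 2.5
x_int <- 2L                        # the L suffix makes an integer
x_lgl <- TRUE
x_chr <- "2"
x_fct <- factor(c("A", "B", "A"))  # categorical data

sapply(list(x_dbl, x_int, x_lgl, x_chr), typeof)
# "double" "integer" "logical" "character"
class(x_fct)    # "factor"
levels(x_fct)   # "A" "B"
typeof(x_fct)   # "integer": factors are stored as integer codes with labels
class(1 + 1)    # "numeric", while typeof(1 + 1) is "double"
```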
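Character data cannot be used in calculations, even when it looks like a number: x_chr holds the character "2", not the number 2. In the code below, mean() on a character vector returns NA with a warning (lines 5–8); converting with as.numeric() first gives the correct result (lines 9–10).
    1 > num_vec <- c(1, 2, 3, 4, 5, 6)
    2 > mean(num_vec)
    3 [1] 3.5
    4 > chr_vec <- c("1", "2", "3", "4", "5", "6")
    5 > mean(chr_vec)
    6 [1] NA
    7 Warning message:
    8 In mean.default(chr_vec) : argument is not numeric or logical: returning NA
    9 > mean(as.numeric(chr_vec))
    10 [1] 3.5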
1. URL: http://cfes-project.eco.u-toyama.ac.jp/education/education_2017/r_2017/rcode_fall2017/
2. R
Reading: R text, Chap. 1–3, 6; Gelman & Hill. Data analysis. Chap. 1–2; Stata text, Chap. 5; Chap. 1–2.