Categorical Factor Analysis by using R
Kosugi, E. Koji (Yamadai.R)
2012/10/05
Why we use...
FA vs categorical FA
One of the reasons
(2002)
polychoric correlation coefficient (polyserial correlation coefficient)
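Polychoric Correlation: the correlation between two ordinal variables
Polyserial Correlation: the correlation between an ordinal variable and a continuous variable
Tetrachoric Correlation: the correlation between two binary variables (the 2 x 2 special case of the polychoric correlation)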
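Images of latent continuity

Figure: image of latent continuity and expression

The observed category x is determined by where the latent continuous variable \xi falls among the thresholds a_1 < a_2 < ... < a_{s-1}:

\[
x =
\begin{cases}
1 & \text{if } \xi < a_1 \\
2 & \text{if } a_1 \le \xi < a_2 \\
3 & \text{if } a_2 \le \xi < a_3 \\
\vdots & \\
s & \text{if } a_{s-1} \le \xi
\end{cases}
\tag{1}
\]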
\rho_{X,Y} (two-step ML)
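The slide names the estimator only briefly, so the following is a sketch of what two-step maximum likelihood usually means for a polychoric correlation, assuming that is the procedure intended. The symbols \eta, b_j, r, n_{ij}, N, \Phi and \Phi_2 are notation introduced here for illustration: \eta is the latent variable behind Y, b_j are its thresholds, r is its number of categories, n_{ij} is the count in cell (i, j) of the X-by-Y contingency table, N is the total count, and \Phi and \Phi_2 are the standard univariate and bivariate normal distribution functions.

```latex
% Step 1: estimate the thresholds from the cumulative marginal proportions
% (with a_0 = b_0 = -\infty and a_s = b_r = +\infty).
\hat{a}_i = \Phi^{-1}\!\left( \frac{\sum_{k \le i} n_{k\cdot}}{N} \right),
\quad i = 1, \dots, s-1,
\qquad
\hat{b}_j = \Phi^{-1}\!\left( \frac{\sum_{l \le j} n_{\cdot l}}{N} \right),
\quad j = 1, \dots, r-1

% Step 2: with the thresholds held fixed, maximize the log-likelihood over \rho.
\pi_{ij}(\rho) =
  \Phi_2(\hat{a}_i, \hat{b}_j; \rho)
  - \Phi_2(\hat{a}_{i-1}, \hat{b}_j; \rho)
  - \Phi_2(\hat{a}_i, \hat{b}_{j-1}; \rho)
  + \Phi_2(\hat{a}_{i-1}, \hat{b}_{j-1}; \rho)

\hat{\rho}_{X,Y} = \arg\max_{\rho} \sum_{i=1}^{s} \sum_{j=1}^{r} n_{ij} \log \pi_{ij}(\rho)
```

Fixing the thresholds first reduces the second step to a one-dimensional search over \rho, which is why this variant is cheaper than estimating thresholds and \rho jointly.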
Follow me with R code...
> library(psych)
> library(polycor)
> # sample statistics
> sample <- read.csv("cefasample.csv",header=FALSE,na.strings="*")
> head(sample)
  V1 V2 V3 V4 V5 V6 V7 V8
1  1  1  1  1  4  1  1  1
2  3  4  4  1  4  4  1  1
3  3  4  4  3  4  3  3  4
4  2  4  5  2  2  4  1  4
5  2  2  2  3  4  2  2  3
6  3  3  5  3  3  2  2  3
> summary(sample)
       V1              V2              V3              V4
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000
 1st Qu.:3.500   1st Qu.:4.000   1st Qu.:4.000   1st Qu.:3.000
 Median :4.000   Median :4.000   Median :4.000   Median :4.000
 Mean   :3.913   Mean   :4.127   Mean   :3.901   Mean   :3.853
 3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:4.000
 Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000
                                 NA's   :2       NA's   :1
       V5              V6              V7             V8
 Min.   :1.000   Min.   :1.000   Min.   :1.00   Min.   :1.000
 1st Qu.:4.000   1st Qu.:3.000   1st Qu.:2.00   1st Qu.:3.000
 Median :4.000   Median :3.000   Median :3.00   Median :4.000
 Mean   :3.955   Mean   :3.138   Mean   :2.78   Mean   :3.442
 3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:4.00   3rd Qu.:4.000
 Max.   :5.000   Max.   :5.000   Max.   :5.00   Max.   :5.000
                                 NA's   :1
> table(sample$V1)

  1   2   3   4   5
  2  26  61 178  88
> describe(sample)
   var   n mean   sd median trimmed  mad min max range  skew kurtosis   se
V1   1 355 3.91 0.87      4    4.00 0.00   1   5     4 -0.70     0.22 0.05
V2   2 355 4.13 0.78      4    4.22 0.00   1   5     4 -1.11     2.08 0.04
V3   3 353 3.90 0.78      4    3.95 0.00   1   5     4 -0.76     0.96 0.04
V4   4 354 3.85 0.90      4    3.94 0.00   1   5     4 -0.82     0.66 0.05
V5   5 355 3.95 0.87      4    4.04 1.48   1   5     4 -0.71     0.24 0.05
V6   6 355 3.14 0.95      3    3.16 1.48   1   5     4 -0.22    -0.12 0.05
V7   7 354 2.78 1.01      3    2.79 1.48   1   5     4  0.14    -0.70 0.05
V8   8 355 3.44 1.00      4    3.47 1.48   1   5     4 -0.42    -0.28 0.05
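To connect these frequencies to the threshold model on the latent-continuity slide, here is a minimal sketch that turns the V1 counts printed by table() into threshold estimates. It assumes the latent variable is standard normal; the object names counts, cum.prop and thresholds are illustrative and not part of the original session.

```r
# Frequencies of V1 (categories 1..5), copied from the table() output above.
counts <- c(2, 26, 61, 178, 88)

# Cumulative proportions P(x <= k) for k = 1..4; the last one is always 1
# and corresponds to an infinite threshold, so it is dropped.
cum.prop <- cumsum(counts) / sum(counts)
cum.prop <- cum.prop[-length(cum.prop)]

# Thresholds a_1..a_4 on the latent standard-normal scale (assumption:
# the latent variable is N(0, 1)).
thresholds <- qnorm(cum.prop)
names(thresholds) <- paste0("a", seq_along(thresholds))
print(round(thresholds, 3))
```

Five observed categories give four thresholds, and they come out in increasing order because the cumulative proportions are increasing.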
> # Pearson correlation
> peason.cor <- cor(sample,use="complete.obs")
> print(peason.cor,digits=2)
     V1    V2   V3   V4   V5   V6    V7   V8
V1 1.00 0.380 0.43 0.40 0.26 0.19 0.285 0.26
V2 0.38 1.000 0.28 0.34 0.27 0.16 0.099 0.21
V3 0.43 0.277 1.00 0.26 0.21 0.15 0.150 0.16
V4 0.40 0.339 0.26 1.00 0.42 0.26 0.276 0.23
V5 0.26 0.265 0.21 0.42 1.00 0.23 0.255 0.22
V6 0.19 0.157 0.15 0.26 0.23 1.00 0.341 0.39
V7 0.29 0.099 0.15 0.28 0.26 0.34 1.000 0.41
V8 0.26 0.212 0.16 0.23 0.22 0.39 0.415 1.00
> # polychoric correlation
> polychoric.cor <- polychoric(sample)
> print(polychoric.cor$rho)
          V1        V2        V3        V4        V5        V6        V7
V1 1.0000000 0.4693292 0.4993862 0.4702445 0.3260640 0.2015360 0.3172379
V2 0.4693292 1.0000000 0.3661174 0.4283065 0.3544777 0.1925806 0.1164603
V3 0.4993862 0.3661174 1.0000000 0.3131351 0.2971062 0.1704954 0.1565841
V4 0.4702445 0.4283065 0.3131351 1.0000000 0.5128292 0.2805638 0.3020316
V5 0.3260640 0.3544777 0.2971062 0.5128292 1.0000000 0.2612329 0.2785856
V6 0.2015360 0.1925806 0.1704954 0.2805638 0.2612329 1.0000000 0.3832876
V7 0.3172379 0.1164603 0.1565841 0.3020316 0.2785856 0.3832876 1.0000000
V8 0.2939444 0.2544516 0.1885443 0.2562720 0.2513339 0.4138156 0.4444297
          V8
V1 0.2939444
V2 0.2544516
V3 0.1885443
V4 0.2562720
V5 0.2513339
V6 0.4138156
V7 0.4444297
V8 1.0000000
> #
> # compare, Pearson vs polychoric
> #
>
> # FA
> fa.parallel(peason.cor,n.obs=355)
Parallel analysis suggests that the number of factors = 3 and the number of components = 2
> fa.parallel(polychoric.cor$rho,n.obs=355)
Parallel analysis suggests that the number of factors = 3 and the number of components =
> fa.result.peason <- fa(peason.cor,n.obs=355,fm="gls",nfactors=3,rotate="promax")
> fa.result.polych <- fa(polychoric.cor$rho,n.obs=355,fm="gls",nfactors=3,rotate="promax")
> print(fa.result.peason,digits=3,sort=TRUE)
Factor Analysis using method = gls
Call: fa(r = peason.cor, nfactors = 3, n.obs = 355, rotate = "promax", fm = "gls")
Standardized loadings (pattern matrix) based upon correlation matrix
   item   GLS2   GLS1   GLS3    h2    u2
V8    8  0.695  0.073 -0.076 0.468 0.532
V7    7  0.583  0.032  0.028 0.377 0.623
V6    6  0.529 -0.062  0.121 0.333 0.667
V1    1  0.056  0.886 -0.091 0.721 0.279
V3    3 -0.006  0.453  0.083 0.261 0.739
V4    4 -0.015  0.017  0.696 0.490 0.510
V5    5  0.023 -0.145  0.692 0.377 0.623
V2    2 -0.063  0.289  0.313 0.273 0.727

                       GLS2  GLS1  GLS3
SS loadings           1.140 1.094 1.065
Proportion Var        0.143 0.137 0.133
Cumulative Var        0.143 0.279 0.412
Proportion Explained  0.346 0.332 0.323
Cumulative Proportion 0.346 0.677 1.000

 With factor correlations of
      GLS2  GLS1  GLS3
GLS2 1.000 0.409 0.566
GLS1 0.409 1.000 0.688
GLS3 0.566 0.688 1.000

Test of the hypothesis that 3 factors are sufficient.

The degrees of freedom for the null model are 28 and the objective function was 1.469 wit
The degrees of freedom for the model are 7 and the objective function was 0.03

The root mean square of the residuals (RMSR) is 0.014
The df corrected root mean square of the residuals is 0.04

The number of observations was 355 with Chi Square = 10.559 with prob < 0.159

Tucker Lewis Index of factoring reliability = 0.9706
RMSEA index = 0.0388 and the 90 % confidence intervals are NA 0.0814
BIC = -30.546
Fit based upon off diagonal values = 0.995
Measures of factor score adequacy
                                               GLS2  GLS1  GLS3
Correlation of scores with factors            0.830 0.885 0.851
Multiple R square of scores with factors      0.689 0.783 0.723
Minimum correlation of possible factor scores 0.378 0.565 0.447
> print(fa.result.polych,digits=3,sort=TRUE)
Factor Analysis using method = gls
Call: fa(r = polychoric.cor$rho, nfactors = 3, n.obs = 355, rotate = "promax", fm = "gls")
Standardized loadings (pattern matrix) based upon correlation matrix
   item   GLS3   GLS1   GLS2    h2    u2
V5    5  0.806 -0.179  0.029 0.497 0.503
V4    4  0.709  0.024  0.020 0.543 0.457
V2    2  0.383  0.312 -0.069 0.376 0.624
V1    1 -0.138  0.976  0.069 0.826 0.174
V3    3  0.145  0.470 -0.028 0.326 0.674
V7    7 -0.019  0.052  0.657 0.447 0.553
V8    8 -0.038  0.097  0.650 0.452 0.548
V6    6  0.143 -0.083  0.555 0.365 0.635

                       GLS3  GLS1  GLS2
SS loadings           1.319 1.289 1.226
Proportion Var        0.165 0.161 0.153
Cumulative Var        0.165 0.326 0.479
Proportion Explained  0.344 0.336 0.320
Cumulative Proportion 0.344 0.680 1.000

 With factor correlations of
      GLS3  GLS1  GLS2
GLS3 1.000 0.716 0.522
GLS1 0.716 1.000 0.392
GLS2 0.522 0.392 1.000

Test of the hypothesis that 3 factors are sufficient.

The degrees of freedom for the null model are 28 and the objective function was 1.986 wit
The degrees of freedom for the model are 7 and the objective function was 0.055

The root mean square of the residuals (RMSR) is 0.017
The df corrected root mean square of the residuals is 0.048

The number of observations was 355 with Chi Square = 19.207 with prob < 0.00756

Tucker Lewis Index of factoring reliability = 0.9265
RMSEA index = 0.0711 and the 90 % confidence intervals are 0.0335 0.1085
BIC = -21.897
Fit based upon off diagonal values = 0.995
Measures of factor score adequacy
                                               GLS3  GLS1  GLS2
Correlation of scores with factors            0.882 0.928 0.839
Multiple R square of scores with factors      0.779 0.862 0.704
Minimum correlation of possible factor scores 0.557 0.724 0.408
> #
> # sample <- subset(sample,select=c("v11","v13","v20","v5","v4","v17","v12","v15"))
> # write.table(sample,"cefasample.csv",sep=",",row.names=FALSE,col.names=FALSE,na="*")
>
>
> # mixed pattern
> sample.cat <- data.frame(lapply(sample[1:3],factor),sample[4:8])
> summary(sample.cat)
 V1      V2      V3            V4              V5              V6
 1:  2   1:  3   1   :  2   Min.   :1.000   Min.   :1.000   Min.   :1.000
 2: 26   2: 13   2   : 18   1st Qu.:3.000   1st Qu.:4.000   1st Qu.:3.000
 3: 61   3: 31   3   : 60   Median :4.000   Median :4.000   Median :3.000
 4:178   4:197   4   :206   Mean   :3.853   Mean   :3.955   Mean   :3.138
 5: 88   5:111   5   : 67   3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:4.000
                 NA's:  2   Max.   :5.000   Max.   :5.000   Max.   :5.000
                            NA's   :1
       V7             V8
 Min.   :1.00   Min.   :1.000
 1st Qu.:2.00   1st Qu.:3.000
 Median :3.00   Median :4.000
 Mean   :2.78   Mean   :3.442
 3rd Qu.:4.00   3rd Qu.:4.000
 Max.   :5.00   Max.   :5.000
 NA's   :1
> hetcor.cor <- hetcor(sample.cat)
> hetcor.cor$correlations
          V1        V2        V3        V4        V5        V6        V7
V1 1.0000000 0.4766232 0.4902862 0.4305458 0.2853987 0.2076320 0.3015123
V2 0.4766232 1.0000000 0.3740222 0.3757560 0.3093574 0.1839428 0.1175596
V3 0.4902862 0.3740222 1.0000000 0.2752806 0.2491626 0.1686548 0.1583849
V4 0.4305458 0.3757560 0.2752806 1.0000000 0.4202661 0.2636989 0.2758351
V5 0.2853987 0.3093574 0.2491626 0.4202661 1.0000000 0.2279503 0.2550014
V6 0.2076320 0.1839428 0.1686548 0.2636989 0.2279503 1.0000000 0.3414939
V7 0.3015123 0.1175596 0.1583849 0.2758351 0.2550014 0.3414939 1.0000000
V8 0.2663878 0.2378540 0.1553257 0.2324400 0.2175612 0.3937855 0.4146257
          V8
V1 0.2663878
V2 0.2378540
V3 0.1553257
V4 0.2324400
V5 0.2175612
V6 0.3937855
V7 0.4146257
V8 1.0000000
> hetcor.cor$type
     [,1]         [,2]         [,3]         [,4]         [,5]
[1,] ""           "Polychoric" "Polychoric" "Polyserial" "Polyserial"
[2,] "Polychoric" ""           "Polychoric" "Polyserial" "Polyserial"
[3,] "Polychoric" "Polychoric" ""           "Polyserial" "Polyserial"
[4,] "Polyserial" "Polyserial" "Polyserial" ""           "Pearson"
[5,] "Polyserial" "Polyserial" "Polyserial" "Pearson"    ""
[6,] "Polyserial" "Polyserial" "Polyserial" "Pearson"    "Pearson"
[7,] "Polyserial" "Polyserial" "Polyserial" "Pearson"    "Pearson"
[8,] "Polyserial" "Polyserial" "Polyserial" "Pearson"    "Pearson"
     [,6]         [,7]         [,8]
[1,] "Polyserial" "Polyserial" "Polyserial"
[2,] "Polyserial" "Polyserial" "Polyserial"
[3,] "Polyserial" "Polyserial" "Polyserial"
[4,] "Pearson"    "Pearson"    "Pearson"
[5,] "Pearson"    "Pearson"    "Pearson"
[6,] ""           "Pearson"    "Pearson"
[7,] "Pearson"    ""           "Pearson"
[8,] "Pearson"    "Pearson"    ""
> fa.parallel(hetcor.cor$correlations,n.obs=355)
Parallel analysis suggests that the number of factors = 3 and the number of components =
> fa.result.hetcor <- fa(hetcor.cor$correlations,n.obs=355,fm="gls",nfactors=3,rotate="promax")
> print(fa.result.hetcor,digits=3,sort=TRUE)
Factor Analysis using method = gls
Call: fa(r = hetcor.cor$correlations, nfactors = 3, n.obs = 355, rotate = "promax", fm = "gls")
Standardized loadings (pattern matrix) based upon correlation matrix
   item   GLS1   GLS2   GLS3    h2    u2
V1    1  0.868  0.082 -0.101 0.695 0.305
V3    3  0.599 -0.029  0.017 0.359 0.641
V2    2  0.459 -0.058  0.235 0.384 0.616
V8    8  0.077  0.686 -0.075 0.460 0.540
V7    7  0.020  0.597  0.034 0.391 0.609
V6    6 -0.031  0.520  0.109 0.328 0.672
V5    5 -0.131 -0.004  0.735 0.420 0.580
V4    4  0.078  0.014  0.613 0.459 0.541

                       GLS1  GLS2  GLS3
SS loadings           1.364 1.144 0.988
Proportion Var        0.171 0.143 0.123
Cumulative Var        0.171 0.314 0.437
Proportion Explained  0.390 0.327 0.283
Cumulative Proportion 0.390 0.717 1.000

 With factor correlations of
      GLS1  GLS2  GLS3
GLS1 1.000 0.409 0.704
GLS2 0.409 1.000 0.552
GLS3 0.704 0.552 1.000

Test of the hypothesis that 3 factors are sufficient.

The degrees of freedom for the null model are 28 and the objective function was 1.705 wit
The degrees of freedom for the model are 7 and the objective function was 0.046

The root mean square of the residuals (RMSR) is 0.016
The df corrected root mean square of the residuals is 0.046

The number of observations was 355 with Chi Square = 16.046 with prob < 0.0247

Tucker Lewis Index of factoring reliability = 0.9361
RMSEA index = 0.0613 and the 90 % confidence intervals are 0.0203 0.0998
BIC = -25.059
Fit based upon off diagonal values = 0.994
Measures of factor score adequacy
                                               GLS1  GLS2  GLS3
Correlation of scores with factors            0.892 0.829 0.849
Multiple R square of scores with factors      0.795 0.688 0.721
Minimum correlation of possible factor scores 0.590 0.375 0.441
>
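The session marks a step for comparing the Pearson and polychoric matrices but leaves it empty. As one possible way to fill it, the sketch below subtracts the two matrices for V1 to V3 only, with the values copied by hand from the printed output so that it runs without the data file. The names vars, pearson.sub, polychoric.sub and diff.sub are illustrative; with the session objects available, the same idea would be polychoric.cor$rho - peason.cor.

```r
vars <- c("V1", "V2", "V3")

# Pearson correlations for V1..V3, copied from print(peason.cor, digits=2).
pearson.sub <- matrix(c(1.000, 0.380, 0.430,
                        0.380, 1.000, 0.277,
                        0.430, 0.277, 1.000),
                      nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Polychoric correlations for V1..V3, copied from print(polychoric.cor$rho).
polychoric.sub <- matrix(c(1.0000000, 0.4693292, 0.4993862,
                           0.4693292, 1.0000000, 0.3661174,
                           0.4993862, 0.3661174, 1.0000000),
                         nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Element-wise difference: positive entries mean the polychoric estimate
# is larger than the Pearson correlation for that pair.
diff.sub <- polychoric.sub - pearson.sub
print(round(diff.sub, 3))

# Average difference over the distinct pairs (upper triangle only).
print(mean(diff.sub[upper.tri(diff.sub)]))
```

For these three items every off-diagonal difference is positive, which is the pattern one would expect if treating the five-point responses as continuous attenuates the correlations.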