(missing data analysis) - - 1/16/2011

(missing data, missing value) (list-wise deletion) (pair-wise deletion) (full information maximum likelihood method, FIML) (multiple imputation method) 1 missing completely at random (MCAR) missing at random (MAR) missing not at random (MNAR) FIML (auxiliary variable) Enders (2010)

1

MCAR, MAR MNAR 3 (Rubin, 1976) 1

1.1 Missing Completely At Random (MCAR)

MCAR 1 Y X R 0 1 R X Y MCAR

e-mail: murakou@orion.ocn.ne.jp
1 SEM HLM
[Figure 1: MCAR, MAR, MNAR. Three path diagrams, one per mechanism, each relating X, Y, and the missingness indicator R.]

Y R X

1.2 Missing At Random (MAR)

MAR MCAR MCAR Y X X Y MCAR 2 1 R X Y Table 1 3 IQ MCAR Y IQ X) IQ MAR IQ 1 Y R MAR X IQ IQ IQ R Y MAR MAR X Y IQ IQ MAR 1 Y R X MAR

2 FIML
3 R
IQ MAR IQ Y MAR MAR

Table 1: MAR

id     Variable 1   IQ      Variable 3 (with missing values)   Variable 3 (complete data)
1      3            83      n/a                                93
2      4            85      n/a                                99
3      5            95      n/a                                98
4      2            96      n/a                                103
5      5            103     128                                128
6      3            104     102                                102
7      2            109     111                                111
8      6            112     113                                113
9      3            115     117                                117
10     3            116     133                                133
Mean   3.6          101.8   117.3                              111.7

1.3 Missing Not At Random (MNAR)

MNAR 1 X Y R

1.4 Auxiliary variables

MAR MNAR MAR MAR 2 MAR (auxiliary variable) inclusive analysis strategy (Enders, 2010; Rubin, 1996; Schafer & Graham, 2002) MAR FIML MAR
(Enders, 2008) FIML R Y MAR MAR

[Figure 2: inclusive analysis strategy. Two path diagrams over X, Y, and R, labeled MNAR and MAR, with an auxiliary variable A.]

1.5

MCAR MAR Table 1 117.3 111.7 Rubin (full information maximum likelihood method; FIML) MCAR MAR MAR FIML MCAR FIML MCAR FIML MCAR FIML
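The two means quoted from Table 1 (117.3 and 111.7) can be checked against the table itself. The following is a minimal sketch, assuming that 117.3 is the mean of the incomplete variable after list-wise deletion. The name `y` and both function names are illustrative, because the original column labels did not survive.

```python
# Table 1 rows as (id, IQ, y); None marks an "n/a" entry.
# "y" is an illustrative name for the incomplete variable.
table1 = [
    (1, 83, None), (2, 85, None), (3, 95, None), (4, 96, None),
    (5, 103, 128), (6, 104, 102), (7, 109, 111), (8, 112, 113),
    (9, 115, 117), (10, 116, 133),
]


def listwise_mean(rows):
    """Mean of y after dropping every case whose y is missing."""
    observed = [y for _, _, y in rows if y is not None]
    return sum(observed) / len(observed)


def mean_iq_by_missingness(rows):
    """Mean IQ of the cases with y missing, and of the cases with y observed."""
    missing = [iq for _, iq, y in rows if y is None]
    observed = [iq for _, iq, y in rows if y is not None]
    return sum(missing) / len(missing), sum(observed) / len(observed)


print(round(listwise_mean(table1), 1))  # 117.3
print(mean_iq_by_missingness(table1))   # first element is 89.75
```

In this table the four cases with a missing value are exactly the four lowest IQ scores, so missingness is related to IQ, and the mean of the remaining six cases is 117.3.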
MNAR FIML Heckman (1979) selection model Glynn, Laird & Rubin (1986) pattern mixture model MNAR MAR (e.g., Schafer & Graham, 2002) MAR FIML MNAR MAR MAR-based FIML (Schafer, 2003, p. 30) MAR MCAR 4 MCAR MAR FIML

2 (full information maximum likelihood method; FIML)

FIML

2.1

FIML p x p 1

\[
f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \right) \tag{1}
\]

µ Σ µ Σ 1 x 1 µ Σ 5

\[
f(x_1 \mid \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x_1 - \mu)' \Sigma^{-1} (x_1 - \mu) \right) \tag{2}
\]

4 Little (1988) MCAR test
5
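2 3... i x i

\[
f(x_i \mid \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x_i - \mu)' \Sigma^{-1} (x_i - \mu) \right) \tag{3}
\]

x 1 x i i = 1, 2,..., N

\[
f(x_1, x_2, \ldots, x_N \mid \mu, \Sigma) = \prod_{i=1}^{N} \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x_i - \mu)' \Sigma^{-1} (x_i - \mu) \right) \tag{4}
\]

µ Σ µ Σ µ Σ µ Σ (4) µ Σ (4) µ Σ (4) µ Σ (likelihood function) L(µ, Σ) (3) x i µ Σ L i (µ, Σ)

\[
\log L(\mu, \Sigma) = \log \prod_{i=1}^{N} L_i(\mu, \Sigma) = \sum_{i=1}^{N} \log L_i(\mu, \Sigma) \tag{5}
\]

(5) (4) µ Σ µ Σ (SEM) Σ imply

2.2

i

\[
L_i(\mu, \Sigma) = f(x_i \mid \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x_i - \mu)' \Sigma^{-1} (x_i - \mu) \right) \tag{6}
\]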
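x i 3 x i 3 1 µ 3 1 Σ 3 3 Table 1 id = 5

\[
x_5 = \begin{pmatrix} 5 \\ 103 \\ 148 \end{pmatrix}, \quad
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 \end{pmatrix} \tag{7}
\]

µ Σ IQ 2 id = 1 3 2 (6) x i 2 1 µ 2 1 Σ 2 2

\[
x_1 = \begin{pmatrix} 3 \\ 83 \end{pmatrix}, \quad
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix} \tag{8}
\]

= 3 IQ = 83 µ Σ (4) (5) FIML 6 FIML MAR Table 1 = 3.6 IQ = 101.8 = 110.9 7 (111.7) 8 IQ IQ IQ IQ FIML IQ IQ IQ 117.3 IQ

6 FIML
7 Mplus
8 IQ 117.2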
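The case-wise likelihood in (6) to (8) can be sketched in code: each case contributes the normal density of its observed variables only, using the matching sub-vector of µ and sub-matrix of Σ, and the contributions are summed as in (5). This is a minimal standard-library sketch and not the estimator used by any SEM package. The function names are hypothetical, the covariance matrix `sigma_trial` is an arbitrary illustrative value, and only the three means are taken from the text. A real FIML run would maximize this function over µ and Σ.

```python
import math


def _solve_and_logdet(a, b):
    """Solve a z = b by Gaussian elimination; return (z, log|det a|)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular covariance matrix")
        m[col], m[pivot] = m[pivot], m[col]
        logdet += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s / m[i][i]
    return z, logdet


def case_loglik(x, mu, sigma):
    """log L_i of (6), restricted to the variables observed for this case."""
    obs = [j for j, v in enumerate(x) if v is not None]
    dev = [x[j] - mu[j] for j in obs]
    sub = [[sigma[r][c] for c in obs] for r in obs]
    z, logdet = _solve_and_logdet(sub, dev)
    quad = sum(d * zi for d, zi in zip(dev, z))
    return -0.5 * (len(obs) * math.log(2 * math.pi) + logdet + quad)


def fiml_loglik(data, mu, sigma):
    """Sum of the case-wise log-likelihoods, as in (5)."""
    return sum(case_loglik(x, mu, sigma) for x in data)


# Table 1 cases as (variable 1, IQ, variable 3); None marks "n/a".
data = [
    (3, 83, None), (4, 85, None), (5, 95, None), (2, 96, None),
    (5, 103, 128), (3, 104, 102), (2, 109, 111), (6, 112, 113),
    (3, 115, 117), (3, 116, 133),
]
mu_trial = [3.6, 101.8, 110.9]  # means quoted in the text
sigma_trial = [[2.0, 0.0, 0.0], [0.0, 130.0, 0.0], [0.0, 0.0, 150.0]]  # assumed
print(fiml_loglik(data, mu_trial, sigma_trial))
```

For id = 1 the observed index set is the first two variables, so `case_loglik` uses the 2 x 1 and 2 x 2 quantities of (8). For id = 5 it uses the full 3 x 1 and 3 x 3 quantities of (7).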
IQ MAR FIML IQ IQ (borrow the information) MAR Enders (2010) µ Σ SEM Σ 9

2.3

FIML (e.g., AMOS, LISREL, EQS, Mplus, and Mx) FIML SAS mixed model

2.4 Inclusive analysis strategy

MAR IQ MAR IQ IQ (auxiliary variables) SEM (Enders, 2008) Figure 3

9 FIML N 100 1 N=1 χ 2

\[
\chi^2 = (N - 1) \log L(\hat{\theta}) \tag{9}
\]

FIML L 0 L 1 χ 2

\[
\chi^2 = 2 (\log L_1 - \log L_0) \tag{10}
\]
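χ 2 χ 2

[Figure 3: auxiliary variables A1 and A2. Path diagram with nodes X1, X2, Y, e, A1, and A2.]

Enders (2008) Figure 4

[Figure 4: A1 and A2. Latent variable diagram with factors F1 and F2, nodes X1, X2, and Y1, residuals e, and auxiliary variables A1 and A2.]

Enders (2008) inclusive analysis strategy FIML (SEM) Mplus VARIABLE auxiliary =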
CFI TLI incremental fit index CFI TLI (independent model) 10 CFI TLI Enders (2010)

3 Multiple imputation method

FIML (imputation method) stochastic regression imputation stochastic regression imputation MAR stochastic regression imputation stochastic regression imputation MAR Rubin (1987) stochastic regression imputation

10 CFI TLI 1 (see Wu, West, & Taylor, 2009)
Figure 5 3 (imputation step) (regression, anova, sem, etc...) straightforward (posterior step)

[Figure 5: multiple imputation method. Flow: original dataset with missing values -> creation of N pseudo-complete datasets (imputation step) -> the target statistical analysis applied to each dataset, giving estimate 1 and standard error 1, estimate 2 and standard error 2, ..., estimate N and standard error N, that is, N sets of estimates and standard errors (analysis step) -> a single estimate and standard error (integration step).]

3.1 (imputation step)

(data augmentation method) SAS proc MI NORM 11 SPSS multiple imputation module sequential regression approach (or chained equations approach) van Buuren (2007) (posterior predictive distribution)

11 Schafer (1997) http://www.stat.psu.edu/ jls/misoftwa.html
(Markov chain monte carlo; MCMC)

1. 2. 3. stochastic regression model 4. 5. 6. 2-5

3. 5 3.

\[
p(Y_t \mid \mu_{t-1}, \sigma_{t-1}, Y_{obs}) \tag{11}
\]

Y obs Y t Y t t µt 1, σ t 1 * µ 0 σ0 1. 2 5. 3. \( p(Y_t \mid \mu_{t-1}, \sigma_{t-1}, Y_{obs}) \) Y t 4. 5.

\[
p(\mu_t \mid Y_t, \sigma_{t-1}, Y_{obs}) \tag{12}
\]

µ µ t

\[
p(\sigma_t \mid Y_t, \mu_t, Y_{obs}) \tag{13}
\]
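The cycle in (11) to (13) can be sketched for the simplest possible case. This is a hedged illustration under several assumptions that are not in the original: a single normally distributed variable with no predictors, a noninformative prior, and hypothetical names and defaults (`data_augmentation`, `burn_in`, `thin`). It replaces the stochastic regression model of step 3 with a plain normal draw, so it shows only the alternation of the two steps and is not an adequate imputation model under MAR.

```python
import math
import random


def data_augmentation(y, m=5, burn_in=200, thin=50, seed=0):
    """Return m completed copies of y (None = missing) from a simple DA chain."""
    rng = random.Random(seed)
    obs = [v for v in y if v is not None]
    n = len(y)
    # Starting values mu_0 and sigma_0^2, taken from the observed cases.
    mu = sum(obs) / len(obs)
    var = sum((v - mu) ** 2 for v in obs) / (len(obs) - 1)
    completed = []
    for t in range(1, burn_in + m * thin + 1):
        # Imputation step, cf. (11): draw Y_t given mu_{t-1} and sigma_{t-1}.
        y_t = [v if v is not None else rng.gauss(mu, math.sqrt(var)) for v in y]
        # Posterior step, cf. (12): draw mu_t given Y_t and sigma_{t-1}.
        y_bar = sum(y_t) / n
        mu = rng.gauss(y_bar, math.sqrt(var / n))
        # Posterior step, cf. (13): draw sigma_t^2 given Y_t and mu_t
        # (sum of squares divided by a chi-square draw with n df).
        ss = sum((v - mu) ** 2 for v in y_t)
        var = ss / rng.gammavariate(n / 2.0, 2.0)
        # Keep every thin-th completed dataset once the burn-in is over.
        if t > burn_in and (t - burn_in) % thin == 0:
            completed.append(y_t)
    return completed


# Incomplete column of Table 1 ("n/a" entries as None).
y = [None, None, None, None, 128, 102, 111, 113, 117, 133]
datasets = data_augmentation(y)
print(len(datasets), len(datasets[0]))
```

Saving only widely spaced iterations after the burn-in period is a common way to reduce the dependence between the retained datasets. The spacing of 50 used here is arbitrary.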
σ σ t 12 µ t σt (11) 2-3 Y

\[
\mu_0, \sigma_0, Y_1, \mu_1, \sigma_1, Y_2, \mu_2, \sigma_2, Y_3, \ldots \tag{14}
\]

Y µ σ Y Y (burn-in) Y 200 Y 201 Y 13 sequential (Gibbs sampler) variant 1 14 autocorrelation function plot MCMC 2008

3.2 (posterior/integration step)

ANOVA, SEM, etc...)

12 13 14
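3.2.1

ˆθ t t m

\[
\bar{\theta} = \frac{1}{m} \sum_{t=1}^{m} \hat{\theta}_t \tag{15}
\]

3.2.2

\[
V_W = \frac{1}{m} \sum_{t=1}^{m} SE_t^2 \tag{16}
\]

SE t t within-imputation variance

\[
V_B = \frac{1}{m - 1} \sum_{t=1}^{m} (\hat{\theta}_t - \bar{\theta})^2 \tag{17}
\]

between-imputation variance V W V B ANOVA

\[
V_T = V_W + V_B + \frac{V_B}{m} \tag{18}
\]

\[
SE = \sqrt{V_T} \tag{19}
\]

V W V B V B V B stochastic regression V B (18) V B /m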
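The pooling formulas (15) to (19) translate directly into a few lines of code. The sketch below is a minimal illustration. The function name `pool_rubin` and the input numbers are hypothetical and are not results from the text.

```python
import math


def pool_rubin(estimates, standard_errors):
    """Pool m estimates and standard errors with formulas (15) to (19)."""
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need m >= 2 estimates with matching standard errors")
    theta_bar = sum(estimates) / m                                # (15)
    v_w = sum(se ** 2 for se in standard_errors) / m              # (16)
    v_b = sum((e - theta_bar) ** 2 for e in estimates) / (m - 1)  # (17)
    v_t = v_w + v_b + v_b / m                                     # (18)
    return theta_bar, math.sqrt(v_t)                              # (19)


# Hypothetical estimates and standard errors from m = 3 imputed datasets.
theta, se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(theta, round(se, 4))  # 2.0 1.5275
```

With these inputs V_W = 1 and V_B = 1, so V_T = 1 + 1 + 1/3. The pooled standard error exceeds every single-dataset standard error because V_T adds the between-imputation variance to the within-imputation variance.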
3.3

3.3.1

3-5 (e.g., Rubin, 1987) Graham, Olchowski, & Gilreath (2007) Enders (2010) 20

3.3.2

MAR 1 (nested data, hierarchical data) (hierarchical linear model, HLM) 15

3.3.3

Rounding imputed value 2.33 rounding Allison (2002)

15 Norm variant (http://www.stat.psu.edu/ jls/misoftwa.html) Mplus version 6
3.3.4

10 1 9 Enders (2010) duplicate-scale imputation X 2 1 X 1 X 8 1 1 X 2 8 2 X 1 X 2 6 X 1 X 2 X 1 X 2 X 1 X 1 X 2 X 2 X 1 X 2 X 1 duplicate-scale imputation X 1 X 2 Little et al. (2008) 16 three-step approach duplicate-scale method

3.4

SAS SPSS SPSS sequential regression model Schafer

16 http://www.crmda.ku.edu/pdf/11. Imputation with Large Data Sets.pdf
(1997) Norm 17 SEM HLM SAS Norm 18

4

MAR SEM FIML FIML Mplus

17 http://www.stat.psu.edu/ jls/misoftwa.html
18 Mplus version 6

5 References

Allison, P. D. (2002). Missing data. Newbury Park, CA: Sage.

Enders, C. K. (2008). A note on the use of missing auxiliary variables in FIML-based structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 15, 434-448.

Enders, C. K. (2010). Applied missing data analysis. New York: Guilford.

Glynn, R. J., Laird, N. M., & Rubin, D. B. (1986). Selection modeling versus mixture modeling with nonignorable nonresponse. In H. Wainer (Ed.), Drawing inferences from self-selected samples (pp. 116-142). Berlin: Springer-Verlag.

Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8, 206-213.

Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5, 475-492.

Little, R. J. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83, 1198-1202.

Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

Rubin, D. B. (1996). Multiple imputation after 18+ years (with discussion). Journal of the American Statistical Association, 91, 473-489.

Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall.

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.
(2008).

van Buuren, S. (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16, 219-242.

Wu, W., West, S. G., & Taylor, A. B. (2009). Evaluating model fit for growth curve models: Integration of fit indices from SEM and MLM frameworks. Psychological Methods, 14, 183-201.