
Vol. 43, No. 1, September 2013, pp. 41-58

Joint Random Effect Modeling for Repeated Durations and Discrete Choices with Selection Bias Correction: Application to Promotion Policy Planning for Potential Clients Using Web Access-log Data

Takahiro Hoshino (464-8601; E-mail: hoshino@soec.nagoya-u.ac.jp)

Abstract. We point out that current research in big data analytics neglects two important issues in constructing prediction models: consumer heterogeneity and selection bias. We provide a real example in which both issues must be handled properly, namely joint modeling of repeated durations and purchase behavior. More concretely, we apply the joint model to a Web access-log dataset from a very large panel study. To plan a promotion policy for potential clients of an online shopping company, we propose a propensity-score-weighted generalized EM algorithm for the proposed model, which adjusts for covariate differences between potential clients and current clients. The proposed model incorporates random effects expressing unmeasured heterogeneity, which inevitably requires numerical integration. However, for large datasets it is not practical to employ Markov chain Monte Carlo methods in random-effect modeling. We applied the fully exponential Laplace approximation to the estimation algorithm of the proposed model and found that the algorithm is less computationally expensive while still providing accurate estimates.

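1. Introduction

Related discussions appear in … (2012), … (2013), … (2009) and Angrist and Pischke (2009).

Fitting random-effect models by Markov chain Monte Carlo (MCMC) is impractical for data of this size. Variational methods (Jordan et al. (1999)) are one faster alternative; their properties and applications are discussed by Wang and Titterington (2004) and Braun and McAuliffe (2010). Another alternative is the Laplace approximation (Tierney and Kadane (1986)). Rue et al. (2009) proposed the integrated nested Laplace approximation (INLA) for latent Gaussian models, and Fong et al. (2010) applied INLA to generalized linear mixed models. Rizopoulos et al. (2009) and Bianconcini and Cagnone (2012) used the fully exponential Laplace approximation (Tierney et al. (1989)), which is also the approach taken here; see also … (2012).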

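Current clients are coded $z = 1$ and potential clients $z = 0$. The aim is to model the behavior of the potential clients ($z = 0$) using the data on current clients ($z = 1$).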

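The model is presented in Section 2, the EM algorithm and its approximate E step in Section 3, the application in Section 4, and the discussion in Section 5.

2. Model

2.1 Joint model for repeated durations and purchases

Repeated visits to a website are analyzed with recurrent event duration analysis (Seetharaman and Chintagunta (2003); Bijwaard et al. (2006)), also called recurrent event survival analysis, jointly with a key performance indicator (KPI) of the website such as a purchase. Let $y^D_{ij}$ be the time between the $j$-th and $(j+1)$-th visits of user $i$, and let $y^B_{ij} = 1$ if the KPI occurs at visit $j+1$ and $0$ otherwise, for $j = 1, \dots, J_i$. Let $x_i$ be the covariates of user $i$, $w_{ij}$ the covariates for the interval between visits $j$ and $j+1$, and $f_i$ a random effect of user $i$ with $f_i \sim N(0, \phi)$. Durations are modeled on the log scale, $y^{LD}_{ij} = \log y^D_{ij}$.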

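The density of the log duration given $f_i$ and $w_{ij}$ is

$$ f\left(y^{LD}_{ij} \mid f_i, w_{ij}\right) = \frac{1}{\sigma}\exp\left[\frac{y^{LD}_{ij}-(\beta_0+f_i+w_{ij}^{t}\beta_w)}{\sigma} - \exp\left(\frac{y^{LD}_{ij}-(\beta_0+f_i+w_{ij}^{t}\beta_w)}{\sigma}\right)\right] \tag{2.1} $$

(Klein and Moeschberger (2003)). This is an extreme-value density with location $\beta_0 + f_i + w_{ij}^t\beta_w$ and scale $\sigma$, so that the duration $y^D_{ij}$ itself follows a Weibull distribution. The indicator $y^B_{ij}$ for visit $j+1$ follows the logistic regression

$$ \operatorname{logit}\left[p\left(y^{B}_{ij}=1 \mid f_i, w_{ij}\right)\right] = \alpha_0 + \alpha_f f_i + w_{ij}^{t}\alpha_w \tag{2.2} $$

With $y_{ij} = (y^{LD}_{ij}, y^B_{ij})^t$, the likelihood for user $i$ integrates over the random effect:

$$ p(y_{i1},\dots,y_{iJ_i} \mid w_{i1},\dots,w_{iJ_i}) = \int \prod_{j=1}^{J_i} p\left(y^{LD}_{ij} \mid f_i, w_{ij}\right) p\left(y^{B}_{ij} \mid f_i, w_{ij}\right) p(f_i)\, df_i \tag{2.3} $$

The parameter $\alpha_f$ links the purchase process to the duration process through the shared random effect $f_i$.

2.2 Correction for selection bias

Let $z_i = 1$ if user $i$ is a current client and $z_i = 0$ otherwise. Interest lies in the behavior $y_{i1}, \dots, y_{iJ_i}$ of users with $z_i = 0$. Assuming that, given the covariates $x$, the outcomes $y$ are missing at random (MAR) with respect to $z$, $\theta$ is estimated by maximizing the weighted log-likelihood

$$ \sum_{i=1}^{N} z_i \frac{1-w(x_i)}{w(x_i)} \log p(y_{i1},\dots,y_{iJ_i} \mid w_{i1},\dots,w_{iJ_i}) \tag{2.4} $$

to which only users with $z_i = 1$ contribute.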
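The likelihood (2.1)-(2.3) can be evaluated directly for a single user. The Python sketch below is illustrative only. Its function names, the dictionary layout of θ, and the use of trapezoidal integration over $f_i$ (in place of the Laplace approximation of Section 3) are assumptions made here. The integration grid is centered at 0 and spans 12 prior standard deviations, which assumes the posterior of $f_i$ stays near the prior.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x)))
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def log_duration_density(y_ld, mu, sigma):
    # Log of (2.1): extreme-value density of y^LD = log y^D, location mu, scale sigma
    u = (y_ld - mu) / sigma
    return -math.log(sigma) + u - math.exp(u)

def log_purchase_prob(y_b, eta):
    # Log Bernoulli probability under the logit model (2.2); eta is the linear predictor
    return log_sigmoid(eta) if y_b == 1 else log_sigmoid(-eta)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_g(f, visits, th):
    # Log of the integrand of (2.3): N(0, phi) density of f times the product over visits
    total = -0.5 * math.log(2 * math.pi * th["phi"]) - f * f / (2 * th["phi"])
    for y_ld, y_b, w in visits:
        mu = th["beta0"] + f + dot(w, th["beta_w"])
        eta = th["alpha0"] + th["alpha_f"] * f + dot(w, th["alpha_w"])
        total += log_duration_density(y_ld, mu, th["sigma"]) + log_purchase_prob(y_b, eta)
    return total

def log_marginal(visits, th, n=2001, width=12.0):
    # Log of (2.3) by the trapezoidal rule on [-width*sd, width*sd], sd = sqrt(phi)
    sd = math.sqrt(th["phi"])
    h = 2 * width * sd / (n - 1)
    vals = [log_g(-width * sd + k * h, visits, th) for k in range(n)]
    m = max(vals)  # log-sum-exp shift to avoid underflow
    s = sum(math.exp(v - m) * (0.5 if k in (0, n - 1) else 1.0) for k, v in enumerate(vals))
    return m + math.log(s * h)
```

Each visit is a tuple `(y_ld, y_b, w)` with `w` a list of covariates. As $\phi \to 0$ the integral collapses to the likelihood evaluated at $f_i = 0$, which gives a quick sanity check.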

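This targets the conditional distribution $p(y \mid w, z = 0)$ of the potential clients. The weighting is a form of propensity score weighting (… (2005); Hoshino et al. (2006); Wooldridge (2007); Pan and Schaubel (2009)), where $w(x_i)$ is the probability that $z_i = 1$ given $x_i$. By Bayes' theorem,

$$ w(x_i) = \frac{p(x_i \mid z_i=1)\,p(z_i=1)}{p(x_i \mid z_i=1)\,p(z_i=1) + p(x_i \mid z_i=0)\,p(z_i=0)} \tag{2.5} $$

so the weight $(1 - w(x_i))/w(x_i)$ is proportional to $p(x_i \mid z_i = 0)/p(x_i \mid z_i = 1)$. It therefore shifts the covariate distribution of the $z = 1$ sample toward $p(x \mid z = 0)$.

2.3 Related work

Vaida and Xu (2000) fitted proportional hazards models with random effects to clustered data by an EM algorithm. Rizopoulos et al. (2009) used fully exponential Laplace approximations for joint models of survival and longitudinal data.

3. EM algorithm

Write $\alpha = (\alpha_0, \alpha_f, \alpha_w^t)^t$, $\beta = (\sigma, \beta_0, \beta_w^t)^t$ and $\theta = (\alpha^t, \beta^t, \phi)^t$.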
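The effect of the weights in (2.4) and (2.5) can be checked directly. The sketch below estimates $w(x)$ from the empirical frequencies of a single discrete covariate. In practice $w(x)$ would come from a fitted propensity score model, and the function names here are hypothetical. With this plug-in estimate, the weighted $z = 1$ sample reproduces the $z = 0$ counts of $x$ exactly.

```python
def propensity(x, x_z1, x_z0):
    # (2.5) with empirical frequencies standing in for p(x | z) and p(z)
    n1, n0 = len(x_z1), len(x_z0)
    num = x_z1.count(x) / n1 * (n1 / (n1 + n0))
    den = num + x_z0.count(x) / n0 * (n0 / (n1 + n0))
    return num / den

def odds_weights(x_z1, x_z0):
    # Weight (1 - w(x_i)) / w(x_i) from (2.4) for each unit with z_i = 1
    out = []
    for x in x_z1:
        w = propensity(x, x_z1, x_z0)
        out.append((1 - w) / w)
    return out

def weighted_counts(x_z1, x_z0):
    # Reweighted distribution of x in the z = 1 sample
    totals = {}
    for x, wt in zip(x_z1, odds_weights(x_z1, x_z0)):
        totals[x] = totals.get(x, 0.0) + wt
    return totals
```

For example, with `x_z1 = ["a"] * 6 + ["b"] * 2` and `x_z0 = ["a"] + ["b"] * 3`, the weighted counts are 1 for `"a"` and 3 for `"b"` up to floating-point rounding. These match the $z = 0$ sample.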

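The weighted score function is

$$ S(\theta) = \sum_{i=1}^{N} z_i \frac{1-w(x_i)}{w(x_i)} S_i(\theta) = \sum_{i=1}^{N} z_i \frac{1-w(x_i)}{w(x_i)} \int \frac{\partial}{\partial\theta}\log g(f_i, y_i, w_i \mid \theta)\, p(f_i \mid y_i, w_i, \theta)\, df_i \tag{3.1} $$

where

$$ S_i(\theta) = \frac{\partial}{\partial\theta}\log p(y_{i1},\dots,y_{iJ_i} \mid w_{i1},\dots,w_{iJ_i}) \tag{3.2} $$

$$ g(f_i, y_i, w_i \mid \theta) = \prod_{j=1}^{J_i} p\left(y^{LD}_{ij} \mid f_i, w_{ij}\right) p\left(y^{B}_{ij} \mid f_i, w_{ij}\right) p(f_i \mid \phi) \tag{3.3} $$

$$ p(f_i \mid y_i, w_i, \theta) = \frac{g(f_i, y_i, w_i \mid \theta)}{\int g(f_i, y_i, w_i \mid \theta)\, df_i} $$

and

$$ \frac{\partial}{\partial\theta}\log g(f_i, y_i, w_i \mid \theta) = \left( \frac{\partial}{\partial\alpha}\sum_{j=1}^{J_i}\log p\left(y^{B}_{ij} \mid f_i, w_{ij}\right),\ \frac{\partial}{\partial\beta}\sum_{j=1}^{J_i}\log p\left(y^{LD}_{ij} \mid f_i, w_{ij}\right),\ \frac{\partial}{\partial\phi}\log p(f_i \mid \phi) \right). $$

Following Tierney et al. (1989), the $r$-th element of $S_i(\theta)$ is approximated by the fully exponential Laplace approximation

$$ \hat S_{ir}(\theta) = \frac{\partial}{\partial\theta_r}\log g(\hat f_i, y_i, w_i \mid \theta) - \frac{1}{2}\gamma_{ir} \tag{3.4} $$

whose error is $O(J_i^{-2})$ (Rizopoulos et al. (2009)). Here

$$ \gamma_{ir} = \Sigma_i^{-1}\left\{ \frac{\partial \Sigma_i}{\partial\theta_r} - \frac{\partial \Sigma_i}{\partial f_i}\,\Sigma_i^{-1}\,\frac{\partial^2}{\partial f_i\,\partial\theta_r}\log g(f_i, y_i, w_i \mid \theta)\Big|_{f_i=\hat f_i} \right\} \tag{3.5} $$

$$ \Sigma_i = \frac{\partial^2}{\partial f_i^2}\log g(f_i, y_i, w_i \mid \theta)\Big|_{f_i=\hat f_i} \tag{3.6} $$

and $\hat f_i$ is the value of $f_i$ that maximizes $\log g(f_i, y_i, w_i \mid \theta)$.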
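The approximation in (3.4)-(3.6) can be prototyped in a few lines. Start from the Laplace approximation $\log\int g\,df_i \approx \log g(\hat f_i) + \tfrac12\log 2\pi - \tfrac12\log(-\Sigma_i)$ and differentiate it with respect to $\theta_r$, using $\partial\log g/\partial f_i = 0$ at $\hat f_i$ and the implicit derivative of $\hat f_i$. The chain rule then reproduces (3.4) with $\gamma_{ir}$ from (3.5). The Python sketch below therefore finds $\hat f_i$ by Newton-Raphson with (A.1)-(A.2). It obtains $\hat S_{ir}$ from central differences of the Laplace log-marginal instead of coding (3.5) analytically. The sketch assumes a single scalar covariate $w_{ij}$ shared by both submodels and the parameter order $\theta = (\alpha_0, \alpha_f, \alpha_w, \sigma, \beta_0, \beta_w, \phi)$. All names are hypothetical, and this is not the author's SAS/IML code.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x)))
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def log_g(f, visits, th):
    # Log of (3.3) with one scalar covariate w per visit;
    # th = (alpha0, alpha_f, alpha_w, sigma, beta0, beta_w, phi), phi = Var(f_i)
    a0, af, aw, s, b0, bw, phi = th
    total = -0.5 * math.log(2 * math.pi * phi) - f * f / (2 * phi)
    for y_ld, y_b, w in visits:
        u = (y_ld - (b0 + f + bw * w)) / s
        eta = a0 + af * f + aw * w
        total += -math.log(s) + u - math.exp(u)          # duration part, (2.1)
        total += log_sigmoid(eta if y_b == 1 else -eta)  # purchase part, (2.2)
    return total

def d1_d2(f, visits, th):
    # (A.1) and (A.2): first and second derivatives of log g with respect to f
    a0, af, aw, s, b0, bw, phi = th
    d1, d2 = -f / phi, -1.0 / phi
    for y_ld, y_b, w in visits:
        e = math.exp((y_ld - (b0 + f + bw * w)) / s)
        p = math.exp(log_sigmoid(a0 + af * f + aw * w))
        d1 += (e - 1.0) / s + af * (y_b - p)
        d2 -= e / s ** 2 + af ** 2 * p * (1.0 - p)
    return d1, d2

def mode(visits, th, tol=1e-12, max_iter=100):
    # Newton-Raphson for f_hat; log g is strictly concave in f because d2 < 0
    f = 0.0
    for _ in range(max_iter):
        d1, d2 = d1_d2(f, visits, th)
        step = d1 / d2
        f -= step
        if abs(step) < tol:
            break
    return f

def laplace_log_marginal(visits, th):
    # log g(f_hat) + 0.5 log(2 pi) - 0.5 log(-Sigma_i), with Sigma_i as in (3.6)
    f_hat = mode(visits, th)
    sigma_i = d1_d2(f_hat, visits, th)[1]
    return log_g(f_hat, visits, th) + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(-sigma_i)

def laplace_score(visits, th, h=1e-5):
    # S_hat_ir for every r by central differences of the Laplace log-marginal
    out = []
    for r in range(len(th)):
        up, dn = list(th), list(th)
        up[r] += h
        dn[r] -= h
        out.append((laplace_log_marginal(visits, up) - laplace_log_marginal(visits, dn)) / (2 * h))
    return out

def weighted_score(users, th):
    # (3.1): sum over users with z_i = 1 of (1 - w(x_i)) / w(x_i) times S_hat_i
    total = [0.0] * len(th)
    for z, w_x, visits in users:
        if z == 1:
            c = (1.0 - w_x) / w_x
            total = [t + c * s for t, s in zip(total, laplace_score(visits, th))]
    return total
```

Numerical differentiation keeps the sketch short. An implementation of (3.5) with the analytic derivatives of Appendix A would avoid the extra Newton solves per parameter.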

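The estimation algorithm proceeds as follows.

(1) Set initial values of $\theta$.
(2) For each $i$, compute $\hat f_i$, the maximizer of $\log g(f_i, y_i, w_i \mid \theta)$ in (3.3). Also compute the first-, second- and third-order derivatives of $\log g$ with respect to $f_i$ at $\hat f_i$ (Appendix A).
(3) Compute the approximate weighted score (3.1), replacing $S_i(\theta)$ by (3.4).
(4) Update $\theta$ to $\hat\theta$.
(5) Repeat steps (2) to (4) until convergence.

The proposed EM algorithm was compared with MCMC in a simulation study. The sample size was $N = 5{,}000$ or $100{,}000$. Two settings were used for the number of visits $J_i$: (1) 10 on average (range 3 to 17) and (2) 50 on average (range 5 to 95). The MCMC runs used 3,000 iterations, of which the first 1,000 were discarded as burn-in. Convergence was checked with the Geweke diagnostic.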
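The simulation design can be mimicked with a small data generator. This Python sketch assumes that $J_i$ is uniform on the reported range, that there is one binary visit covariate $w_{ij} \sim \text{Bernoulli}(0.5)$, and that the parameters form a tuple $(\alpha_0, \alpha_f, \alpha_w, \sigma, \beta_0, \beta_w, \phi)$. The true parameter values used in the study are not reproduced, and the function name is hypothetical. Log durations are drawn as $\beta_0 + f_i + w_{ij}\beta_w + \sigma\log E$ with $E \sim \text{Exp}(1)$, because $\log E$ has the extreme-value density $\exp(u - e^u)$ of (2.1).

```python
import math
import random

def simulate(n_users, j_min, j_max, th, seed=0):
    # Draws data from (2.1)-(2.3). Assumptions: J_i uniform on {j_min, ..., j_max};
    # one binary covariate w_ij ~ Bernoulli(0.5);
    # th = (alpha0, alpha_f, alpha_w, sigma, beta0, beta_w, phi) with phi = Var(f_i).
    a0, af, aw, s, b0, bw, phi = th
    rng = random.Random(seed)
    users = []
    for _ in range(n_users):
        f = rng.gauss(0.0, math.sqrt(phi))          # random effect f_i ~ N(0, phi)
        visits = []
        for _ in range(rng.randint(j_min, j_max)):  # J_i visits
            w = 1.0 if rng.random() < 0.5 else 0.0
            e = 0.0
            while e == 0.0:                          # guard against log(0)
                e = rng.expovariate(1.0)
            # log of an Exp(1) draw has density exp(u - exp(u)), the error law in (2.1)
            y_ld = b0 + f + bw * w + s * math.log(e)
            p = 1.0 / (1.0 + math.exp(-(a0 + af * f + aw * w)))  # logit model (2.2)
            y_b = 1 if rng.random() < p else 0
            visits.append((y_ld, y_b, w))
        users.append(visits)
    return users
```

Calls such as `simulate(5000, 3, 17, th)` and `simulate(5000, 5, 95, th)` correspond to settings (1) and (2) with $N = 5{,}000$. Each user is returned as a list of `(y_ld, y_b, w)` tuples.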

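The estimates of the proposed method were compared with those from MCMC for $J_i$ of about 10 and about 50. Computations were carried out in SAS/IML on Windows 7 (64-bit) with an Intel Core i7-3930K processor (6 cores, 12 threads, 3.20 GHz base and 3.80 GHz turbo, 12 MB cache) and 32 GB of memory. With $N = 100{,}000$ and $J_i$ of about 50, the computation time was 10.2 for the proposed algorithm and 233.1 for 3,000 MCMC iterations.

4. Application

The data are Web access logs (visited URLs) from a large panel whose members were recruited by random digit dialing (RDD). The analysis concerns visits to an online shopping site, A.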

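The observation period ran from April 2010 to February 2012. Users are classified as current clients of site A ($z = 1$) or potential clients ($z = 0$), the latter defined with reference to site B (196 and 2,258 users, respectively). The user covariates $x$ are five categorical variables with 5, 9, 6, 3 and 13 categories. There are eight visit-level covariates $w_{ij}$: the time of day (the 24 hours divided into four bands), referrals from Yahoo! and Google, blogs, SNS (Twitter, Facebook, mixi), and further variables.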

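For site A, the figures for $z = 1$ and $z = 0$ are 4,483 and 1,397. Four methods are compared: (1) the proposed EM algorithm for the weighted log-likelihood (2.4), and three comparison methods, (2) to (4).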

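The estimates from methods (1) to (4) are compared below, with significance judged at the 5% level. The comparison concerns $\alpha_f$ and $\phi$, the time-of-day effects, and the effects of referrals from Yahoo! and Google in $\alpha_w$, which bear on the return on investment (ROI) of promotion for site A.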

Estimates (standard errors) for the Web access-log data; columns in the order (1), (2), (4), (3)

Parameter                      (1)                (2)                (4)                (3)
Duration model
σ                              1.387 (0.01039)    1.389 (0.00723)    1.476 (0.00695)    1.364 (0.00971)
β0                             0.229 (0.00873)    0.234 (0.00721)    0.261 (0.00583)    0.233 (0.01028)
β_w: 12-18 h                   0.165 (0.03624)    0.084 (0.02140)    0.053 (0.01889)    0.178 (0.04865)
β_w: 18-22 h                   0.089 (0.04009)    0.102 (0.02509)    0.179 (0.02489)    0.092 (0.04803)
β_w: 22-6 h                    0.137 (0.04093)    0.088 (0.03305)    0.050 (0.03107)    0.152 (0.04848)
β_w: Yahoo!                    0.028 (0.04510)    0.141 (0.03984)    0.111 (0.03791)    0.049 (0.06824)
β_w: Google                    0.299 (0.07304)    0.093 (0.06985)    0.101 (0.06208)    0.367 (0.08607)
β_w: (unnamed covariate)       0.109 (0.10028)    0.034 (0.07709)    0.060 (0.08094)    0.130 (0.11276)
β_w: blog                      0.038 (0.06904)    0.110 (0.07090)    0.381 (0.05033)    0.093 (0.06687)
β_w: SNS                       0.091 (0.06002)    0.009 (0.05809)    0.109 (0.05034)    0.088 (0.07328)
β_w: (unnamed covariate)       0.093 (0.08709)    0.034 (0.04095)    0.051 (0.03097)    0.121 (0.08797)
β_w: (unnamed covariate)       0.039 (0.05001)    0.019 (0.04098)    0.054 (0.03887)    0.071 (0.07321)
Purchase model
α0                             3.869 (0.29503)    3.739 (0.15600)    3.593 (0.13978)    4.361 (0.40943)
α_f (reported for three models): 0.321 (0.11010), 0.208 (0.06982), 0.409 (0.14132)
α_w: 12-18 h                   0.110 (0.07093)    0.050 (0.04499)    0.069 (0.04109)    0.108 (0.09834)
α_w: 18-22 h                   0.051 (0.04094)    0.065 (0.02499)    0.041 (0.02098)    0.039 (0.07169)
α_w: 22-6 h                    0.210 (0.05097)    0.049 (0.06097)    0.110 (0.05570)    0.261 (0.06095)
α_w: Yahoo!                    0.019 (0.03069)    0.109 (0.01610)    0.098 (0.01609)    0.048 (0.02950)
α_w: Google                    0.081 (0.02483)    0.051 (0.03329)    0.053 (0.03192)    0.083 (0.03082)
α_w: (unnamed covariate)       0.060 (0.06094)    0.107 (0.04508)    0.050 (0.04604)    0.101 (0.09989)
α_w: blog                      0.101 (0.05098)    0.030 (0.03097)    0.060 (0.02806)    0.101 (0.08426)
α_w: SNS                       0.179 (0.06093)    0.061 (0.05093)    0.110 (0.04330)    0.210 (0.08272)
α_w: (unnamed covariate)       0.041 (0.07083)    0.149 (0.03780)    0.101 (0.03512)    0.098 (0.04133)
α_w: (unnamed covariate)       0.080 (0.06095)    0.008 (0.04397)    0.076 (0.03799)    0.051 (0.07075)
Random effect
φ (reported for three models): 0.832 (0.05938), 0.790 (0.03921), 0.889 (0.08983)

(4) 2 0.28059 0.71366 1.01456 5%

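The c statistic of the propensity score model for $z$ (… (2009)) is 0.7331.

5. Discussion

The proposed method fits random-effect models to large Web data without MCMC. Estimating a model on one population and applying it to another is related to transfer learning (… (2010)) and to covariate shift (Shimodaira (2000); Sugiyama et al. (2007)). The weighting used here is based on the propensity score (Rosenbaum and Rubin (1983)).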
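The c statistic reported for the propensity score model is the area under the ROC curve. It is the probability that a randomly chosen $z = 1$ user has a higher estimated score than a randomly chosen $z = 0$ user, with ties counting one half. A direct Python sketch with $O(n_1 n_0)$ cost follows (the function name is hypothetical).

```python
def c_statistic(scores_z1, scores_z0):
    # Share of (z = 1, z = 0) pairs in which the z = 1 unit has the larger score;
    # ties count one half. This equals the area under the ROC curve.
    wins = 0.0
    for a in scores_z1:
        for b in scores_z0:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_z1) * len(scores_z0))
```

For example, `c_statistic([0.9, 0.8], [0.1, 0.85])` returns 0.75 because three of the four pairs are correctly ordered. A value of 0.5 corresponds to no separation between the groups.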

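Two extensions remain. The first allows the selection indicator $z$ to depend on the random effect $f$, that is, informative selection (Follmann and Wu (1995); … (2009)). The second is semiparametric Bayesian modeling estimated by MCMC (Hjort et al. (2010); Hoshino, in press).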

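Acknowledgments. This work was supported by grant (A) 23680026.

Appendix A. Derivatives of log g with respect to f_i

The first-, second- and third-order derivatives of $\log g(f_i, y_i, w_i \mid \theta)$ with respect to $f_i$ are

$$ \frac{\partial}{\partial f_i}\log g(f_i, y_i, w_i \mid \theta) = -\frac{f_i}{\phi} + \sum_{j=1}^{J_i}\left[ -\frac{1}{\sigma} + \frac{1}{\sigma}\exp\left(\frac{y^{LD}_{ij}-\beta_0-f_i-w_{ij}^t\beta_w}{\sigma}\right) + \alpha_f\left(y^B_{ij}-p_{ij}\right) \right] \tag{A.1} $$

$$ \Sigma_i = -\frac{1}{\phi} - \sum_{j=1}^{J_i}\left[ \frac{1}{\sigma^2}\exp\left(\frac{y^{LD}_{ij}-\beta_0-f_i-w_{ij}^t\beta_w}{\sigma}\right) + \alpha_f^2\,p_{ij}(1-p_{ij}) \right] \tag{A.2} $$

$$ \frac{\partial\Sigma_i}{\partial f_i} = \sum_{j=1}^{J_i}\left[ \frac{1}{\sigma^3}\exp\left(\frac{y^{LD}_{ij}-\beta_0-f_i-w_{ij}^t\beta_w}{\sigma}\right) - \alpha_f^3\,p_{ij}(1-p_{ij})(1-2p_{ij}) \right] \tag{A.3} $$

where $\tilde w_{ij} = (1, f_i, w_{ij}^t)^t$ and $p_{ij} = p(y^B_{ij} = 1 \mid f_i, w_{ij}) = 1/\{1 + \exp(-\alpha^t\tilde w_{ij})\}$. The derivatives $\partial\Sigma_i/\partial\theta$ follow in the same way. For the location parameters of the duration model,

$$ \frac{\partial\Sigma_i}{\partial(\beta_0, \beta_w^t)^t} = \sum_{j=1}^{J_i} \frac{1}{\sigma^3}\exp\left(\frac{y^{LD}_{ij}-\beta_0-f_i-w_{ij}^t\beta_w}{\sigma}\right)(1, w_{ij}^t)^t \tag{A.4} $$

and for $\alpha$ one uses $\partial p_{ij}/\partial\alpha = p_{ij}(1-p_{ij})\,\tilde w_{ij}$.

References

Angrist, J. and Pischke, J. S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion, Princeton University Press, London.
Bianconcini, S. and Cagnone, S. (2012). Estimation of generalized linear latent variable models via fully exponential Laplace approximation, J. Multivar. Anal., 112, 183-193.
Bijwaard, G. E., Franses, P. H. and Paap, R. (2006). Modeling purchases as repeated events, J. Bus. Econ. Stat., 24, 487-502.
Braun, M. and McAuliffe, J. (2010). Variational inference for large-scale models of discrete choice, J. Am. Stat. Assoc., 105, 324-335.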

Follmann, D. and Wu, M. (1995). An approximate generalized linear model with random effects for informative missing data, Biometrics, 51, 151-168.
Fong, Y., Rue, H. and Wakefield, J. (2010). Bayesian inference for generalized linear mixed models, Biostatistics, 11, 397-412.
Hjort, N. L., Holmes, C., Müller, P. and Walker, S. G. (2010). Bayesian Nonparametrics, Cambridge University Press, Cambridge.
(2005). M 32(2), 121-132.
(2009).
Hoshino, T. (in press). Semiparametric Bayesian estimation for marginal parametric potential outcome modeling: Application to causal inference, J. Am. Stat. Assoc.
Hoshino, T., Kurata, H. and Shigemasu, K. (2006). A propensity score adjustment for multiple group structural equation modeling, Psychometrika, 71, 691-712.
(2013). AI 28(1).
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S. and Saul, L. (1999). An introduction to variational methods for graphical models, Mach. Learn., 37, 183-233.
(2010). 25(4).
Klein, J. P. and Moeschberger, M. L. (2003). Survival Analysis: Techniques for Censored and Truncated Data, 2nd ed., Springer, New York.
(2012). 60(1), 173-188.
Pan, Q. and Schaubel, D. E. (2009). Evaluating bias correction in weighted proportional hazards regression, Lifetime Data Analysis, 15, 120-146.
Rizopoulos, D., Verbeke, G. and Lesaffre, E. (2009). Fully exponential Laplace approximations for the joint modeling of survival and longitudinal data, J. R. Stat. Soc., Ser. B, 71, 637-654.
Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects, Biometrika, 70, 41-55.
Rue, H., Martino, S. and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations (with discussion), J. R. Stat. Soc., Ser. B, 71, 319-392.
Seetharaman, P. B. and Chintagunta, P. K. (2003). The proportional hazard model for purchase timing: A comparison of alternative specifications, J. Bus. Econ. Stat., 21, 368-382.
Shimodaira, H. (2000). Improving predictive inference under covariate shift by weighting the log-likelihood function, J. Stat. Plan. Inf., 90, 227-244.
(2012).
Sugiyama, M., Krauledat, M. and Müller, K.-R. (2007). Covariate shift adaptation by importance weighted cross validation, J. Mach. Learn. Res., 8, 985-1005.
Tierney, L. and Kadane, J. B. (1986). Accurate approximations for posterior moments and marginal densities, J. Am. Stat. Assoc., 81, 82-86.
Tierney, L., Kass, R. and Kadane, J. B. (1989). Fully exponential Laplace approximations to expectations and variances of nonpositive functions, J. Am. Stat. Assoc., 84, 710-716.
Vaida, F. and Xu, R. (2000). Proportional hazards models with random effects, Stat. Med., 19, 3309-3324.
Wang, B. and Titterington, D. M. (2004). Lack of consistency of mean field and variational Bayes approximations for state space models, Neural Process. Lett., 20, 151-170.
Wooldridge, J. M. (2007). Inverse probability weighted M-estimation for general missing data problems, J. Econom., 141, 1281-1301.