The Dirichlet Process. Joint work with Max Welling (UC Irvine) and Yee Whye Teh (Gatsby Unit, UCL). Slides: http://kenichi.kurihara.googlepages.com/miru_workshop.pdf (MIRU2008, 40 slides)
Dirichlet process mixture
Non-parametric Bayesian Model for Spectral Clustering
Keywords: non-parametric Bayes, spectral clustering, probabilistic model
Abstract: We study the problem of searching for the number of clusters, k, in clustering. In clustering applications, spectral clustering has achieved great success. Following this success, we consider an extension of spectral clustering based on a non-parametric Bayesian approach, which gives an elegant solution for model selection, i.e. choosing k. In particular, we use the Dirichlet process (DP). We first propose a generative model for spectral clustering to apply the DP. We then show that relaxed greedy maximum likelihood estimation for the model is in fact equivalent to spectral clustering. Based on the generative model, we derive a non-parametric Bayesian model for spectral clustering using the DP. Experimental results show that the proposed ...
[Figure 1: Typical clustering results by the Dirichlet process Gaussian mixture model (left) and the proposed algorithm (right). The former discovered 10 Gaussians; the latter correctly discovered 3 clusters.]
[Figure 3: Typical results for BKM for various values of τ = 0.1, 0.5, 2; BKM found eight, nine and ten clusters respectively. Some clusters are too small or too large to be visible.]
[Slide diagram: k-means, spectral clustering; DP Gaussian mixture, proposed algorithm.]
"... clustering algorithms even when the distribution of data can not be captured by usual distributions, e.g. Gaussian or multinomial."
Dirichlet process; EM-like inference; MCMC
Dirichlet process mixture
[Figure: a mixture model. Data x_i ∈ X, assignments z, parameters θ: here z_i = z_j = 1 (cluster θ1) and z_k = z_l = 2 (cluster θ2).]
(Z, θ) = argmax_{Z,θ} log p(x | Z, θ)
[Figure: the two-cluster illustration from the previous slide.]
(MCMC)
ICM: Z = argmax_Z p(x, Z, θ);   θ = argmax_θ p(x, Z, θ)
EM:  q(Z) ∝ p(x, Z, θ);   θ = argmax_θ E_q(Z)[log p(x, Z, θ)]
VB:  q(Z) ∝ exp E_q(θ)[log p(x, Z, θ)];   q(θ) ∝ exp E_q(Z)[log p(x, Z, θ)]
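The EM column above can be sketched for a mixture of unit-variance, equal-weight Gaussians (the likelihood used later in the talk). A minimal illustration, not the talk's code; the function name and the spherical, equal-weight model are my assumptions:

```python
import numpy as np

def em_gmm(X, K, n_iter=50, seed=0):
    """EM for a mixture of K unit-variance, equal-weight Gaussians.
    E-step: q(z_i = k) proportional to exp(-||x_i - theta_k||^2 / 2).
    M-step: theta_k = responsibility-weighted mean of the data."""
    rng = np.random.default_rng(seed)
    theta = X[rng.choice(len(X), K, replace=False)].astype(float)  # init from data
    for _ in range(n_iter):
        # E-step: responsibilities from squared distances (stabilised per row)
        d2 = ((X[:, None, :] - theta[None, :, :]) ** 2).sum(-1)
        q = np.exp(-0.5 * (d2 - d2.min(axis=1, keepdims=True)))
        q /= q.sum(axis=1, keepdims=True)
        # M-step: each mean becomes the weighted average of the data
        theta = (q.T @ X) / q.sum(axis=0)[:, None]
    return theta, q
```

Replacing the soft responsibilities q with hard argmax assignments recovers the ICM column on the left.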
Markov chain Monte Carlo: draw samples from p(Z | X). Algorithms: Metropolis-Hastings, Gibbs sampler.
[Figure: a sampled distribution.]
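Metropolis-Hastings can be sketched as a generic random-walk sampler (a minimal illustration under an assumed 1-D target, not the talk's code):

```python
import math, random

def metropolis_hastings(log_p, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings.
    Proposes x' = x + N(0, step^2); accepts with prob min(1, p(x')/p(x))."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        x_new = x + rng.gauss(0.0, step)
        if math.log(rng.random()) < log_p(x_new) - log_p(x):
            x = x_new                      # accept the proposal
        samples.append(x)                  # a rejection keeps the old state
    return samples
```

The Gibbs sampler used later is the special case where each coordinate is resampled exactly from its conditional, so every proposal is accepted.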
Dirichlet process mixture
(Z, θ) = argmax log p(x | Z, θ), by iterated conditional mode:
Z = argmax_Z log p(x | Z, θ);   θ = argmax_θ log p(x | Z, θ)
[Figure: the two-cluster illustration with z_i = z_j = 1 and z_k = z_l = 2.]
log p(x | Z, θ) = -(1/2) Σ_{i=1}^n ||x_i - θ_{z_i}||² + constant
z_i = argmax_{z_i} log p(x, Z, θ) = argmax_{z_i} [-(1/2) ||x_i - θ_{z_i}||²]
θ_j = argmax_{θ_j} log p(x, Z, θ) = (1/n_j) Σ_{i: z_i=j} x_i
[Figure: the two-cluster illustration.]
Summary: (Z, θ) = argmax p(x | Z, θ) by iterated conditional mode is exactly k-means:
z_i = argmin_j ||x_i - θ_j||²;   θ_j = (1/n_j) Σ_{i: z_i=j} x_i
[Figure: the two-cluster illustration.]
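The two updates above can be sketched directly (a minimal k-means, not the talk's code; the function name and the empty-cluster handling are my assumptions):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """k-means as iterated conditional modes on the slide's objective:
    alternately minimise ||x_i - theta_{z_i}||^2 over assignments z and means theta."""
    rng = np.random.default_rng(seed)
    theta = X[rng.choice(len(X), K, replace=False)].astype(float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - theta[None, :, :]) ** 2).sum(-1)
        z = d2.argmin(1)                       # z_i = argmin_j ||x_i - theta_j||^2
        for j in range(K):                     # theta_j = mean of cluster j
            if (z == j).any():                 # keep old mean if cluster is empty
                theta[j] = X[z == j].mean(0)
    return z, theta
```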
How do we choose the number of clusters K?
[Figure: the same data clustered with K = 3, K = 4 and K = 5.]
The Dirichlet process mixture removes the need to fix K.
Dirichlet process mixture
Dirichlet Process Mixture (DPM): a mixture model that does not fix the number of clusters K [Ferguson 73, Antoniak 74]. How is choosing K avoided?
The DPM defines a prior p(Z, K) (= p(Z)) over partitions. Example: three points x1, x2, x3, with likelihood p(x | Z).
DPM prior over the partitions of {x1, x2, x3} (five in total):
z1 = {(x1), (x2), (x3)}: α² / (1+α)(2+α)
z2 = {(x1, x2), (x3)}: α / (1+α)(2+α)
z3 = {(x1, x3), (x2)}: α / (1+α)(2+α)
z4 = {(x2, x3), (x1)}: α / (1+α)(2+α)
z5 = {(x1, x2, x3)}: 2 / (1+α)(2+α)
Chinese Restaurant Process (the partition distribution underlying the Dirichlet process):
[Figure: customers 1-10 seated at tables.]
p(z_N | z_1 ... z_{N-1}) = N_c / (α + N - 1)   (N_c > 0: z_N joins existing cluster c)
p(z_N | z_1 ... z_{N-1}) = α / (α + N - 1)     (z_N starts a new cluster)
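The seating rule above can be sketched as a sampler (a minimal illustration; the function name is my assumption):

```python
import random

def crp(n, alpha, seed=0):
    """Draw a partition of n customers from the Chinese restaurant process.
    Customer N joins table c with prob N_c/(alpha+N-1), a new table with
    prob alpha/(alpha+N-1)."""
    rng = random.Random(seed)
    tables = []                        # tables[c] = number of customers at table c
    z = []
    for i in range(n):                 # i customers are already seated
        weights = tables + [alpha]     # existing tables, then a new one
        r = rng.random() * (i + alpha)
        acc = 0.0
        for c, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        if c == len(tables):
            tables.append(0)           # open a new table
        tables[c] += 1
        z.append(c)
    return z
```

With n = 100 and α = 1 the expected number of tables is Σ_i α/(α+i) ≈ log n, so the number of clusters grows with the data instead of being fixed.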
Why the DPM? Alternatives exist, e.g. a finite Dirichlet prior p(Z | K) combined with a Poisson prior p(K). The following slides compare the DPM (with its MCMC inference) against this Dirichlet + Poisson construction.
DPM consistency: marginalising x3 out of the three-point partition distribution recovers the two-point one:
p(z2) + p(z5) = p({(x1, x2)})
p(z2) = α / (1+α)(2+α);   p(z5) = 2 / (1+α)(2+α);   p({(x1, x2)}) = 1 / (1+α)
Check: (α + 2) / (1+α)(2+α) = 1 / (1+α). ✓
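The identity can be verified exactly with rational arithmetic (a small check of the slide's algebra; the function name is my assumption):

```python
from fractions import Fraction

def check_crp_consistency(alpha):
    """Marginalising x3 out of the 3-point CRP must recover the 2-point CRP:
    the two 3-point partitions whose restriction to {x1, x2} is {(x1, x2)}
    must sum to p({(x1, x2)}) = 1/(1+alpha)."""
    a = Fraction(alpha)
    p_z2 = a / ((1 + a) * (2 + a))        # {(x1,x2),(x3)}: x3 opens a new table
    p_z5 = 2 / ((1 + a) * (2 + a))        # {(x1,x2,x3)}: x3 joins the pair
    return p_z2 + p_z5 == 1 / (1 + a)
```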
Summary: the DPM is a prior p(Z, K) over partitions that does not fix K, and it is consistent.
note: [Figure: two diagrams over cluster indices 1 ... K.]
Dirichlet process mixture
Inference: iterated conditional mode, EM, MCMC (Gibbs sampler).
Markov chain Monte Carlo (recap): draw samples from p(Z | X). Algorithms: Metropolis-Hastings, Gibbs sampler.
[Figure: a sampled distribution.]
Gibbs sampler: for Z = (z1, z2, ..., zn), repeatedly draw z_i ~ p(z_i | Z_-i, X); the samples converge to p(Z | X). If p(z_i | Z_-i, X) cannot be sampled directly, use Metropolis-Hastings. Here p(z_i | Z_-i, X) ∝ p(X, Z), which is tractable for the DPM.
Example: θ = {µ, Σ}, with a conjugate prior: Gaussian on µ and a Wishart prior, so that θ can be integrated out.
Gibbs sampler for the DPM:
1. Choose a likelihood p(x | Z, θ), e.g. Gaussian.
2. Put priors p(θ) and p(Z); for p(Z), use the DPM.
3. Integrate out θ: p(X, Z) = ∫ dθ p(X, Z, θ) (conjugate prior).
4. Sample z_i ~ p(z_i | Z_-i, X) ∝ p(X, Z).
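Steps 1-4 can be sketched for a 1-D DPM of unit-variance Gaussians with a N(0, τ²) prior on each cluster mean, so that θ integrates out in closed form. A minimal sketch under those assumed likelihood and prior choices, not the talk's implementation:

```python
import math, random

def dpm_gibbs_1d(x, alpha=1.0, tau2=100.0, n_sweeps=50, seed=0):
    """Collapsed Gibbs for a 1-D DPM: unit-variance Gaussian likelihood,
    N(0, tau2) prior on each cluster mean (integrated out analytically)."""
    rng = random.Random(seed)
    z = [0] * len(x)                                  # start with one cluster

    def log_pred(xi, members):
        # log p(x_i | cluster data), with the cluster mean integrated out
        n, s = len(members), sum(members)
        post_var = tau2 / (n * tau2 + 1.0)
        mean, var = post_var * s, post_var + 1.0
        return -0.5 * (math.log(2 * math.pi * var) + (xi - mean) ** 2 / var)

    for _ in range(n_sweeps):
        for i, xi in enumerate(x):
            z[i] = None                               # remove x_i from its cluster
            clusters = sorted(set(c for c in z if c is not None))
            logw = []
            for c in clusters:                        # existing cluster: N_c * predictive
                members = [x[j] for j in range(len(x)) if z[j] == c]
                logw.append(math.log(len(members)) + log_pred(xi, members))
            logw.append(math.log(alpha) + log_pred(xi, []))   # new cluster: alpha * prior predictive
            m = max(logw)
            w = [math.exp(lw - m) for lw in logw]
            r = rng.random() * sum(w)
            acc = 0.0
            for k, wk in enumerate(w):
                acc += wk
                if r < acc:
                    break
            z[i] = clusters[k] if k < len(clusters) else (max(clusters, default=-1) + 1)
    return z
```

On well-separated data the sampler concentrates on the right number of clusters without K ever being specified.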
Experiment: MNIST (88), Gibbs sampler (GS).
!" #" $%" )!" #" $%" )!('#!&!('#!&!&'% Latent Dirichlet!&'! Allocation!&'&!&'#!* Gibbs sampler +,-./,!&'! "!""" #""" $%"""!&'%!&'%!&'!!&'!!&'(!&'#!&'(!#!&'# "!""" #""" $%""" *+,-.+!# "!""" #""" $%"""!&'! *+,-.+ GHDP 100 GHDP 1 GLDA CVHDP CVLDA VLDA!&'&!&'( 37/40 MIRU2008 :
Why the DPM? K need not be fixed (the DPM handles it); the DPM defines p(Z, K); the DPM is consistent; and a Gibbs sampler applies directly, unlike the non-DPM alternative.
Beyond the DPM: the hierarchical Dirichlet process (HDP), e.g. HDP-LDA, HDP-HMM, HDP-PCFG.
Dirichlet process mixture