daichi@ism.ac.jp 2012-3-15
Overview ( ) Dirichlet Dirichlet Dirichlet Chinese Restaurant Process (CRP) Pitman-Yor n
Grammar rules:
S → NP VP
VP → V PP
VP → V NP
PP → P NP
NP → DET N
NP → N
[Figure: parse tree of "He saw her with a telescope"]
(2) "He saw her with a telescope" vs. "He saw her with a hat"?
1990 :! Web :!! 2 0.92 0.85 0.61 1 0.37 1.0
! Markov ( )!!! etc,etc
( ) 1 2 3 K K (K-1) (Simplex)! (0,0,1) (1,0,0) (0,1,0)
: Uniform :! :
(2)! ( )
! (1=1,2=3,3=2 )!? :!
(2) n k! 1 2 k K : ( )
! GMM HMM : Dirichlet ( Dirichlet )
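Dirichlet process? DP (Ferguson 1973): A stochastic process $P$ is said to be a Dirichlet process on $(\mathcal{X}, \mathcal{B})$ with parameter $\alpha$ if for any measurable partition $(A_1, \ldots, A_k)$ of $\mathcal{X}$, the random vector $(P(A_1), \ldots, P(A_k))$ has a Dirichlet distribution with parameter $(\alpha(A_1), \ldots, \alpha(A_k))$.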
(2) :
Chinese Restaurant Process (CRP) (Dirichlet), (DP) /! (rich-gets-richer) CRP 1 2 3 4 : 2 3 1 0?
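The rich-gets-richer seating rule of the CRP can be sketched as follows. This is a minimal illustration, not code from the talk: the function names and the concentration parameter `alpha` are assumptions. With `n` customers already seated, the next customer joins table `k` with probability `n_k / (n + alpha)` and opens a new table with probability `alpha / (n + alpha)`.

```python
import random

def crp_probs(counts, alpha):
    """Seating probabilities for the next customer: existing tables, then a new table."""
    total = sum(counts) + alpha
    return [c / total for c in counts] + [alpha / total]

def simulate_crp(n_customers, alpha, rng):
    """Seat customers one by one and return the table occupancy counts."""
    counts = []
    for _ in range(n_customers):
        probs = crp_probs(counts, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)   # open a new table
        else:
            counts[k] += 1     # join an existing table (rich-gets-richer)
    return counts

if __name__ == "__main__":
    # Tables holding 2, 3 and 1 customers; the last entry is the new-table probability.
    print(crp_probs([2, 3, 1], alpha=1.0))
    print(simulate_crp(100, alpha=1.0, rng=random.Random(0)))
```

Larger tables attract new customers in proportion to their size, so a few tables tend to dominate the occupancy counts.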
!! c(w) 0 w! p(w) is going to, united states of america!
n n! ( Markov ) n :! Google!
(MacKay 1994) n! Newton : n(w h) 0!! 0.1 0.001
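As a rough illustration of Dirichlet smoothing for n-grams, the sketch below adds a pseudo-count to every count n(w|h). It is a simplification under stated assumptions: a single symmetric `alpha` is used for all words (the hierarchical Dirichlet model instead optimizes per-word hyperparameters), and the function name, vocabulary and counts are hypothetical.

```python
def dirichlet_predictive(counts, alpha, vocab, w):
    """p(w | h) = (n(w|h) + alpha) / (n(h) + alpha * |V|), symmetric pseudo-count alpha."""
    n_h = sum(counts.get(v, 0) for v in vocab)
    return (counts.get(w, 0) + alpha) / (n_h + alpha * len(vocab))

if __name__ == "__main__":
    vocab = ["to", "the", "america"]
    counts = {"to": 3, "the": 1}   # "america" never follows this context: n(w|h) = 0
    for alpha in (0.1, 0.001):
        print(alpha, dirichlet_predictive(counts, alpha, vocab, "america"))
```

With pseudo-counts as small as 0.1 or 0.001, an unseen word receives almost no probability mass, which is why this kind of smoothing is weak for n-grams.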
Kneser-Ney (Kneser,Ney 1995) n(w h) D h Pitman-Yor! (Goldwater+ 2006, Teh 2006) Pitman-Yor?
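For reference, interpolated Kneser-Ney smoothing is commonly written as below. The notation is an assumption rather than the slide's own: $D$ is the discount subtracted from each count $n(w \mid h)$, $N_{1+}(h\,\cdot)$ is the number of distinct words observed after context $h$, and $h'$ is $h$ with its oldest word dropped.

```latex
p(w \mid h) =
  \frac{\max\bigl(n(w \mid h) - D,\, 0\bigr)}{\sum_{w'} n(w' \mid h)}
  + \frac{D \cdot N_{1+}(h\,\cdot)}{\sum_{w'} n(w' \mid h)} \; p(w \mid h')
```

The mass removed by discounting is redistributed through the lower-order distribution $p(w \mid h')$, which has the same shape as the Pitman-Yor predictive rule.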
(HDP) 2-3- DP DP DP 1- Suffix Tree ( ) n! (= Markov ) n (n-1) n DP (n+1)
Pitman-Yor Pitman-Yor (Pitman and Yor 1997): PY(,d), Poisson-Dirichlet d CRP 1 2 3 4 2 3 1 0?
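The Pitman-Yor seating rule differs from the CRP only by the discount `d`. The sketch below is illustrative: the name `theta` for the strength parameter is an assumption (the slide's first argument of PY was lost in extraction), as are the function names. With `n` customers at `K` tables, table `k` is chosen with probability `(n_k - d) / (theta + n)` and a new table with probability `(theta + d * K) / (theta + n)`.

```python
import random

def pitman_yor_probs(counts, theta, d):
    """Seating probabilities under a Pitman-Yor CRP: existing tables, then a new table."""
    n, k = sum(counts), len(counts)
    existing = [(c - d) / (theta + n) for c in counts]
    new = (theta + d * k) / (theta + n)
    return existing + [new]

def simulate_pitman_yor(n_customers, theta, d, rng):
    """Seat customers one by one and return the table occupancy counts."""
    counts = []
    for _ in range(n_customers):
        probs = pitman_yor_probs(counts, theta, d)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
    return counts

if __name__ == "__main__":
    print(pitman_yor_probs([2, 3, 1], theta=1.0, d=0.5))
    # A larger discount typically yields more tables (a heavier tail).
    print(len(simulate_pitman_yor(1000, 1.0, 0.0, random.Random(0))))
    print(len(simulate_pitman_yor(1000, 1.0, 0.8, random.Random(0))))
```

Setting `d = 0` recovers the ordinary CRP of the Dirichlet process.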
Pitman-Yor (1) n (n-1)! Pitman-Yor Uniform,
Pitman-Yor (2) 2- PY 3- PY PY 1- CRP n h!! CRP!!! ( ) (n-1) sing a song! a song! song
CRP (n-1) 2! ( ) america butter
HPYLM (hierarchical Pitman-Yor language model). Gibbs sampling: for each w = randperm(all counts in the corpus), remove customer w from its restaurant, then add customer w back. Latent variables: seating arrangements.
HPYLM ( ) h w h Kneser-Ney O(log(n))
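The HPYLM predictive probability has the same discount-and-interpolate shape as Kneser-Ney. The sketch below follows the standard formula from Teh (2006), p(w|h) = (c_hw - d t_hw)/(theta + c_h) + (theta + d t_h)/(theta + c_h) p(w|h'), under simplifying assumptions: one `d` and `theta` shared by all context lengths, a uniform base distribution, and a hypothetical dictionary layout holding customer counts `c` and table counts `t` per context.

```python
def hpylm_prob(w, context, restaurants, d, theta, vocab_size):
    """p(w | context), backing off from the longest context down to a uniform base."""
    if context is None:
        return 1.0 / vocab_size                    # base measure: uniform over the vocabulary
    parent = context[1:] if context else None      # drop the oldest word; () backs off to the base
    p_parent = hpylm_prob(w, parent, restaurants, d, theta, vocab_size)
    r = restaurants.get(context)
    if r is None:                                  # unseen context: use the parent distribution
        return p_parent
    c_h = sum(r["customers"].values())
    t_h = sum(r["tables"].values())
    c_hw = r["customers"].get(w, 0)
    t_hw = r["tables"].get(w, 0)
    return (c_hw - d * t_hw) / (theta + c_h) + (theta + d * t_h) / (theta + c_h) * p_parent

if __name__ == "__main__":
    vocab = ["sing", "a", "song"]
    restaurants = {
        (): {"customers": {"a": 2, "song": 1}, "tables": {"a": 1, "song": 1}},
        ("sing",): {"customers": {"a": 2}, "tables": {"a": 1}},
    }
    for w in vocab:
        print(w, hpylm_prob(w, ("sing",), restaurants, d=0.5, theta=1.0, vocab_size=len(vocab)))
```

In a full model the seating arrangements (the table counts) are latent and resampled by Gibbs sampling; here they are fixed by hand purely to show the prediction step.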
! % echo, mecab -O wakati, (, ) Chasen, MeCab (NAIST) (supervised learning)!
(2) # S-ID:950117245-006 KNP:99/12/27 * 0 5D * * * * * * * * 1 5D * * * * * * * 2 3D * * * * * * * 3 4D * * * 1995 38,400 ( ) ( )??
(3) קפיטליזםהיאשיטהכלכליתוחברתיתשהתפתחהבאירופהביןהמאההששעשרהוהמ אההתשעעשרה,המבוססתבעיקרהעלהזכותשלפרטיםוקבוצותלבעלותפרטיתעלה רכושולשימושבובאופןחופשי,תוךהסתמכותעלאכיפתזכויותהקנייןבאמצעותהרשו תהשופטת.!
:
:
: (Isfahan university of technology, Iran)
:!! : p( ) > p( ) 50! 2^50=1,125,899,906,842,624 ( )
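To make the size of the search space concrete: each position between two adjacent characters is either a word boundary or not, so a string with n such binary decisions has 2^n segmentations (a string of m characters has m - 1 internal positions). The sketch below, with a hypothetical helper name, enumerates them for a short string and reproduces the 2^50 figure.

```python
from itertools import product

def segmentations(s):
    """Yield every way to split the non-empty string `s` into contiguous pieces."""
    for cuts in product([False, True], repeat=len(s) - 1):
        words, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:                      # boundary between s[i-1] and s[i]
                words.append(s[start:i])
                start = i
        words.append(s[start:])
        yield words

if __name__ == "__main__":
    for seg in segmentations("abcd"):    # 3 internal positions
        print(seg)
    print(f"{2 ** 50:,}")                # 50 binary boundary decisions
```

Exhaustive enumeration is clearly infeasible at sentence length, which is what motivates dynamic programming over segmentations.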
: n p( ) = p( ^) p( ) p( ) p( ) (Shannon 1948) : Pitman-Yor (HPYLM) (Teh 2006; Goldwater+ 2005) Pitman-Yor : (GEM )
: HPYLM n-gram Pitman-Yor (PY) : PY: 0 Markov?
HPYLM: PY : V PY n-gram= HPYLM ( )
NPYLM: - HPYLM HPYLM-HPYLM Markov HPYLM, 1 ( 1/6879)
NPYLM : : :!! MCMC.
Blocked Gibbs Sampling p(x,z)!! (Blocked Gibbs sampler)
Blocked Gibbs Sampler for NPYLM
0. For s = s_1 ... s_X do
     parse_trivial(s, Θ).
1. For j = 1..M do
     For s = randperm(s_1 ... s_X) do
       Remove words(s) from Θ
       Sample words(s) ~ p(w | s, Θ)
       Add words(s) to Θ
     done.
Gibbs Sampling 1 2 10 50 100 200
words(s) p(w s, ) : s Forward-Backward (Viterbi ) Forward : t k! t-k t-k+1 t Y X Y k Y j
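For the bigram case, the forward variable can be written as below. The notation is an assumption: $\alpha[t][k]$ is the probability of the first $t$ characters $c_1 \cdots c_t$ with the final $k$ characters forming the last word, and $c_i^j$ denotes the substring $c_i \cdots c_j$.

```latex
\alpha[t][k] = \sum_{j=1}^{t-k}
  p\bigl(c_{t-k+1}^{t} \,\big|\, c_{t-k-j+1}^{t-k}\bigr) \cdot \alpha[t-k][j],
\qquad \alpha[0][0] = 1
```

The sum marginalizes over the length $j$ of the word that precedes the last one.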
= k! EOS! k!
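Forward filtering followed by backward sampling from EOS can be sketched as below for the bigram case. This is an illustration under stated assumptions, not the talk's implementation: `toy_bigram_prob` is a hypothetical, unnormalized stand-in for the NPYLM predictive probability p(word | previous word), and `max_len` caps the word length.

```python
import random

LEXICON = {"he", "saw", "her"}   # hypothetical toy lexicon

def toy_bigram_prob(word, prev):
    """Unnormalized stand-in for p(word | prev); ignores prev and favours lexicon words."""
    if word == "<EOS>":
        return 1.0
    return 0.1 if word in LEXICON else 0.001 ** len(word)

def forward(chars, bigram_prob, max_len):
    """alpha[t][k]: score of chars[:t] with the final k characters forming the last word."""
    n = len(chars)
    alpha = [[0.0] * (max_len + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for t in range(1, n + 1):
        for k in range(1, min(t, max_len) + 1):
            word = chars[t - k:t]
            if t == k:                                   # first word of the sentence
                alpha[t][k] = bigram_prob(word, "<BOS>")
                continue
            for j in range(1, min(t - k, max_len) + 1):  # sum over the previous word length
                prev = chars[t - k - j:t - k]
                alpha[t][k] += bigram_prob(word, prev) * alpha[t - k][j]
    return alpha

def backward_sample(chars, alpha, bigram_prob, max_len, rng):
    """Draw word lengths from the end of the string back to the start."""
    words, t, nxt = [], len(chars), "<EOS>"
    while t > 0:
        ks = list(range(1, min(t, max_len) + 1))
        weights = [bigram_prob(nxt, chars[t - k:t]) * alpha[t][k] for k in ks]
        k = rng.choices(ks, weights=weights)[0]
        nxt = chars[t - k:t]
        words.append(nxt)
        t -= k
    return words[::-1]

if __name__ == "__main__":
    s = "hesawher"
    alpha = forward(s, toy_bigram_prob, max_len=4)
    print(backward_sample(s, alpha, toy_bigram_prob, 4, random.Random(0)))
```

Because a whole segmentation is drawn at once for each sentence, this step serves as the block move of the blocked Gibbs sampler.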
( ) t-k-1-j-1-i t-k-1-j-1 t-k-1 t Forward! : t k! j! ;_;! ( )
NPYLM as a Semi-Markov model BOS EOS Semi-Markov HMM (Murphy 02, Ostendorf 96)! +MCMC (n )
: SIGHAN Bakeoff 2005! 4 : 37,400 : 1000 ( ) : : MSR, : CITYU : 50,000 : 2
(F ) NPY(2),NPY(3) NPYLM or + NPY( ) NPY(3) 2 : ZK08 (Zhao&Kit 2008)! ZK08
10 55 17 HDP(Goldwater+ ACL 2006):! 1 2 ( ) NPYLM: 3 -
( ) NPYLM
NPYLM
Arabic Gigawords 40,000 (Arabic AFP news) الفلسطينيبسببتظاهرةلانصارحركةالمقاومةالاسلاميةحماس واذاتحققذلكفانكيسلوفسكييكونقدحازثلاثجوائزكبرىفيابرزثلاثة المحليةوالدوليةللحصولعلىلوازمهمالصحية Google translate: لايتمتعبلقب+رئيس+بلهو+قائد مايسمىب+السلطةالفلسطينية Filstinebsbptazahrplansarhrkpalmquaompalaslamiphamas اعلنتشرطةجنوبافريقيااليومالاثنينانمالايقل. يخي وقداستغرقاعدادهخمسةاعوام وقالتدانييلتومسونالتيكتبتالسيناريو NPYLM الفلسطيني بسبب تظاهرة ل انصار حركة المقاومة الاسلامية حماس تحقق ذلك ف ان كيسلوفسكي يكون قد حاز ثلاث جوائز كبرىفيابرز ثلاثة المحليةو الدولية للحصولعلى translate: لوازم هم الصحية Google تع ب لقب + Islamic رئيس + the بof ل هو + because قائد + event مايسمىof the ب + السلطة Palestinian supporters الفلسطينية اعلن ت شرطة Hamas. جنوبافريقي ا Movement, اليوم الاثنين انمالايقل Resistance وقد استغرق اعداد ه خمسةاعوام. و قال ت دان ييل تومسون التي " تاريخي
Alice in Wonderland

Input (spaces removed):
first,shedreamedoflittlealiceherself,andonceagainthetinyhandswereclaspeduponherknee,andthebrighteagereyeswerelookingupintohersshecouldheartheverytonesofhervoice,andseethatqueerlittletossofherheadtokeepbackthewanderinghairthatwouldalwaysgetintohereyesandstillasshelistened,orseemedtolisten,thewholeplacearoundherbecamealivethestrangecreaturesofherlittlesister'sdream.thelonggrassrustledatherfeetasthewhiterabbithurriedbythefrightenedmousesplashedhiswaythroughtheneighbouringpoolshecouldheartherattleoftheteacupsasthemarchhareandhisfriendssharedtheirneverendingmeal,andtheshrillvoiceofthequeen

NPYLM segmentation:
first, she dream ed of little alice herself,and once again the tiny hand s were clasped upon her knee,and the bright eager eyes were looking up into hers -- shecould hearthe very tone s of her voice, and see that queer little toss of herhead to keep back the wandering hair that would always get into hereyes -- and still as she listened, or seemed to listen, thewhole place a round her became alive the strange creatures of her little sister 'sdream. thelong grass rustled ather feet as thewhitera bbit hurried by -- the frightened mouse splashed his way through the neighbour ing pool -- shecould hearthe rattle ofthe tea cups as the marchhare and his friends shared their never -endingme a l,and the
( )! Gibbs! MATLAB R C++&C, 6000 100 200 / (10ms/ ) 1 n 40000! : 10 20
n - n! MCMC
!! (,, ) (,, ) (IRM, Mondrian process, )
! HDP-LDA Gibbs (Welling+, NIPS 2007-! 2008) Loglinear CRF Forward-Backward POS Tagging: CRF+HMM (, + 2007)