LDA (Latent Dirichlet Allocation): LDA on Wikipedia

Transcription:

Latent Dirichlet Allocation W707 s-taiji@is.titech.ac.jp 1 / 37

LDA (Latent Dirichlet Allocation): LDA on Wikipedia 2 / 37

1 LDA: Latent Dirichlet Allocation 2 Wikipedia LDA 3 3 / 37

DF 4 / 37


Bag of Words Bag of words: 100 please credit. money Bag of words 5 / 37

Bag of Words

       1   2   3   ...   N
  1    4   8   0   ...   2
  2    2   0   2   ...   1
  3    2   4   0   ...   0

6 / 37
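A minimal sketch of how such a count table can be built. The toy documents and the whitespace tokenizer are assumptions made for illustration; they are not the lecture's data or preprocessing. Each document becomes a vector of word counts and word order is discarded.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count matrix) for whitespace-tokenized documents.

    Each row of the matrix corresponds to one document, each column to one
    vocabulary word (sorted alphabetically).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocab)
        for word, count in Counter(tokens).items():
            row[index[word]] = count
        matrix.append(row)
    return vocab, matrix


# Hypothetical toy corpus (not from the slides).
docs = ["money money credit", "please credit money", "please please please"]
vocab, X = bag_of_words(docs)
for row in X:
    print(row)
```

Here rows are documents and columns are words; the transposed layout carries the same information.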

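Multinomial distribution: in $n$ trials, let $x_i$ be the number of times outcome $i$ occurs. Then $(x_1, \dots, x_k) \sim \mathrm{Mult}(\beta; n)$ with

\[ P(x_1, \dots, x_k \mid \beta) = \frac{n!}{x_1! \cdots x_k!} \, \beta_1^{x_1} \cdots \beta_k^{x_k}, \]

where $\beta = (\beta_1, \dots, \beta_k)$, $\sum_{i=1}^{k} \beta_i = 1$, $\beta_i \ge 0$.

7 / 37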
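The multinomial probability can be evaluated directly from its definition. The function name and the example numbers below are illustrative assumptions, not part of the lecture.

```python
from math import factorial, prod


def multinomial_pmf(x, beta):
    """P(x_1, ..., x_k | beta) for counts x and probabilities beta."""
    n = sum(x)
    # Multinomial coefficient n! / (x_1! ... x_k!), computed exactly in integers.
    coef = factorial(n)
    for x_i in x:
        coef //= factorial(x_i)
    return coef * prod(b ** x_i for b, x_i in zip(beta, x))


# Three outcomes with probabilities 0.5, 0.3, 0.2 and n = 3 trials.
p = multinomial_pmf([2, 1, 0], [0.5, 0.3, 0.2])
print(p)
```

For long documents the product underflows, so in practice one works with the logarithm of this quantity.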

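Dirichlet distribution: a distribution over probability vectors $\beta$. For $\beta \sim \mathrm{Diri}(\alpha)$,

\[ p(\beta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{n} \alpha_i\right)}{\prod_{i=1}^{n} \Gamma(\alpha_i)} \prod_{i=1}^{n} \beta_i^{\alpha_i - 1}, \]

where $\alpha = (\alpha_1, \dots, \alpha_n)$, $\alpha_i > 0$, and $\sum_{i=1}^{n} \beta_i = 1$.

8 / 37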

Dirichlet Wikipedia 9 / 37

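Conjugate prior: the Dirichlet distribution. Let $X = (X_1, \dots, X_k)$ follow a multinomial distribution with parameter $\pi$, observed as $x = (x_1, \dots, x_k)$ with $\sum_{i=1}^{k} x_i = n$, and let the prior be $\pi \sim \mathrm{Diri}(\alpha)$. The posterior is again a Dirichlet distribution:

\[ p(\pi \mid x) = \frac{p(x \mid \pi)\, p(\pi \mid \alpha)}{\int p(x \mid \pi)\, p(\pi \mid \alpha)\, d\pi} \propto \underbrace{\left(\pi_1^{x_1} \cdots \pi_k^{x_k}\right)}_{\text{likelihood}} \underbrace{\left(\pi_1^{\alpha_1 - 1} \cdots \pi_k^{\alpha_k - 1}\right)}_{\text{prior}} = \pi_1^{x_1 + \alpha_1 - 1} \cdots \pi_k^{x_k + \alpha_k - 1} \propto \mathrm{Diri}((x_1 + \alpha_1, \dots, x_k + \alpha_k)). \]

The hyperparameter is updated as $\alpha = (\alpha_1, \dots, \alpha_k) \to (x_1 + \alpha_1, \dots, x_k + \alpha_k)$.

10 / 37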
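Because of conjugacy, the posterior update amounts to adding the observed counts to the prior hyperparameters. A small sketch follows; the function names and the numbers are assumptions for illustration.

```python
def dirichlet_posterior(alpha, counts):
    """Posterior hyperparameters: alpha_i + x_i for each component."""
    return [a + x for a, x in zip(alpha, counts)]


def dirichlet_mean(alpha):
    """Mean of Diri(alpha): alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


prior = [1.0, 1.0, 1.0]   # uniform prior over three outcomes
counts = [4, 8, 0]        # observed multinomial counts (hypothetical)
posterior = dirichlet_posterior(prior, counts)
print(posterior, dirichlet_mean(posterior))
```

The posterior mean stays strictly positive for the unobserved third outcome, which is the smoothing effect of the prior.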

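LDA: Latent Dirichlet Allocation

Setting: $K$ topics, $M$ words, $N$ documents.

$\beta^{(k)} = (\beta^{(k)}_1, \dots, \beta^{(k)}_M)$ $(k = 1, \dots, K)$: word distribution of topic $k$.

$\pi^{(d)} = (\pi^{(d)}_1, \dots, \pi^{(d)}_K)$ $(d = 1, \dots, N)$: topic proportions of document $d$.

Word counts of document $d$:

\[ (x^{(d)}_1, \dots, x^{(d)}_M) \sim \sum_{k=1}^{K} \pi^{(d)}_k \, \mathrm{Mult}(\beta^{(k)}; n^{(d)}) = \mathrm{Mult}\left(\sum_{k=1}^{K} \pi^{(d)}_k \beta^{(k)}; \, n^{(d)}\right). \]

The mixture is taken per word: each of the $n^{(d)}$ words of document $d$ is drawn from the mixed word distribution $\sum_{k} \pi^{(d)}_k \beta^{(k)}$. Here $n^{(d)}$ is the number of words in document $d$. The goal is to estimate $\{\pi^{(d)}\}$ and $\{\beta^{(k)}\}$.

11 / 37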

LDA 12 / 37

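Likelihood: $X = (x^{(d)}_1, \dots, x^{(d)}_M)_{d=1}^{N}$ (an $N \times M$ matrix), where $x^{(d)}_i$ is the number of occurrences of word $i$ in document $d$.

\[ p\left(X \mid \{\beta^{(k)}\}_{k=1}^{K}, \{\pi^{(d)}\}_{d=1}^{N}\right) = \prod_{d=1}^{N} p\left(x^{(d)} \,\Big|\, \sum_{k=1}^{K} \beta^{(k)} \pi^{(d)}_k\right). \]

Here $p\left(x^{(d)} \mid \sum_{k=1}^{K} \beta^{(k)} \pi^{(d)}_k\right)$ is the probability of $x^{(d)}$ under $\mathrm{Mult}\left(\sum_{k=1}^{K} \pi^{(d)}_k \beta^{(k)}; \, n^{(d)}\right)$.

13 / 37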
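The likelihood above can be evaluated numerically as a sum of multinomial log-probabilities, one per document. This is only a sketch: the function name and the tiny example are assumptions, and the code requires every word with a positive count to have positive mixed probability.

```python
from math import lgamma, log


def lda_log_likelihood(X, beta, pi):
    """log p(X | beta, pi) for the multinomial mixture model.

    X[d][i]    : count of word i in document d
    beta[k][i] : probability of word i under topic k
    pi[d][k]   : proportion of topic k in document d
    """
    total = 0.0
    for x, pi_d in zip(X, pi):
        total += lgamma(sum(x) + 1)  # log n^(d)!
        for i, x_i in enumerate(x):
            if x_i == 0:
                continue  # contributes log(1) = 0
            # Mixed word probability: sum_k pi_k^(d) * beta_i^(k)
            p_i = sum(p * b[i] for p, b in zip(pi_d, beta))
            total += x_i * log(p_i) - lgamma(x_i + 1)
    return total


# Hypothetical example: one document, two words, two identical topics.
X = [[1, 1]]
beta = [[0.5, 0.5], [0.5, 0.5]]
pi = [[0.3, 0.7]]
ll = lda_log_likelihood(X, beta, pi)
print(ll)
```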

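Likelihood:

\[ p\left(X \mid \{\beta^{(k)}\}_{k=1}^{K}, \{\pi^{(d)}\}_{d=1}^{N}\right) = \prod_{d=1}^{N} p\left(x^{(d)} \,\Big|\, \sum_{k=1}^{K} \beta^{(k)} \pi^{(d)}_k\right). \]

LDA places Dirichlet priors on $\{\beta^{(k)}\}_{k=1}^{K}$ and $\{\pi^{(d)}\}_{d=1}^{N}$ and works with the posterior $p\left(\{\beta^{(k)}\}_{k=1}^{K}, \{\pi^{(d)}\}_{d=1}^{N} \mid X\right)$, from which the estimates $\{\hat{\beta}^{(k)}\}_{k=1}^{K}$ and $\{\hat{\pi}^{(d)}\}_{d=1}^{N}$ are obtained.

14 / 37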

LDA 1 1 LDA 2 3 15 / 37

. Gibbs ( ) ( ) Collapsed Collapsed Gibbs 16 / 37
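The slide names collapsed Gibbs sampling, but its details did not survive extraction. The sketch below is the standard textbook sampler with symmetric Dirichlet priors, not necessarily the derivation given in the lecture. The hyperparameter names `alpha` and `eta`, the defaults, and the toy corpus are all assumptions.

```python
import random


def collapsed_gibbs_lda(docs, V, K, alpha=0.1, eta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (textbook form, symmetric priors).

    docs : list of documents, each a list of word ids in range(V)
    Returns (beta_hat, pi_hat): topic-word and document-topic estimates
    computed from the final sample of topic assignments.
    """
    rng = random.Random(seed)
    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total words per topic
    z = []                              # topic assignment of every token
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for j, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][j]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Conditional probability of each topic, parameters integrated out.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + eta) / (n_k[t] + V * eta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][j] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    beta_hat = [[(n_kw[k][w] + eta) / (n_k[k] + V * eta) for w in range(V)]
                for k in range(K)]
    pi_hat = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
              for d, doc in enumerate(docs)]
    return beta_hat, pi_hat


# Hypothetical toy corpus over a vocabulary of 4 word ids.
docs = [[0, 0, 1, 1, 0], [2, 3, 3, 2, 2], [0, 1, 0, 1], [3, 2, 3]]
beta_hat, pi_hat = collapsed_gibbs_lda(docs, V=4, K=2, iters=50)
```

"Collapsed" means that the topic-word and document-topic parameters are integrated out analytically via the Dirichlet-multinomial conjugacy, so only the per-token topic assignments are sampled.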

LDA in R

Packages: library(topicmodels), library(lda)

With the document-word count matrix X and the number of topics K:

> LDA(X, K)

17 / 37

1 LDA: Latent Dirichlet Allocation 2 Wikipedia LDA 3 18 / 37

Data: Japanese Wikipedia dump (2014)
http://dumps.wikimedia.org/jawiki/20140624/
jawiki-20140624-pages-articles1.xml.bz2

19 / 37

Preprocessing with Python

Python library Gensim (Bag-of-words, ...)
Environment: Python3.4.1 + Numpy1.8.1 + Scipy0.14.0, Windows 7, 64bit
Gensim WikiCorpus, JaWikiCorpus; MeCab is called from Python.

> python jawikicorpus_make.py jawiki-20140624-pages-articles1.xml.bz2 jawiki1

20 / 37

MeCab: example of morphological analysis output 21 / 37

Word list (first 10 entries):

62158  aa     202
31510  aaa    132
10543  aab    65
15293  aac    48
25269  aaron  42
19714  ab     212
32430  aba    93
19037  abba   21
45622  abbey  24
19673  abc    706

Title list (first 10 entries): 0 1 2 3 4 EU ( ) 5 6 7 8 9

22 / 37

Matrix Market format

%%MatrixMarket matrix coordinate real general
59749 62999 4970557
1 867 2
1 1577 1
1 6045 1
1 9144 1
1 9393 1
1 10498 2
1 11234 3
1 11705 1

59,749 documents, 62,999 words, 4,970,557 nonzero entries.

23 / 37
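A minimal reader for the coordinate format shown above, written with the standard library only. The function name is an assumption, and the sketch handles only the "coordinate" layout. The first non-comment line gives the number of rows, columns, and nonzero entries, and each following line is a 1-based (row, column, value) triple.

```python
def read_matrix_market(lines):
    """Parse Matrix Market coordinate text into (rows, cols, nnz, entries).

    entries maps 0-based (row, col) pairs to float values.
    """
    it = iter(lines)
    header = next(it)
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("only the coordinate format is supported")
    # Skip comment lines and blank lines until the size line.
    size_line = None
    for line in it:
        stripped = line.strip()
        if stripped and not stripped.startswith("%"):
            size_line = stripped
            break
    if size_line is None:
        raise ValueError("missing size line")
    rows, cols, nnz = (int(token) for token in size_line.split())
    entries = {}
    for line in it:
        if not line.strip():
            continue
        r, c, v = line.split()
        entries[(int(r) - 1, int(c) - 1)] = float(v)  # convert to 0-based
    return rows, cols, nnz, entries


# The first lines shown on the slide (the real file has 4,970,557 entries).
sample = [
    "%%MatrixMarket matrix coordinate real general",
    "59749 62999 4970557",
    "1 867 2",
    "1 1577 1",
    "1 6045 1",
    "1 9144 1",
    "1 9393 1",
    "1 10498 2",
    "1 11234 3",
    "1 11705 1",
]
rows, cols, nnz, entries = read_matrix_market(sample)
print(rows, cols, nnz, len(entries))
```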

LDA with 20 topics, Gibbs sampling

K = 20
wiki.lda <- LDA(ssx, K, method = "Gibbs", control = list(burnin = 2000, iter = 5000))

list(burnin = 2000, iter = 5000): 5000 Gibbs iterations with a burn-in of 2000.

24 / 37

      Topic 1  Topic 2  Topic 3  Topic 4    Topic 5
 [1,] " "      "de"     " "      "windows"  " "
 [2,] " "      "la"     ""       "gt"       "mm"
 [3,] " "      " "      " "      "pc"       " "
 [4,] " "      "cc"     " cd"    "lt"       "km"
 [5,] " "      "file"   "vol"    " "        " "
 [6,] " "      " "      " "      "os"       " "
 [7,] " "      " "      " "      "ms"       " "
 [8,] ""       " "      ""       "ii"       " "
 [9,] ""       ""       "one"    "mhz"      " "
[10,] " "      " "      " "      "mb"       " "
[11,] " "      "le"     " "      "vs"       " "
[12,] " "      " "      " "      "minus"    "cm"
[13,] " "      " "      " "      "for"      " "
[14,] " "      "image"  " "      "mac"      " "
[15,] " "      " "      ""       "system"   "dd"

25 / 37

      Topic 6  Topic 7  Topic 8  Topic 9
 [1,] " "      " "      "nbsp"   " "
 [2,] " "      "tbs"    "km"     " "
 [3,] " "      "nhk"    " "      " "
 [4,] " "      " "      " "      " "
 [5,] " "      " "      ""       " "
 [6,] " "      " "      " "      " "
 [7,] " "      " "      " "      " "
 [8,] " "      " "      " "      " "
 [9,] " "      " "      " "      " "
[10,] " "      " "      " "      " "
[11,] ""       ""       " "      " "
[12,] ""       " "      " "      " "
[13,] ""       " "      ""       " "
[14,] " "      " "      " "      " "
[15,] ""       " "      " "      " "

26 / 37

      Topic 10  Topic 11  Topic 12  Topic 13  Topic 14
 [1,] "bs"      " "       " "       " "       "en"
 [2,] "hd"      " "       " "       " "       " "
 [3,] " "       ""        " "       " "       " "
 [4,] " "       "op"      " "       " "       " "
 [5,] " "       " "       "jr "     " "       ""
 [6,] "com"     " "       " "       " "       " "
 [7,] "sports"  ""        " "       " "       "right"
 [8,] " "       " "       " "       " "       " "
 [9,] "tv"      " "       " "       " "       " "
[10,] "sup"     ""        ""        " "       " "
[11,] " "       " "       ""        " "       " "
[12,] " "       ""        " "       "kg"      " "
[13,] " "       " "       " "       " "       ""
[14,] " "       " "       ""        " "       "png"
[15,] ""        " "       " "       " "       " "

27 / 37

      Topic 15      Topic 16  Topic 17  Topic 18
 [1,] "and"         "km"      "ch"      " "
 [2,] "in"          "text"    " "       " "
 [3,] "file"        "style"   " "       " "
 [4,] "to"          " "       " "       " "
 [5,] "university"  " "       " "       " "
 [6,] "new"         " "       " "       " "
 [7,] "on"          "center"  " "       " "
 [8,] "by"          "align"   " "       " "
 [9,] "with"        "bar"     " "       " "
[10,] "press"       " "       "kw"      " "
[11,] "for"         " "       " "       " "
[12,] "at"          ""        " "       " "
[13,] "en"          " "       " "       " "
[14,] "white"       " "       " "       " "
[15,] "black"       "bull"    "fm"      " "

28 / 37

      Topic 19  Topic 20
 [1,] "th"      " "
 [2,] "love"    " "
 [3,] "live"    " "
 [4,] ""        " "
 [5,] "in"      " "
 [6,] "dvd"     " "
 [7,] "you"     " "
 [8,] "cd"      " "
 [9,] "best"    ""
[10,] "to"      " "
[11,] "music"   " "
[12,] ""        " "
[13,] "my"      "cm"
[14,] "on"      " "
[15,] "go"      " "

29 / 37

Topic 3: Topic 4: Topic 5: Topic 6: Topic 18: 30 / 37

"Topic 3 :" " " "Topic 4 :" "Xeon PC-9821 ThinkCentre Safari Microsoft X68000 Unicode E0000-E0FFF MC68000.NET Framew "Topic 18 :" " ( ) ( ) 1 ( ) 4 " 31 / 37

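Word cloud: wordcloud(vocs, freq) (requires library(wordcloud)). vocs is the vector of words and freq the corresponding weights (between 0 and 1); words with larger freq are drawn larger. 32 / 37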

1 LDA: Latent Dirichlet Allocation 2 Wikipedia LDA 3 33 / 37

(gam) gam (For_report.zip ) mackgam.r mackgam.r lon,lat b.depth,c.dist UBRE (Unbiased Risk Estimator) UBRE mack.gamadd$gcv.ubre (UBRE GCV ) 34 / 37

(gam) ken-kankyo-kakou.csv (For_report.zip ) (4 ) (hclust) (Mclust) (hclust plot(hc) Mclust heatmap(result$z,col=redgreen(256)) ) 35 / 37

(LDA): (optional) LDA jawiki2 (jawiki2_short_bow.mm jawiki2_short_titles_tmp.txt jawiki2_short_wordids_tmp.txt ) 36 / 37

n R pdf ( tex ) 8/8( ) http://www.is.titech.ac.jp/~s-taiji/lecture/dataanalysis/dataanalysis.html 37 / 37