( 50 2011 6 12 ) MeCab 1 1.1 1 MeCab( ) 21 ( ) MeCab MeCab 1.2 2 sugaiy@kindai.ac.jp 1 2010 2011 ( (B)) CALL ( ) 2 (2004) (2006) 1
Microsoft Windows EUC-KR 3 II( ) II. ( 1993:83) II (1) /MAG /NNG+ /JC /VA+ /ETM /NNG+ /JKB /VA+ /ETM /NNG+ /XSN+ /JKS /NNG+ /JKO /VV+ /EC /NNG+ /JKB /NNG+ /XSV+ /EC /VX+ /ETM /NNB. /VA+ /EF+./SF MACH ( 2004) HAM ( KLP )( 2002;2003) POSTAG SEJONG/K 4 KRISTAL Morphological Analyzer 5 6 3 2010 DVD Windows 4 http://isoft.postech.ac.kr/research/postag/sejong/postag sejong k.php 5 http://www.kristalinfo.com/k-lab/ma/ 6 asaokitan (http://d.hatena.ne.jp/asanote/20090319/1237452770) 2
KAIST (1997) 7 ( ) ( ) ( ) ( ) ( ) ( ) ChaSen( ) (2000) ( 2000:30) (2008, 2009) Web MeCab (2008) 15 (2010) (2011) (2010) MeCab (2011) 2 MeCab 2.1 MeCab MeCab 7 7 0.98 Windows Unix http://mecab.sourceforge.net/ 3
MeCab 1 1: MeCab Windows IPA 8 (2) 1 2 3 1 * ( ) MeCab IPA
(3)
x,679,679,7325,,,*,*,,,,,
,683,683,7745,,,*,*,,,,,
,681,681,7299,,,*,*,,,,,
,677,677,7123,,,*,*,,,,,
8 IPA ChaSen( ) MeCab IPA http://sourceforge.jp/projects/ipadic/ MeCab (2.7.0-20070801) 392,126
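The entry layout in (3) can be illustrated with a minimal Python sketch. It assumes the usual MeCab dictionary CSV layout: surface form, left context ID, right context ID and cost, followed by a free list of feature fields. The names `DictEntry` and `parse_dict_line` are hypothetical and are not part of MeCab.

```python
import csv
from dataclasses import dataclass
from typing import List


@dataclass
class DictEntry:
    """One line of a MeCab dictionary CSV file (hypothetical helper type)."""
    surface: str
    left_id: int
    right_id: int
    cost: int
    features: List[str]


def parse_dict_line(line: str) -> DictEntry:
    """Split a dictionary line into its four fixed fields and the feature list."""
    row = next(csv.reader([line]))
    if len(row) < 5:
        raise ValueError("expected surface, left ID, right ID, cost and at least one feature")
    surface, left_id, right_id, cost = row[:4]
    return DictEntry(surface, int(left_id), int(right_id), int(cost), row[4:])


# First entry of example (3); its feature fields are empty in the extracted text.
entry = parse_dict_line("x,679,679,7325,,,*,*,,,,,")
print(entry.surface, entry.left_id, entry.right_id, entry.cost, len(entry.features))
```

Keeping the features as a plain list makes no assumption about how many feature columns a given dictionary defines.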
2.2 MeCab MeCab MeCab MeCab UTF-8 15 MeCab MeCab MeCab MeCab MeCab MeCab ChaKi( ) 9 ChaKi ChaSen MeCab KWIC MeCab Perl Ruby MeCab Perl MeCab 9 http://sourceforge.jp/projects/chaki/ 5
3 MeCab MeCab Seed Seed 10 11 (2010) (2011) 12 MeCab 13 3.1 Seed 3.1.1 Seed Seed ( ) ( ) 2003 Excel 14 Excel 3 15 10 10 MeCab ( 1997:62 63) MeCab (http://www.mwsoft.jp/programming/munou/mecab nitteretou.html) 11 (2011:42-46) 12 (1997) 13 1. 14 (2003 6 4 ) 3 1 982 2 2,111 3 2,872 5,965 (2003) 15 Excel 6
40 Seed 10 5,095 4 778 3.1.2 Seed (4) 1 2 3 1 2 1 1: 1 (Noun) (Verb) (Adjective) (Siteisi) (Sonzaisi) (Adverb) (Ending) (Suffix) (Prefix) (Conjunction) (Interjection) 2 etc. 2 MeCab 1 ( ) (2003 1 20 ) 7
16
(5)
goqbu,0,0,0,noun,,*,*,*, 01,,
gos,0,0,0,noun,,*,*,*, 01,,*
dal,0,0,0,noun,,,*,*, 05,,*
dyl,0,0,0,noun,,*,*,*,,,*
(6)
jo ioq,0,0,0,noun,,*,*,*,,,*
3
(7)
pal,0,0,0,noun,,,*,*, 03,,
han,0,0,0,noun,, -,*,*, 01,,*
3
(8)
seijoqdai oaq,0,0,0,noun,,,*,*,,,
namsan,0,0,0,noun,,,*,*,,,
2 3
16 3 ID ID Seed 0 (3) IPA
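As examples (5) to (8) show, every Seed entry carries 0 for the left context ID, the right context ID and the cost. A minimal Python sketch of writing such a line follows. The function name `format_seed_entry` is hypothetical, and the feature list is passed through unchanged because the meaning of each feature column cannot be read off the examples alone.

```python
import csv
import io
from typing import Sequence


def format_seed_entry(surface: str, features: Sequence[str]) -> str:
    """Build one Seed dictionary line with placeholder IDs and cost (all 0)."""
    buffer = io.StringIO()
    # csv.writer quotes any field that itself contains a comma.
    writer = csv.writer(buffer, lineterminator="")
    writer.writerow([surface, 0, 0, 0, *features])
    return buffer.getvalue()


# Reproduces the last entry of example (5).
line = format_seed_entry("dyl", ["noun", "", "*", "*", "*", "", "", "*"])
print(line)
```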
(9)
al,0,0,0,verb,,r, 1,*,,,*
a,0,0,0,verb,,r -, 1,*,,,*
al,0,0,0,verb,,r, 2,*,,,*
a,0,0,0,verb,,r -, 2,*,,,*
al a,0,0,0,verb,,r, 3,*,,,*
2
(10)
sip,0,0,0,adjective,,*, 1,*,,,*
sip y,0,0,0,adjective,,*, 2,*,,,*
sip e,0,0,0,adjective,,*, 3,*,,,*
(11) a.
i,0,0,0,siteisi,,*, 1,*,,,*
i,0,0,0,siteisi,,*, 2,*,,,*
i ei,0,0,0,siteisi,,, 3,*,,,*
iei,0,0,0,siteisi,,, 3,*,,,*
ie,0,0,0,siteisi,,, 3,*,,,*
i e,0,0,0,siteisi,,, 3,*,,,*
b.
iss,0,0,0,sonzaisi,,*, 1,*, 01,,*
iss y,0,0,0,sonzaisi,,*, 2,*, 01,,*
iss e,0,0,0,sonzaisi,,*, 3,*, 01,,*
2 3
(12)
gyniaq,0,0,0,adverb,,*,*,*,,,*
mos,0,0,0,adverb, -,*,*,*, 04,,*
nai il,0,0,0,adverb,,,*,*,,,
(13)
bunmieqhi,0,0,0,adverb,,*,*,*,,, -
3.1.1 1 Ending 2 3
(14) a.
ga,0,0,0,ending,,,*,*,,,*
n,0,0,0,ending,,,*,*,,, /
b.
l,0,0,0,ending,,,*,2,,,*,*
da,0,0,0,ending,,,*,1,,,*,*
se,0,0,0,ending,,,*,3,,,*,*
dai,0,0,0,ending,,,*,1,,,*,-
II I III I
(15) a.
si,0,0,0,suffix,,*, 1,*,,,*
si,0,0,0,suffix,,*, 2,*,,,*
sie,0,0,0,suffix,,*, 3,*,,,*
sei,0,0,0,suffix,,, 3,*,,,*
b.
geiss,0,0,0,suffix,,*, 1,*,,,*
geiss y,0,0,0,suffix,,*, 2,*,,,*
geiss e,0,0,0,suffix,,*, 3,*,,,*
(7)
(16)
gy,0,0,0,prefix,*,*,*,*, 01,,*
eny,0,0,0,prefix,*,*,*,*, 01,,*
iag,0,0,0,prefix,*,*,*,*, 03,,
3.1.3 (4) 17 3.1.4 Seed Seed 2 (12 ) 2 3 1 3.3 2 155 Seed 14,439 3.2 Seed MeCab 18 17 80 ( 2002:15) 18 MeCab IPA 4 Yahoo! MeCab (http://chasen.org/ taku/blog/archives/2007/06/yahoomecab.html) 11
2: Seed ( ) 9,112 (5,095) 3,027 605 14 11 1,154 (778) 87 (87) 141 (141) 13 46 19 44 11 14,284 ( ) (2BH9301.txt) (BRHO0414.txt) 100 MeCab 19 3.3 MeCab Seed MeCab MeCab 19 12
I II II 1 I II 3.4 MeCab 20 (1993) 4 1 (58 1939) 30
3: ( ) 87.3717 90.6242 91.2987 94.7831
87% 3 1 1 94.7125% 94%
20 MeCab mecab-system-eval
           precision            recall               F
LEVEL 0:   99.7070(2382/2389)   99.7905(2382/2387)   99.7487
LEVEL ALL: 98.7861(2360/2389)   98.8689(2360/2387)   98.8275
LEVEL 0 LEVEL ALL precision recall F 2 2389 2387 2382 2360 (2011:53) precision recall
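The scores quoted from mecab-system-eval can be recomputed from the raw counts. The sketch below assumes the standard definitions: precision is the number of correct morphemes divided by the number the system output (2389), recall is the same numerator divided by the number in the reference analysis (2387), and F is their harmonic mean. The function name `prf` is hypothetical.

```python
from typing import Tuple


def prf(correct: int, system_total: int, answer_total: int) -> Tuple[float, float, float]:
    """Return precision, recall and F-measure as percentages."""
    precision = correct / system_total
    recall = correct / answer_total
    f_measure = 2 * precision * recall / (precision + recall)
    return precision * 100, recall * 100, f_measure * 100


level_0 = prf(2382, 2389, 2387)
level_all = prf(2360, 2389, 2387)
print("LEVEL 0:   %.4f %.4f %.4f" % level_0)
print("LEVEL ALL: %.4f %.4f %.4f" % level_all)
```

Rounded to four decimal places, both rows agree with the figures reported above.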
1 98.5021% (2011) 4 MeCab 4.1 MeCab 21
(17)
hangug e Noun,,*,*,*,,,
goqbu Noun,,*,*,*, 01,,
nyn Ending,,,*,*,,,*
jeqmal Adverb,,,*,*, 01,,
jaimi iss e Adjective,,*, 3,*,,,*
io Ending,,,*,3,,,*
. Symbol,,*,*,*,.,.,*
EOS
2.2 ChaKi 2 ChaKi ChaKi 2 MeCab ChaKi
21 UTF-8 Perl MeCab
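Output such as (17) is straightforward to post-process in a script. The following Python sketch assumes MeCab's default output layout, in which each line holds the surface form, a tab, and the comma-separated feature string, and each sentence ends with a line reading EOS. The tab is not visible in the example as printed, and the sample text and the name `parse_mecab_output` are illustrative only.

```python
from typing import List, Tuple

Morpheme = Tuple[str, List[str]]


def parse_mecab_output(text: str) -> List[List[Morpheme]]:
    """Group MeCab output lines into sentences of (surface, features) pairs."""
    sentences: List[List[Morpheme]] = []
    current: List[Morpheme] = []
    for line in text.splitlines():
        if line == "EOS":  # sentence boundary marker
            sentences.append(current)
            current = []
            continue
        if not line:
            continue
        surface, _, feature_str = line.partition("\t")
        current.append((surface, feature_str.split(",")))
    if current:  # tolerate output without a final EOS
        sentences.append(current)
    return sentences


# Abridged from example (17); fields lost in the extracted text are left empty.
sample = (
    "goqbu\tNoun,,*,*,*,01,,\n"
    "nyn\tEnding,,,*,*,,,*\n"
    ".\tSymbol,,*,*,*,.,.,*\n"
    "EOS\n"
)
sentences = parse_mecab_output(sample)
print([surface for surface, _ in sentences[0]])
```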
2: ChaKi 4.2 MeCab Perl Perl CGI 22 3: Perl/CGI 1 23 ( ) ( 4) 22 http://porocise.sakura.ne.jp/korean/morph/analyzer.html MeCab 0.97 Perl Text::MeCab(0.20011) 23 HTML <ruby> ( ) 15
4: 5: 3 ( 5) 4.3 24.
soniesidai,66,53,5493,noun,,,*,*,,,
(18a) (18b)
(18) a.
soniesidai Noun,,,*,*,,,
ga Ending,,,*,*,,,*
joh a Adjective,,*, 3,*, 01,,*
io Ending,,,*,3,,,*
. Symbol,,*,*,*,.,.,*
EOS
b.
sonie Noun,,*,*,*, 02,,
sidai Noun,,*,*,*, 02,,
ga Ending,,,*,*,,,*
joh a Adjective,,*, 3,*, 01,,*
io Ending,,,*,3,,,*
. Symbol,,*,*,*,.,.,*
EOS
24 MeCab (http://mecab.sourceforge.net/dic.html)
5 MeCab 1. 2. 3. 5.1 MeCab MeCab ( 2004:90) (19). (a). (b)
(19) a.
Noun,,,*,*,*,*,*
ssi Noun,,*,*,*, 07,,
ga Ending,,,*,*,,,*
ga Verb,,*, 3,*, 01,,*
ss e Suffix,,*, 3,*,,,*
io Ending,,,*,3,,,*
. Symbol,,*,*,*,.,.,*
EOS
b.
Noun,,*,*,*,*,*,*
ei Ending,,,*,*,,,*
ga Verb,,*, 3,*, 01,,*
ss e Suffix,,*, 3,*,,,*
io Ending,,,*,3,,,*
. Symbol,,*,*,*,.,.,*
EOS
( )
(20)
jin Noun,,*,*,*, 06,,
ha Verb,,*, 2,*, 01,,*
griul i Noun,,*,*,*,*,*,*
nop y Adjective,,*, 2,*,,,*
n Ending,,,*,2,,,*
haggio Noun,,*,*,*,,,
EOS
( ) griul i i ( ) gr MS Windows.
(21)
MS Noun,,,*,*,*,*,*
W Noun,,*,*,*,*,*,*
i Siteisi,,, 2,*,,,*
n Ending,,,*,2,,,*
do Ending,,,*,*,,,*
wsnyn Noun,,*,*,*,*,*,*
munjei Noun,,*,*,*, 06,,
ga Ending,,,*,*,,,*
manh a Adjective,,*, 3,*,,,*
io Ending,,,*,3,,,*
. Symbol,,*,*,*,.,.,*
EOS
5 wsnyn MS W
5.2.
(22)
gyren Prefix,*,*,*,*, 01,,*
mal Noun,,*,*,*, 05,,
yl Ending,,,*,*,,,*
dyl e Verb,,r, 3,*, 01,, -
ss e Suffix,,*, 3,*,,,*
io Ending,,,*,3,,,*
. Symbol,,*,*,*,.,.,*
EOS
01 01 1 2 ( 11) 2 3191 (22) ( ) 01 ( ) 04 ( ) 01 III 1
5.3 MeCab
6 MeCab MeCab 14,000 90% MeCab 20
(1997) IV pp.1-21
(2004) Conditional Random Fields [ ] 2004-NL-161 pp.89-96
(2011) 15 1 2 pp.41-56
(1997) (NAIST-IS-MT9551092)
(1989) 2 pp.17-29
(1997) 3
(2010) 15 MeCab A: Vol.10 No.3 pp.17-28
(2008) MeCab [ ] 2008-CH-73 pp.17-22
(2000) Vol. 7 No. 4 pp.25-62
(2008) Web 206 pp.(1)-(37)
(2009) Web 211 pp.(1)-(40)
(2002;2003) ( ),
(2005) 2 ( 2005-1-33), ( )
(2004) :, Vol. 31 No. 1,, pp.89-99
(1993) (3 4 ),
(2004),
(2002) ( 2002-1-17), ( )
(2003) ( 2003-1-4), ( )
(2006),
Lafferty, J., A. McCallum and F. Pereira (2001) Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proc. of ICML-01, pp.282-289
(1989) l
1: g gg n d dd r m b bb s ss j jj c k t p h
2: a ai ia iai e ei ie iei o oa oai oi io u ue uei ui iu y yi i
3: ( ) ( ) g gg gs n nj nh d l lg lm lb ls lt lp lh m b bs s ss q j c k t p h
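The three tables above can be applied mechanically to precomposed Hangul syllables. The Python sketch below rests on two assumptions that are not stated in the tables themselves: that the letters are listed in the standard Unicode jamo order, and that the null initial consonant, which table 1 does not list, is written as an empty string. Any boundary mark between syllables is omitted. The function names are hypothetical.

```python
# Transliteration tables from the appendix, indexed in Unicode jamo order
# (assumed). The empty string in INITIALS stands for the null initial.
INITIALS = ["g", "gg", "n", "d", "dd", "r", "m", "b", "bb", "s", "ss", "",
            "j", "jj", "c", "k", "t", "p", "h"]
VOWELS = ["a", "ai", "ia", "iai", "e", "ei", "ie", "iei", "o", "oa", "oai",
          "oi", "io", "u", "ue", "uei", "ui", "iu", "y", "yi", "i"]
FINALS = ["", "g", "gg", "gs", "n", "nj", "nh", "d", "l", "lg", "lm", "lb",
          "ls", "lt", "lp", "lh", "m", "b", "bs", "s", "ss", "q", "j", "c",
          "k", "t", "p", "h"]

HANGUL_BASE = 0xAC00          # first precomposed syllable
SYLLABLE_COUNT = 19 * 21 * 28  # 11,172 precomposed syllables


def romanize_syllable(ch: str) -> str:
    """Transliterate one precomposed syllable; other characters pass through."""
    index = ord(ch) - HANGUL_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        return ch
    initial, rest = divmod(index, 21 * 28)
    vowel, final = divmod(rest, 28)
    return INITIALS[initial] + VOWELS[vowel] + FINALS[final]


def romanize(text: str) -> str:
    """Transliterate a string syllable by syllable."""
    return "".join(romanize_syllable(ch) for ch in text)


print(romanize("공부"), romanize("학교"))
```

Under these assumptions the sketch yields goqbu and haggio, the spellings that appear in examples (5) and (20).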
3 Windows 7 Professional Windows XP Mode (2011 6 13 )