ChaKi version 0.10
August 21, 2006
Copyright (c) 2006
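1 Installation

ChaKi needs three components: MySQL, the ChaKi GUI, and the taggers and parsers used to build corpora (1.3).

1.1 MySQL

Download the Windows version (Windows Essentials) of MySQL from http://dev.mysql.com/downloads/mysql/5.0.html (the file used here is mysql-essential-5.0.22.zip). Extract it and run Setup.exe. Click [next], select [Typical], click [next], then click [Install].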
When setup finishes, select [Skip Sign-Up], check [Configure the MySQL Server now], and click [Finish]. In the configuration wizard, select [Standard Configuration].
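Check [Install As Windows Service] and [Launch the MySQL Server automatically], leave Service Name as [MySQL], and check [Include Bin Directory in Windows PATH]. Check [Modify Security Settings] and set the root password (this manual uses okage). Click [Execute], then click [Finish].

Next, open my.ini (c:\Program Files\MySQL\MySQL Server 5.0\my.ini) in Notepad (notepad.exe). In both the [mysql] and the [mysqld] sections, change default-character-set=latin1 to default-character-set=sjis:

    [mysql]
    default-character-set=sjis

    [mysqld]
    default-character-set=sjis

Then restart MySQL, either by restarting the MySQL service from the Windows Services administrative tool or by restarting Windows.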
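You can also script the my.ini change above. The Python function below is a minimal sketch and is not part of ChaKi or MySQL. It rewrites the existing default-character-set lines in the [mysql] and [mysqld] sections and leaves every other line, including comments and line endings, untouched. It assumes the key is already present in those sections, as in the stock MySQL 5.0 my.ini, and it does not add the key if it is missing. The name set_default_charset is hypothetical.

```python
import re

_SECTION = re.compile(r"\[([^\]]+)\]")
_CHARSET_KEY = re.compile(r"default-character-set\s*=")


def set_default_charset(ini_text, charset, sections=("mysql", "mysqld")):
    """Return ini_text with every default-character-set line inside the
    given sections set to `charset`; all other lines are kept verbatim.
    (Hypothetical helper: existing keys are replaced, missing ones are not added.)"""
    out = []
    current = None  # lower-case name of the section we are currently in
    for line in ini_text.splitlines(keepends=True):
        stripped = line.strip()
        header = _SECTION.fullmatch(stripped)
        if header:
            current = header.group(1).strip().lower()
        elif current in sections and _CHARSET_KEY.match(stripped):
            # Preserve the original line ending ("\n" or "\r\n").
            ending = line[len(line.rstrip("\r\n")):]
            line = f"default-character-set={charset}{ending}"
        out.append(line)
    return "".join(out)
```

For example, read my.ini, pass its text and "sjis" to set_default_charset, write the result back, and restart MySQL. Calling the same function with "utf8" makes the UTF-8 change.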
1.2 ChaKi GUI

Extract the ChaKi GUI archive to drive C so that it is installed in c:\chaki.

1.3 Taggers and parsers

1.3.1 TreeTagger

1. From the TreeTagger home page [1], download the English parameter file english-par-linux-3.1.bin.gz, listed under "Parameter files for PC (Linux and Windows, Latin1 character set)" [2]. Also download the Windows version of TreeTagger, tree-tagger-windows-3.1.zip [3].
2. Extract tree-tagger-windows-3.1.zip into c:\treetagger.
3. Decompress english-par-linux-3.1.bin.gz (for example, with Lhaca [4]). Rename english-par-linux-3.1.bin to english.par and put english.par in c:\treetagger\lib.
4. TreeTagger expects input with one sentence per line. To produce it, download the Windows version [7] of the sentence segmenter [6] from AnswerBus [5] and put it in c:\treetagger\bin.
5. Put tokenizer.exe [10], an executable version of tokenizer.rb [9] from NICT [8], in c:\treetagger\bin.
6. TreeTagger's standard scripts require perl. Instead of using them, copy the tag-english.bat supplied with ChaKi [11] into c:\treetagger\bin. If TreeTagger is installed somewhere other than C:\TreeTagger, edit the line set TAGDIR=C:\TreeTagger in tag-english.bat.

[1] http://www.ims.uni-stuttgart.de/projekte/corplex/treetagger/decisiontreetagger.html
[2] ftp://ftp.ims.uni-stuttgart.de/pub/corpora/english-par-linux-3.1.bin.gz
[3] ftp://ftp.ims.uni-stuttgart.de/pub/corpora/tree-tagger-windows-3.1.zip
[4] http://www.vector.co.jp/soft/win95/util/se166893.html
[5] http://answerbus.coli.uni-sb.de/index.shtml
[6] http://answerbus.coli.uni-sb.de/sentence/
[7] http://answerbus.coli.uni-sb.de/sentence/ss.exe
[8] http://www2.nict.go.jp/jt/a132/members/mutiyama/
[9] http://www2.nict.go.jp/jt/a132/members/mutiyama/software/ruby/tokenizer.rb
[10] \prog\tokenizer.exe (in the ChaKi package)
[11] \prog\tag-english.bat (in the ChaKi package)
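TreeTagger needs one sentence per line, which the AnswerBus segmenter produces. For quick experiments without the segmenter, the Python sketch below gives a rough substitute. It is a naive regular-expression splitter used for illustration and is not the AnswerBus algorithm. It splits after ".", "!" or "?" when whitespace and then an uppercase letter or a quote follow. As a result, it wrongly splits after abbreviations such as "Mr." and misses boundaries that fall after a closing quote. The function name is hypothetical.

```python
import re

# A boundary is whitespace that follows ., ! or ? and precedes an
# uppercase letter or an opening quote (naive heuristic).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'])")


def one_sentence_per_line(text):
    """Naively split English text so that each sentence is on its own line."""
    flat = " ".join(text.split())  # join wrapped lines and squeeze spaces
    if not flat:
        return ""
    return "\n".join(_BOUNDARY.split(flat)) + "\n"
```

Because the English parameter file is for the Latin1 character set, write the result with encoding="latin-1" before tagging it.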
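1.3.2 MeCab, ChaSen, JUMAN, CaboCha, KNP

MeCab
Download mecab-0.xx.exe (Binary Package for MS-Windows) from the MeCab home page [12]. As of August 8, 2006, the latest version is mecab-0.93.exe. Run mecab-0.xx.exe, select [I accept the agreement], choose the Start Menu folder, and click [Install]. When the installer asks about MeCab, click [Yes], then click [Finish].

ChaSen
Download cha233 031208.exe (Windows version) from the ChaSen home page [13]. Note that WinCha (cha21244sp5.exe) is a different package. Run cha233 031208.exe, click [Yes], select [I accept the agreement], click [Install], then click [Finish].

JUMAN
Download juman-x.x.exe (Windows version) from the JUMAN home page [14]. As of August 8, 2006, the latest version is juman-5.1.exe. Run juman-5.1.exe, select [I accept the agreement], click [Install], then click [Finish].

CaboCha
Download cabocha-x.xx.exe (Windows version) from the CaboCha home page [15]. As of August 8, 2006, the latest version is cabocha-0.53.exe. Run cabocha-0.53.exe, select [I accept the agreement], choose the Start Menu folder, and click [Install]. When the installer asks about CaboCha, click [Yes], then click [Finish].

KNP
Download knp-x.x.exe (Windows version) from the KNP home page [16]. As of August 8, 2006, the latest version is knp-2.0.exe. Run knp-2.0.exe, select [I accept the agreement], and click [Install].

[12] http://mecab.sourceforge.jp/
[13] http://chasen.naist.jp/hiki/chasen/
[14] http://nlp.kuee.kyoto-u.ac.jp/nl-resource/juman.html
[15] http://chasen.org/~taku/software/cabocha/
[16] http://nlp.kuee.kyoto-u.ac.jp/nl-resource/knp.html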
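To finish installing KNP, click [Finish].

2 Creating corpora

2.1 Sample corpora

2.1.1 Contents
The db folder of ChaKi contains two sample corpora: six works from Project Gutenberg [17] (English) and four works from Aozora Bunko [18] (Japanese). The English works are:

    Charles Dickens: A Christmas Carol
    Charles Dickens: A Tale of Two Cities
    Charles Dickens: Oliver Twist
    Jane Austen: Emma
    Jane Austen: Persuasion
    Jane Austen: Pride and Prejudice

[17] http://www.gutenberg.org/
[18] http://www.aozora.gr.jp/

2.1.2 Registering the sample corpora
Create the folder c:\temp, then run english.bat and japanese.bat in the db folder. english.bat consists of a single line:

    ..\prog\cabocha2dat.exe -f english -t c:\temp -h localhost -u root -p okage -d english --corpusformat=english --spacing=english

The option -t c:\temp specifies the temporary folder. To use a different folder, change c:\temp.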
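Every registration command in this chapter uses the same set of cabocha2dat.exe options. The hypothetical Python helper below assembles them as an argument list. Its defaults are this manual's example values (c:\temp, localhost, root, okage). As the examples suggest, it assumes that -f takes a folder and -c a single file. It only builds the command and does not check whether cabocha2dat.exe accepts the values.

```python
import subprocess


def cabocha2dat_args(exe, *, source, db, corpusformat, spacing,
                     folder=False, tempdir=r"c:\temp", host="localhost",
                     user="root", password="okage", encode=None):
    """Build a cabocha2dat.exe argument list (defaults are the manual's examples)."""
    args = [exe,
            "-f" if folder else "-c", source,   # -f: folder (assumed), -c: file
            "-t", tempdir,                      # temporary folder
            "-h", host,                         # MySQL server
            "-u", user, "-p", password,         # MySQL account
            "-d", db,                           # database (corpus) name
            f"--corpusformat={corpusformat}",
            f"--spacing={spacing}"]
    if encode:                                  # e.g. "utf8" for UTF-8 corpora
        args.append(f"--encode={encode}")
    return args


# Rebuild the command in english.bat and show it as a cmd.exe command line.
args = cabocha2dat_args(r"..\prog\cabocha2dat.exe", source="english",
                        db="english", corpusformat="english",
                        spacing="english", folder=True)
print(subprocess.list2cmdline(args))
```

On Windows, subprocess.run(args, check=True) runs the same command. Using an argument list avoids quoting problems when paths contain spaces.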
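If your MySQL root password is not okage, replace okage in -p okage with your password. The option -d english gives the name of the database to create (english). japanese.bat works the same way.

The db folder also contains english.def and japanese.def, which ChaKi uses to open each corpus. english.def contains:

    corpusname=english
    server=localhost
    user=root
    password=okage

2.1.3 Settings and removal
If your MySQL password is not okage, change password=okage accordingly. The value of corpusname (english) is the database name. japanese.def works the same way.

To delete the sample corpora from MySQL, start mysql.exe, replacing okage in -pokage with your password:

    mysql.exe -uroot -pokage

Then enter the following at the mysql prompt:

    drop database english;
    drop database japanese;

2.2 Creating a corpus from text (English)
To build a corpus from your own English text, prepare it as plain text rather than HTML (for example, a text from Project Gutenberg [19]). Tag it with tag-english.bat from the \prog folder, which you installed in 1.3.1. For example, the following command tags c:\test.txt and writes the result to c:\test.tnt:

    c:\treetagger\bin\tag-english.bat c:\test.txt c:\test.tnt

Register c:\test.tnt in ChaKi with Cabocha2datLauncher.exe.

[19] http://www.gutenberg.org/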
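Before registering c:\test.tnt, you may want to inspect it. This sketch assumes that tag-english.bat writes TreeTagger's usual output: one token per line, with the word, tag and lemma separated by tabs. The Python code below reads such lines and groups them into sentences. Closing a sentence at the SENT tag (TreeTagger's English sentence-final tag) is this sketch's own choice and not necessarily how cabocha2dat.exe splits sentences. The names Token and read_treetagger are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str
    lemma: str


def read_treetagger(lines: Iterable[str]) -> List[List[Token]]:
    """Group TreeTagger 'word<TAB>tag<TAB>lemma' lines into sentences,
    closing a sentence at each SENT tag (an assumption of this sketch)."""
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        word, pos, lemma = line.split("\t")   # ValueError if not 3 columns
        current.append(Token(word, pos, lemma))
        if pos == "SENT":
            sentences.append(current)
            current = []
    if current:                               # trailing tokens without SENT
        sentences.append(current)
    return sentences
```

For example, open c:\test.tnt with encoding="latin-1" and pass the file object to read_treetagger. Each sentence it returns is a list of Token records.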
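In Cabocha2datLauncher.exe, fill in the fields as follows:

    Source file/folder:  the file to register (e.g. c:\test.tnt)
    Source Type:         English
    Spacing:             English
    Corpus Name:         the name of the corpus (database) to create
    DB Server:           localhost
    DB Username:         your MySQL user name
    DB Password:         your MySQL password

Alternatively, run cabocha2dat.exe from the command line, entering the following as a single line:

    c:\chaki\chaki\cabocha2dat.exe -c test.tnt -t c:\temp -h localhost -u root -p okage -d test --corpusformat=english --spacing=english

The options are:

    -c               input file
    -t               temporary folder
    -h               database server (localhost)
    -u               MySQL user name
    -p               MySQL password
    -d               database (corpus) name
    --corpusformat   input format (english: TreeTagger output)
    --spacing        word spacing (english)

2.3 Creating a corpus from text (Japanese)
You can register Japanese text in ChaKi (for example, texts from Aozora Bunko [20]) after analyzing it with one of the tools below.

[20] http://www.aozora.gr.jp/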
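2.3.1 MeCab
Analyze the text with MeCab (IPA dictionary). Prepare input.txt as a Shift-JIS (SJIS) text file with one sentence per line, then run:

    "\Program Files\MeCab\bin\mecab.exe" input.txt > input.mecab

Register input.mecab (one line):

    c:\chaki\chaki\cabocha2dat.exe -c input.mecab -t c:\temp -h localhost -u root -p okage -d inputmecab --corpusformat=mecab --spacing=japanese

The options -c, -t, -h, -u, -p and -d are as in 2.2. --corpusformat=mecab selects MeCab output format, and --spacing=japanese selects Japanese spacing.

2.3.2 ChaSen
Analyze the text with ChaSen (IPA dictionary). As with MeCab, prepare input.txt in SJIS with one sentence per line. Run ChaSen with the -F output-format option:

    "\Program Files\ChaSen\chasen.exe" -F "%m\t%y\t%a\t%m\t%u(%p-)\t%t \t%f \n" input.txt > input.chasen

In a batch file, write each % as %%. If you omit -F, make sure the output format set in \Program Files\ChaSen\dic\chasenrc is the same as the -F string ("%m\t%y\t%a\t%m\t%u(%p-)\t%t \t%f \n"). Register input.chasen (one line):

    c:\chaki\chaki\cabocha2dat.exe -c input.chasen -t c:\temp -h localhost -u root -p okage -d inputchasen --corpusformat=chasen --spacing=japanese

--corpusformat=chasen selects ChaSen output format, and --spacing=japanese selects Japanese spacing. The other options are as in 2.2.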
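All the Japanese analyzers in this section take input.txt in Shift-JIS with one sentence per line. If your source text is in another encoding (for example, a UTF-8 file), a minimal Python sketch like the one below can prepare it. It splits after 。, ！ or ？, keeping any closing 」, 』 or ） that follow. It writes Windows line endings and encodes with Python's cp932 codec, which is Microsoft's variant of Shift-JIS. The splitting rule is a naive heuristic (it also splits inside quotations), and the function name is hypothetical. Characters that cp932 cannot represent raise UnicodeEncodeError.

```python
import re

# Sentence end: 。！？ plus any closing brackets/quotes right after it.
_JA_END = re.compile(r"([。！？][」』）]*)")


def to_sjis_sentences(text: str) -> bytes:
    """Return text as cp932 (Shift-JIS) bytes, one sentence per line."""
    sentences = []
    for para in text.splitlines():
        parts = _JA_END.split(para)  # [body, end, body, end, ..., tail]
        for i in range(0, len(parts), 2):
            end = parts[i + 1] if i + 1 < len(parts) else ""
            # strip() also removes full-width (U+3000) indentation.
            sentence = (parts[i] + end).strip()
            if sentence:
                sentences.append(sentence)
    return "".join(s + "\r\n" for s in sentences).encode("cp932")
```

For example, read a UTF-8 source file with encoding="utf-8" and write to_sjis_sentences(text) to input.txt opened in binary mode ("wb").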
2.3.3 JUMAN
Prepare input.txt as for MeCab (SJIS, one sentence per line), then run:

    "\Program Files\juman\juman.exe" < input.txt > input.juman

Register input.juman (one line):

    c:\chaki\chaki\cabocha2dat.exe -c input.juman -t c:\temp -h localhost -u root -p okage -d inputjuman --corpusformat=juman --spacing=japanese

--corpusformat=juman selects JUMAN output format, and --spacing=japanese selects Japanese spacing. The other options are as in 2.2.

2.3.4 CaboCha
CaboCha uses ChaSen or MeCab with the IPA dictionary. Prepare input.txt as for MeCab, then run:

    "\Program Files\CaboCha\bin\cabocha.exe" -f1 input.txt > input.cabocha

Register input.cabocha (one line):

    c:\chaki\chaki\cabocha2dat.exe -c input.cabocha -t c:\temp -h localhost -u root -p okage -d inputcabocha --corpusformat=cabocha --spacing=japanese

--corpusformat=cabocha selects CaboCha output format, and --spacing=japanese selects Japanese spacing. The other options are as in 2.2.
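In the -f1 output written to input.cabocha, each chunk (bunsetsu) starts with a line beginning with "*". That line gives the chunk number, then the number of the chunk it depends on followed by D (-1D for none). One morpheme per line follows, and EOS ends each sentence. The Python sketch below reads this layout and returns each sentence's chunks with their heads. It relies only on the first three fields of the "*" lines and assumes that later fields may differ between CaboCha versions. read_cabocha_f1 is a hypothetical name.

```python
from typing import Iterable, List, Tuple

Chunk = Tuple[int, int, str]  # (chunk id, head chunk id or -1, chunk text)


def read_cabocha_f1(lines: Iterable[str]) -> List[List[Chunk]]:
    """Read CaboCha -f1 output into one list of chunks per sentence."""
    sentences: List[List[Chunk]] = []
    chunks = []  # (chunk id, head id, list of morpheme surfaces)
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "EOS":
            sentences.append([(cid, head, "".join(words))
                              for cid, head, words in chunks])
            chunks = []
        elif line.startswith("* "):
            fields = line.split(" ")
            # fields[1] = chunk id, fields[2] = "<head>D"; later fields ignored
            chunks.append((int(fields[1]), int(fields[2].rstrip("D")), []))
        elif line:
            # Morpheme line: surface form, a tab, then the features.
            chunks[-1][2].append(line.split("\t", 1)[0])
    return sentences
```

As an example, consider the hand-written lines "* 0 1D 0/1 0.000000", 太郎 and は (each with a tab and features), "* 1 -1D 0/0 0.000000", 走る (with features), and "EOS". For these lines the function returns [[(0, 1, "太郎は"), (1, -1, "走る")]].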
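2.3.5 KNP
KNP takes JUMAN output as its input. Prepare input.txt as for MeCab, then run:

    "\Program Files\juman\juman.exe" < input.txt | "\Program Files\knp\knp.exe" -tab > input.knp

Register input.knp (one line):

    c:\chaki\chaki\cabocha2dat.exe -c input.knp -t c:\temp -h localhost -u root -p okage -d inputknp --corpusformat=knp --spacing=japanese

--corpusformat=knp selects KNP output format, and --spacing=japanese selects Japanese spacing. The other options are as in 2.2.

2.4 Penn Treebank
This section describes how to register the Penn Treebank in ChaKi.

2.4.1 Using the batch file
Copy the combined folder of the Penn Treebank into db\ptb, then run ptbimport.bat in db\ptb. If necessary, first edit the -u (MySQL user name), -p (MySQL password) and -d (database name) options on line 2 of ptbimport.bat. db\ptb contains ptb.def and ptb.pos; in ChaKi, open ptb.def.

2.4.2 Step by step
1. Convert the treebank to CaboCha format.
2. Register the result with cabocha2dat.exe.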
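The KNP command in 2.3.5 connects programs with cmd pipes (juman.exe < input.txt | knp.exe -tab > input.knp), and step 1 of 2.4.2 chains three programs in the same way. If you automate many files from Python, a sketch like the one below reproduces such a pipeline with the standard subprocess module. It raises CalledProcessError if any stage fails. The code is illustrative, and run_pipeline is a hypothetical name.

```python
import subprocess
from typing import List, Sequence


def run_pipeline(commands: Sequence[List[str]], in_path: str, out_path: str) -> None:
    """Equivalent of: cmd1 < in_path | cmd2 | ... > out_path."""
    with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
        procs = []
        upstream = fin
        for i, cmd in enumerate(commands):
            is_last = i == len(commands) - 1
            proc = subprocess.Popen(cmd, stdin=upstream,
                                    stdout=fout if is_last else subprocess.PIPE)
            if upstream is not fin:
                upstream.close()  # the child now holds its own copy of the pipe
            upstream = proc.stdout
            procs.append(proc)
        # Wait for every stage and report the first failure.
        for cmd, proc in zip(commands, procs):
            if proc.wait() != 0:
                raise subprocess.CalledProcessError(proc.returncode, cmd)
```

For the KNP example, call run_pipeline([[r"\Program Files\juman\juman.exe"], [r"\Program Files\knp\knp.exe", "-tab"]], "input.txt", "input.knp"). Because the commands are passed as lists rather than through the shell, the spaces in "Program Files" need no quoting.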
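Converting to CaboCha format: in the folder that contains combined, run the following (one line):

    combined2s.exe combined\wsj | ptbconv.exe -D | dep2cabocha.exe > ptb.cabocha

This chains three programs. combined2s.exe turns the Penn Treebank .mrg files into parsed trees (S-expressions). ptbconv.exe [21] converts Penn Treebank parsed trees into dependency structures [22]. dep2cabocha.exe converts the ptbconv output into CaboCha format.

Registering (one line):

    ..\ChaKi\cabocha2dat.exe -c ptb.cabocha -t c:\temp -h localhost -u root -p okage -d ptb --corpusformat=cabocha --spacing=english

--corpusformat=cabocha selects CaboCha format, and --spacing=english selects English spacing. The other options are as in 2.2.

2.5 BNC
This section describes how to register the British National Corpus (BNC) in ChaKi.

2.5.1 Using the batch file
Extract texts.tar.gz from BNC CD-ROM disk 1, copy the resulting Texts folder into db\bnc, and run bncimport.bat in db\bnc. If necessary, edit the -u (MySQL user name), -p (MySQL password) and -d (database name, bnc) options in bncimport.bat. bncimport.bat processes Texts\A. Lines that begin with @rem in bncimport.bat are comments; remove @rem from a line to enable it.

[21] http://www.jaist.ac.jp/
[22] http://www.jaist.ac.jp/~h-yamada/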
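bncimport.bat handles Texts\A. To process other subfolders of Texts with the two manual steps of 2.5.2 (bnc2tnt.exe, then cabocha2dat.exe), you can generate batch-file lines for each folder. The Python sketch below does this. Naming each output file and database "bnc" plus the lower-cased folder letter extends the manual's bnca example and is an assumption. The default cabocha2dat.exe path and password are the manual's example values. The registration step reads the .tnt file that bnc2tnt.exe writes.

```python
from typing import Iterable, List


def bnc_batch_lines(folders: Iterable[str],
                    cabocha2dat: str = r"..\..\chaki\cabocha2dat.exe",
                    password: str = "okage") -> List[str]:
    """Batch-file lines running bnc2tnt.exe and cabocha2dat.exe for each
    Texts subfolder; names follow the manual's Texts\\A -> bnca example."""
    lines = []
    for name in folders:
        stem = "bnc" + name.lower()     # hypothetical naming: A -> bnca
        lines.append(f"bnc2tnt.exe Texts\\{name} > {stem}.tnt")
        lines.append(f"{cabocha2dat} -c {stem}.tnt -t c:\\temp -h localhost"
                     f" -u root -p {password} -d {stem}"
                     " --corpusformat=english --spacing=english")
    return lines
```

For example, write "\n".join(bnc_batch_lines("AB")) to a .bat file and run it in db\bnc. Each folder then becomes a separate database (bnca, bncb).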
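2.5.2 Step by step
1. Convert the texts to TreeTagger format.
2. Register the result with cabocha2dat.exe.

Converting to TreeTagger format: the following command converts the files in Texts\A to TreeTagger format and writes them to bnca.tnt:

    bnc2tnt.exe Texts\A > bnca.tnt

Registering (one line):

    ..\..\chaki\cabocha2dat.exe -c bnca.tnt -t c:\temp -h localhost -u root -p okage -d bnca --corpusformat=english --spacing=english

--corpusformat=english selects English (TreeTagger) format, and --spacing=english selects English spacing. The other options are as in 2.2.

2.6 Kyoto University Text Corpus
This section describes how to register the Kyoto University Text Corpus in ChaKi.

2.6.1 Using the batch file
Copy the dat/syn/ folder of the corpus (the .knp files) into db\kc, then run kcimport.bat in db\kc. If necessary, first edit the -u (MySQL user name), -p (MySQL password) and -d (database name) options on line 2 of kcimport.bat.

2.6.2 Step by step
1. Convert the corpus to CaboCha format.
2. Register the result with cabocha2dat.exe.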
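Converting to CaboCha format: the following command converts the KNP-format files in syn (such as kc4.knp) to CaboCha format:

    kc2cabocha.exe syn

Registering (one line):

    ..\..\ChaKi\cabocha2dat.exe -f syn -t c:\temp -h localhost -u root -p okage -d kc --corpusformat=cabocha --spacing=japanese

-f syn specifies the input folder. --corpusformat=cabocha selects CaboCha format, and --spacing=japanese selects Japanese spacing. The other options are as in 2.2.

3 Server and clients
ChaKi can run with one machine as the server (running MySQL) and other machines as clients (running the ChaKi GUI).

3.1 Server (Windows)
Install MySQL on the server as described in 1.1. In the following example, the server has IP address 192.168.0.2 (hostname server.your.domain) and the client has IP address 192.168.0.3 (hostname client01.your.domain). The example gives the MySQL user user (password shika) access to the japanese database:

    mysql -u root -p
    Enter password: <<MySQL root password>>
    mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON japanese.* TO 'user'@'192.168.0.3' IDENTIFIED BY 'shika';
    mysql> exit;
    mysqladmin -uroot -p flush-privileges
    Enter password: <<MySQL root password>>

To specify the client by hostname, write 'user'@'client01.your.domain' instead of 'user'@'192.168.0.3'.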
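In the GRANT statements above, the user name, host and password must be written as quoted string literals. The hypothetical Python helper below builds such a statement with that quoting. It assumes the default SQL mode, in which a backslash escapes a quote. Note that MySQL 5.0 (used in this manual) accepts GRANT ... IDENTIFIED BY, but MySQL 8.0 removed it; there, you must first create the account with CREATE USER.

```python
def _mysql_string(value: str) -> str:
    """Quote value as a MySQL string literal (default SQL mode)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def grant_sql(privileges, database, user, host, password):
    """Build a MySQL 5.0-style GRANT ... IDENTIFIED BY statement."""
    db = database.replace("`", "``")          # escape backticks in the name
    return (f"GRANT {', '.join(privileges)} ON `{db}`.* "
            f"TO {_mysql_string(user)}@{_mysql_string(host)} "
            f"IDENTIFIED BY {_mysql_string(password)};")
```

For example, grant_sql(["SELECT"], "english", "user", "192.168.0.3", "shika") returns GRANT SELECT ON `english`.* TO 'user'@'192.168.0.3' IDENTIFIED BY 'shika'; which you can paste at the mysql> prompt.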
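To give read-only (SELECT) access instead, for example to the english database:

    mysql -u root -p
    Enter password: <<MySQL root password>>
    mysql> GRANT SELECT ON english.* TO 'user'@'192.168.0.3' IDENTIFIED BY 'shika';
    mysql> exit;
    mysqladmin -uroot -p flush-privileges
    Enter password: <<MySQL root password>>

3.2 Client (Windows)
On each client, edit the .def file of the corpus. With the same example addresses, to let user (password shika) access the japanese database, write japanese.def as follows:

    corpusname=japanese
    server=192.168.0.2
    user=user
    password=shika

To specify the server by hostname, write server=server.your.domain instead of server=192.168.0.2.

4 Using UTF-8

4.1 MySQL
Open my.ini (c:\Program Files\MySQL\MySQL Server 5.0\my.ini) in Notepad (notepad.exe). In both the [mysql] and the [mysqld] sections, change default-character-set=latin1 (or sjis, if you changed it in 1.1) to default-character-set=utf8:

    [mysql]
    default-character-set=utf8

    [mysqld]
    default-character-set=utf8

Then restart MySQL, either by restarting the MySQL service from the Windows Services administrative tool or by restarting Windows.

4.2 Registering corpora
To register a UTF-8 corpus, add the --encode=utf8 option to cabocha2dat.exe (one line):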
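    c:\chaki\chaki\cabocha2dat.exe -c input.cabocha -t c:\temp -h localhost -u root -p okage -d inputcabocha --corpusformat=cabocha --spacing=japanese --encode=utf8

The input file given to cabocha2dat.exe (input.cabocha in this example) must be encoded in UTF-8.

4.3 GUI
In ChaKi, set [Options] > [Search Options] > [Character Encoding] to UTF-8. If necessary, also change the font under [Options] > [Font Setting] > [KwicColumnPrimary].

5 Settings in the .def file
The lexicontype setting in a .def file takes one of the following values:

    0: none
    1: english
    2: cabocha
    3: chasen
    4: comp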
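A .def file is a list of key=value lines: corpusname, server, user and password, plus lexicontype when you set it. The Python sketch below reads such a file and maps lexicontype to the names listed above. Two behaviors are assumptions of this sketch and not documented ChaKi behavior: it skips lines without "=", and it returns None when lexicontype is absent. Both function names are hypothetical.

```python
from typing import Dict, Optional

# lexicontype values as listed in this manual
LEXICON_TYPES = {0: "none", 1: "english", 2: "cabocha", 3: "chasen", 4: "comp"}


def read_def(text: str) -> Dict[str, str]:
    """Parse the key=value lines of a ChaKi .def file (other lines are skipped)."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def lexicon_type_name(settings: Dict[str, str]) -> Optional[str]:
    """Name of the lexicontype value, or None when the key is absent."""
    raw = settings.get("lexicontype")
    return None if raw is None else LEXICON_TYPES[int(raw)]
```

For example, reading the japanese.def from 3.2 with an added line lexicontype=2 gives server "192.168.0.2" and lexicon type "cabocha".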