UniDic version


UniDic version 1.3.9 Users Manual
July 2008

Yasuharu Den, Atsushi Yamada, Hideki Ogura, Hanae Koiso, and Toshinobu Ogiso

Copyright (c) 2007-2008 The UniDic consortium. All rights reserved.

version 1.3.0    2 April 2007
version 1.3.5    12 October 2007
version 1.3.8    25 April 2008
version 1.3.9    15 July 2008

Contents

Part I
  1    (1.1 to 1.3)
  2    UniDic-chasen (2.1 to 2.6; 2.6 chasenrc)
  3    UniDic-mecab (3.1 to 3.3; 3.3 dicrc)

Part II
  4    UniDic (4.1 to 4.4)
  5    (5.1 to 5.3)
  6    (6.1 to 6.7)
  7    (7.1)

Appendix A
  A.1  Version 1.3.0
  A.2  Version 1.3.5
  A.3  Version 1.3.8

UniDic is provided as UniDic-chasen, a dictionary for ChaSen, and as UniDic-mecab, a dictionary for MeCab.

Contact
Tel: 042-540-4300
E-mail: unidic@kokken.go.jp

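Part I

1
Versions referenced for UniDic: ChaSen ver. 2.4.0 and MeCab ver. 0.96.

1.1
Platforms: Windows and Linux/Cygwin.

Windows: chauni
Linux/Cygwin: run configure. A switch is available for each analyzer (ChaSen/MeCab):

    ./configure --with-use-mecab=0     # ChaSen
    ./configure --with-use-chasen=0    # MeCab

On Cygwin, pass the Cygwin root to configure:

    ./configure --with-systemtop=d:/cygwin

With utf8, run ChaSen with the -i w option:

    chasen -i w < [input file]

1.2

1.2.1 UniDic-chasen

Windows
1. Unpack unidic-chasen139_xxxx.zip to obtain the folder unidic-chasen139_xxxx. XXXX is one of utf8, sjis, eucj.
2. Locate the dic folder in the ChaSen installation folder (C:\Program Files\ChaSen).
3. Copy the contents of the folder from step 1 into the dic folder from step 2.

Linux/Cygwin
1. Unpack unidic-chasen139_xxxx.tar.gz to obtain the directory unidic-chasen139_xxxx. XXXX is one of utf8, eucj, sjis.
2. In the ChaSen dictionary directory (/usr/local/lib/chasen/dic *1), create a directory named unidic.
3. Copy the contents of the directory from step 1 into the unidic directory from step 2.
4. Copy chasenrc from the unidic directory of step 3 to $HOME/.chasenrc and set GRAMMAR. On Cygwin, write GRAMMAR as a Windows path:

    (GRAMMAR D:/Cygwin/usr/local/lib/chasen/dic/unidic)

1.2.2 UniDic-mecab

Windows
1. Unpack unidic-mecab139_xxxx.zip to obtain the folder unidic-mecab139_xxxx. XXXX is one of utf8, sjis, eucj.
2. In the dic folder of the MeCab installation folder (C:\Program Files\MeCab), create a folder named unidic.
3. Copy the contents of the folder from step 1 into the unidic folder from step 2.

Select the dictionary with the -d option of MeCab:

    mecab -d "C:\Program Files\MeCab\dic\unidic"

Linux/Cygwin
1. Unpack unidic-mecab139_xxxx.tar.gz to obtain the directory unidic-mecab139_xxxx. XXXX is one of utf8, eucj, sjis.
2. In the MeCab dictionary directory (/usr/local/lib/mecab/dic *2), create a directory named unidic.
3. Copy the contents of the directory from step 1 into the unidic directory from step 2.

Select the dictionary with the -d option of MeCab:

    mecab -d /usr/local/lib/mecab/dic/unidic

*1 The ChaSen dictionary directory can be checked with chasen-config --dicdir.
*2 The MeCab dictionary directory can be checked with mecab-config --dicdir.

1.3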

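1.3.1 UniDic-chasen

Windows
1. Unpack unidic-chasen139src.zip to obtain the folder unidic-chasen139src.
2. Locate the dic folder in the ChaSen installation folder (C:\Program Files\ChaSen).
3. Place the unidic-chasen139src folder from step 1 in the ChaSen installation folder.
4. Run Makefile.bat in the unidic-chasen139src folder from step 3. The result goes to the dic folder from step 2.

In step 4, a dictionary such as Filler.dic can be left out by removing its .dic file. The default encoding is utf8. For the other two encodings, use Makefile_sjis.bat or Makefile_eucj.bat; the procedure is otherwise the same as for utf8.

Linux/Cygwin
1. Unpack unidic-chasen139src.tar.gz to obtain the directory unidic-chasen139src.
2. In the unidic-chasen139src directory from step 1, run:

    ./configure && make

3. Run make install. The dictionary is installed in /usr/local/lib/chasen/dic/unidic.
4. Copy chasenrc from the unidic directory of step 3 to $HOME/.chasenrc and set GRAMMAR.

To leave out a dictionary, pass --with-exclude-dic to configure in step 2:

    ./configure --with-exclude-dic=filler.dic

On Cygwin, pass the Cygwin root to configure and write GRAMMAR as a Windows path:

    ./configure --with-systemtop=d:/cygwin
    (GRAMMAR D:/Cygwin/usr/local/lib/chasen/dic/unidic)

The default encoding is utf8. For another encoding, pass --with-encoding to configure in step 2: s for shift-jis, e for euc-jp. make install is the same as for utf8.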

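1.3.2 UniDic-mecab

Windows
1. Unpack unidic-mecab139src.zip to obtain the folder unidic-mecab139src.
2. In the dic folder of the MeCab installation folder (C:\Program Files\MeCab), create a folder named unidic.
3. Place the unidic-mecab139src folder from step 1 in the MeCab installation folder.
4. Run Makefile.bat in the unidic-mecab139src folder from step 3. The result goes to the unidic folder from step 2.

In step 4, a dictionary such as Filler.csv can be left out by removing its .csv file. The default encoding is utf8. For the other encodings, use Makefile_sjis.bat or Makefile_eucj.bat; the procedure is otherwise the same as for utf8.

Select the dictionary with the -d option of MeCab:

    mecab -d "C:\Program Files\MeCab\dic\unidic"

Linux/Cygwin
1. Unpack unidic-mecab139src.tar.gz to obtain the directory unidic-mecab139src.
2. In the unidic-mecab139src directory from step 1, run:

    ./configure && make

3. Run make install. The dictionary is installed in /usr/local/lib/mecab/dic/unidic.

To leave out a dictionary, pass --with-exclude-dic to configure in step 2:

    ./configure --with-exclude-dic=filler.csv

The default encoding is utf8. For another encoding, pass --with-charset to configure: sjis for shift-jis, euc-jp for euc-jp. make install is the same as for utf8.

Select the dictionary with the -d option of MeCab:

    mecab -d /usr/local/lib/mecab/dic/unidic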
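The mecab -d invocations above can also be assembled from a script. The following is a minimal sketch, not part of UniDic or MeCab: the helper name mecab_command is hypothetical, and it assumes the dictionary was installed in a unidic subdirectory of the directory reported by mecab-config --dicdir. It only builds the argument list and does not run MeCab.

```python
import posixpath
from typing import List


def mecab_command(dicdir: str, dic_name: str = "unidic") -> List[str]:
    """Build the argv for running MeCab with the dictionary under dicdir.

    dicdir is the dictionary root (for example, the value printed by
    `mecab-config --dicdir`); dic_name is the subdirectory holding UniDic.
    Both the helper and its parameter names are illustrative assumptions.
    """
    return ["mecab", "-d", posixpath.join(dicdir, dic_name)]


if __name__ == "__main__":
    # Print the command line for the Linux/Cygwin default location.
    print(" ".join(mecab_command("/usr/local/lib/mecab/dic")))
```

The returned list can be handed to subprocess.run on a machine where MeCab is installed.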

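2 UniDic-chasen

2.1
grammar.cha defines the parts of speech. A part of speech marked with % conjugates; its conjugation is defined in ctypes.cha and cforms.cha.

    ( ( () () () () () ()) ( %) ( %))

2.2
ctypes.cha:

    (( ) ( -... -))

2.3
cforms.cha:

    (- ((- *) (- *) ( *) ( *) (- *) (- *) (- *) (- *) (- *) (- *) (- *)))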

The cforms.cha of UniDic-chasen is specific to UniDic and is not the one from ipadic for ChaSen.

2.4
Entries in the .dic files have the following form:

    (POS ( ))
    ((LEX ( 0)) (READING ) (PRON )
     (INFO orthbase="" kanabase="" pronbase="" lform="" lemma="" form="" acontype=" %F2@0, %F1, %F2@-1" goshu="" ))

    (POS ( ))
    ((LEX ( 4000)) (READING ) (PRON )
     (INFO orthbase="" kanabase="" pronbase="" lform="" lemma="" form="" atype="1" acontype="c3" goshu="" ))

Entries that conjugate add CTYPE, CFORM, and BASE, and some forms carry amodtype:

    (POS ( ))
    ((LEX ( 261)) (READING ) (PRON ) (CTYPE -) (CFORM -) (BASE )
     (INFO orthbase="" kanabase="" pronbase="" lform="" lemma="" form="" atype="2" acontype="c1" goshu="" ))

    (POS ( ))
    ((LEX ( 261)) (READING ) (PRON ) (CTYPE -) (CFORM -) (BASE )
     (INFO orthbase="" kanabase="" pronbase="" lform="" lemma="" form="" atype="2" acontype="c1" goshu="" ))

    (POS ( ))
    ((LEX ( 261)) (READING ) (PRON ) (CTYPE -) (CFORM ) (BASE )
     (INFO orthbase="" kanabase="" pronbase="" lform="" lemma="" form="" atype="2" acontype="c1" amodtype="m2@1" goshu="" ))

    ...

    (POS ( ))
    ((LEX ( 261)) (READING ) (PRON ) (CTYPE -) (CFORM -) (BASE )
     (INFO orthbase="" kanabase="" pronbase="" lform="" lemma="" form="" atype="2" acontype="c1" amodtype="m4@1" goshu="" ))

UniDic-chasen keeps the UniDic-specific fields in the INFO field of ChaSen; see Section 4.4.
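The INFO field of the entries above is a sequence of key="value" pairs, so it can be split with a small amount of code. The sketch below is an illustration only: the function names parse_info and split_acontype are hypothetical, and it assumes that values never contain a double quote and that multiple acontype values are separated by commas, as in the first entry.

```python
import re
from typing import Dict, List

# One key="value" pair; assumes values contain no double quote.
ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_info(info: str) -> Dict[str, str]:
    """Turn the text of an INFO field into a dict of raw attribute values."""
    return {key: value for key, value in ATTR.findall(info)}


def split_acontype(value: str) -> List[str]:
    """Split a comma-separated acontype value into its individual items."""
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    info = 'lform="" lemma="" form="" atype="2" acontype="c1" amodtype="m2@1" goshu=""'
    fields = parse_info(info)
    print(fields["atype"], fields["acontype"], fields["amodtype"])
    print(split_acontype(" %F2@0, %F1, %F2@-1"))
```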

2.5
connect.cha:

    (( ((( ))) ((( ))) ) 814)
    (( ((( ) - -)) ((() - - )) ) 147)
    (( (((*))) ((() - - )) ) 8000)
    (( ((( ) * * )) ((() - - )) ) 425)

2.6 chasenrc
chasenrc is the ChaSen configuration file:

    (GRAMMAR /usr/local/lib/chasen/dic)
    (DADIC chadic)
    (UNKNOWN_POS ( ))
    (OUTPUT_FORMAT ; 1
     "<cha:w1 orth=\"%m\" kana=\"%?u/%m/%y/\" pron=\"%?u/%m/%a/\" pos=\"%u(%p-)\"%?t/ ctype=\"%t \"//%?F/ cform=\"%f \"// %?I/ %i//>%m</cha:w1>\n")
    (OUTPUT_COMPOUND "SEG")
    (EOS_STRING "")
    (DEF_CONN_COST 10000)
    (POS_COST
     ((*) 1)
     ((UNKNOWN) 30000)
    )
    (CONN_WEIGHT 1)
    (MORPH_WEIGHT 1)
    (COST_WIDTH 0)
    (ANNOTATION (("<" ">") "%m\n"))

The entries to note are GRAMMAR, UNKNOWN_POS, OUTPUT_FORMAT, EOS_STRING, and ANNOTATION. ANNOTATION covers xml tags enclosed in < and >.

UniDic-chasen outputs the ChaSen result as xml. In OUTPUT_FORMAT, the part after ; (<cha:w1>...</cha:w1>) is written on 1 line. Example output:

    <cha:w1 orth="" kana="" pron="" pos="--" orthbase="" kanabase="" pronbase="" lform="" lemma="" form="" atype="0" acontype="c2" goshu=""></cha:w1>
    <cha:w1 orth="" kana="" pron="" pos="--" orthbase="" kanabase="" pronbase="" lform="" lemma="" form="" atype="0,1" acontype="c2" goshu=""></cha:w1>
    <cha:w1 orth="" kana="" pron="" pos="-" orthbase="" kanabase="" pronbase="" lform="" lemma="" form="" acontype=" %F2@0, %F1, %F2@-1" goshu=""></cha:w1>
    <cha:w1 orth="" kana="" pron="" pos="--" orthbase="" kanabase="" pronbase="" lform="" lemma="" form="" atype="0,1" acontype="c2" goshu=""></cha:w1>
    <cha:w1 orth="" kana="" pron="" pos="-" ctype="" cform="-" orthbase="" kanabase="" pronbase="" lform="" lemma="" form="" atype="0" acontype="c3" goshu=""></cha:w1>
    <cha:w1 orth="" kana="" pron="" pos="" ctype="-" cform="-" orthbase="" kanabase="" pronbase="" lform="" lemma="" form="" acontype=" %F4@1" goshu=""></cha:w1>

The xml output can be converted with xslt, using xml2txt.xsl in uniutils.
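As an alternative to xslt, the <cha:w1> elements shown above can be read with a general xml parser. The sketch below is an assumption-laden illustration, not a UniDic tool: the output declares no namespace for the cha: prefix, so the code wraps the lines in a root element with a placeholder namespace URI (urn:example:cha, invented here), and the function name parse_cha_xml is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

# Placeholder URI bound to the "cha" prefix so the fragment parses.
# It is an assumption made for this sketch, not a UniDic-defined value.
CHA_NS = "urn:example:cha"


def parse_cha_xml(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse <cha:w1> output lines into one dict of attributes per unit.

    The element text (the surface string) is stored under the key "text".
    """
    wrapped = '<root xmlns:cha="{}">{}</root>'.format(CHA_NS, "".join(lines))
    root = ET.fromstring(wrapped)
    tag = "{%s}w1" % CHA_NS
    return [dict(w.attrib, text=w.text or "") for w in root.iter(tag)]


if __name__ == "__main__":
    sample = [
        '<cha:w1 orth="" kana="" pron="" pos="--" atype="0" acontype="c2" goshu=""></cha:w1>\n',
        '<cha:w1 orth="" kana="" pron="" pos="-" acontype=" %F2@0, %F1, %F2@-1" goshu=""></cha:w1>\n',
    ]
    for unit in parse_cha_xml(sample):
        print(unit.get("atype"), unit["acontype"])
```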

3 UniDic-mecab

3.1
The .csv files correspond to the .dic files of UniDic-chasen.

3.2
For the .def files, see http://mecab.sourceforge.net/.

3.3 dicrc
dicrc is the MeCab configuration file:

    cost-factor = 700
    bos-feature = BOS/EOS,*,*,*,*,*,*,*,*,*,*,*,*
    eval-size = 9
    unk-eval-size = 4
    max-grouping-size = 10
    output-format-type = unidic
    node-format-unidic = %m\t%f[10]\t%f[6]\t%f[7]\t%f-[0,1,2,3]\t%f[4]\t%f[5]\t%f[12]\n
    unk-format-unidic = %m\t%m\t%m\t%m\t%f-[0,1,2,3]\t%f[4]\t%f[5]\t%f[12]\n
    eos-format-unidic = EOS\n

output-format-type selects which node-format-XXX, unk-format-XXX, bos-format-XXX, and eos-format-XXX definitions are used, where XXX is the type name. f[0] to f[12] refer to the feature fields. For rewrite.def and dicrc of MeCab, see http://mecab.sourceforge.net/.
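With the format definitions above, each output line holds eight tab-separated columns and each sentence ends with an EOS line. The sketch below reads such output into Python dictionaries. The column names are placeholders chosen for this example (they simply echo the feature indices in node-format-unidic), and the function name parse_mecab_unidic is hypothetical.

```python
from typing import Dict, List

# Column order taken from node-format-unidic; the names are placeholders
# that mirror the feature indices and are not defined by UniDic.
COLUMNS = ["surface", "f10", "f6", "f7", "pos", "f4", "f5", "f12"]


def parse_mecab_unidic(output: str) -> List[List[Dict[str, str]]]:
    """Split MeCab output in the unidic format into sentences of rows."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in output.splitlines():
        if line == "EOS":          # eos-format-unidic closes a sentence
            sentences.append(current)
            current = []
        elif line:
            current.append(dict(zip(COLUMNS, line.split("\t"))))
    if current:                    # keep rows that have no trailing EOS
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    demo = "w1\ta\tb\tc\tp-q\td\te\tf\nw2\tg\th\ti\tr\tj\tk\tl\nEOS\n"
    for row in parse_mecab_unidic(demo)[0]:
        print(row["surface"], row["pos"])
```

Unknown words follow unk-format-unidic, which has the same number of columns, so the same reader applies to them.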

Part II

4 UniDic
Reference: [...], 22, October 2007.

4.1
Reference: [...] 18, pp. 101-108, March 2007.
See also Section 5.2.

4.2
Figure 1

4.3
See Table 1. This section also refers to chaone.

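Table 1: UniDic fields

    lform, lemma, goshu, form, pos, ctype, cform, itype, iform, icontype, ftype, fform, fcontype, orthbase, orth, kanabase, kana, pronbase, pron, atype, amodtype, acontype

4.4
With ChaSen, the fields of Table 1 are output in the xml format set in chasenrc. The fields of a ChaSen entry are LEX, READING, PRON, POS, CTYPE, and CFORM; the remaining fields are carried in INFO (see Section 2.4). With MeCab, the output is defined in dicrc (see Section 3.3).

5
This section relates the system of UniDic to the IPA system of ipadic.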

5.1
Table 2 (with footnote *3)

5.2

5.2.1
Table 3

5.2.2
Table 4

5.2.3

5.2.4

5.3
Table 5

6 UniDic

6.1
Table 6

6.2
Table 7

6.3
Table 8: N1, N3, N4, N6, N8, Nj, Nh, Ns, Nm, Nn

6.4
Table 9: M1@M, M2@M, M4@M (expressed with N_0, M_0, and M)

Table 10

    Type    Value
    C1      N_1 + M_2
    C2      N_1 + 1
    C3      N_1
    C4      0
    C5      M_1

    Symbols: N_1, M_1, M_2

6.5

6.6
See Table 9.

6.7
See Tables 10, 11, and 12 (*4). This section also refers to chaone and to the types C2 and C1.

*4 See Table 12.
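The rules of Table 10 can be written as a small lookup. The sketch below is illustrative only. It assumes, since the table note defining the symbols is not available here, that N_1 is the length of the first element, M_1 its accent value, and M_2 the accent value of the second element; the function name combined_accent is hypothetical. Lower-case labels such as c1, as they appear in the acontype attribute, are accepted as well.

```python
def combined_accent(contype: str, n1: int, m1: int, m2: int) -> int:
    """Return the value Table 10 assigns to connection type C1..C5.

    n1, m1, and m2 stand for N_1, M_1, and M_2; their interpretation
    (length and accent values of the two elements) is an assumption.
    """
    table = {
        "C1": n1 + m2,
        "C2": n1 + 1,
        "C3": n1,
        "C4": 0,
        "C5": m1,
    }
    key = contype.upper()
    if key not in table:
        raise ValueError("unknown connection type: %r" % contype)
    return table[key]


if __name__ == "__main__":
    for label in ("C1", "C2", "C3", "C4", "C5"):
        print(label, combined_accent(label, n1=3, m1=1, m2=2))
```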

Table 11

    Header: 0, N_2
    P1      0; N_1 + M_2
    P2      N_1 + 1; N_1 + M_2
    P4      N_1 + 1; M_1
    P6      0
    P13     M_1
    P14     M_1; N_1 + M_2

    Symbols: N_1, M_1, N_2, M_2

Table 12

    Header: 0
    F1        M_1
    F2@M      N_1 + M; M_1
    F3@M      M_1; N_1 + M
    F4@M      N_1 + M
    F5        0
    F6@M,L    N_1 + M; N_1 + L

    Symbols: N_1, M_1, M, L

7

7.1
Table 13

A

A.1 Version 1.3.0

A.2 Version 1.3.5
MeCab

A.3 Version 1.3.8
MeCab