1 II URL II AE11C61) AE11C ( ) C 1 11 ( ) ( ) ( ) 1 5 2D203 (1) / (2) / / (3) mizimada@ninjal.ac.jp 1
2 1 1.1 (, 2013) (, 2013) 1.2 () ( ) /z/ [dz] [z] ( ) ( ) 70 > () 40 > 1.3 (i.e. ) 2
3 1 /Z/ (, 2013) 2 (, 2013) ( *1 ( ) ) 1 (BCCWJ) 11 ( ) 100 () ( ) (Shift JIS, EUC-JP, UTF-8 ) (JIS X0213, Unicode ) ( ) *1... 3
4 1.4 (i.e. ) (i.e. ) Brown Corpus (1950 ) (1951) (1955) 60 ( 30 ) (1), (2)(1960, 1963) I III( ) (1970) (1980 ) ( ) CD-ROM( 4
5 )... CD-ROM CD-HIASK CD- CD-ROM 100 (1990) 2011 ( ) 2. () ATR ATR EDR (EDR) EDR 500 (EDR ) RWC (RWCP) 90 ( 5 ) (1990 ) 1. PC () J-TEXTS (1997) (2002) 2 CD-ROM KY 90 OPI( )
6 NAIST (2000 ) DVD CASTEL/J CASTEL/J( ) Yahoo! Google (2000 ) UniDic Yahoo! ( 2 )
7 BCCWJ 1030 JpWaC (Japanese Web as Corpus) 4.1 SketchEngine 11 Lago Ninjal-LWP ( 16 ) (GSK) Web N 1 Google n-gram (2013) [ ] 1. (2013) [ ] 1. (2013) [ ] 1. 7
8 2 2.1 ( ) (CSJ) X-JToBI(eXtended Japanese Tones and Break Indices) 2: X-JToBI ( ) * 2 8
9 BI Break Index( ) (1) (2) (3) 3 BI 4 X-JToBI JUMAN IPADIC UniDic JUMAN JUMAN IPADIC ChaSen MeCab (IPA) IPA UniDic 2 (UniDic ) JUMAN IPADIC UniDic( ) *2 /h/ [a,e,o] [h]( ) [i] [ç]() [u] [F]( ) 9
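Morphological information
Reading, base form (dictionary form), part of speech, conjugation type (godan, ichidan, etc.), conjugated form (irrealis, continuative, etc.), and so on.
The kinds of information registered differ from dictionary to dictionary.
Figure 5: Example of morphological analysis (Tokunaga, 2013)
Syntactic level
Phrase structure grammar is commonly used for English, and dependency grammar over bunsetsu units is commonly used for Japanese.
Figure 6: Example of syntactic analysis (Tokunaga, 2013)
Figure 7: Dependency grammar and phrase structure grammar (Wikipedia: Dependency Grammar)
Semantic level
Information on the meanings of the elements that make up a sentence and on the semantic structure of the sentence as a whole.
Word sense: information on the meanings of individual words (word sense disambiguation, etc.).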
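Predicate-argument structure: information on the semantic relations between a predicate and its arguments (case relation structure, case frame information, etc.).
Figure 8: Example of semantic analysis (Tokunaga, 2013)
Discourse level
Information above the level of the sentence.
Anaphora: the relation between a demonstrative or pronoun and its antecedent.
Coreference (identity of reference): the relation in which two expressions have the same referent. In 太郎はうなぎバーガーを食べた。次郎もそれを食べた。 ("Taro ate an eel burger. Jiro ate one too."), うなぎバーガー and それ stand in an anaphoric relation but not in a coreference relation.
Discourse structure: semantic relations between clauses (cause and effect, etc.).
Figure 9: Example of discourse analysis (Tokunaga, 2013)
2.2 Physical formats of corpora
Proprietary formats
KNP format
The format of the Kyoto University Text Corpus. It follows the output format of the KNP parser. Lines beginning with #, *, and + mark a sentence, a bunsetsu, and a tag unit (the unit to which case relations and similar information are attached), respectively; all other lines represent words. EOS marks the end of a sentence (End Of Sentence).
# S-ID: KNP:97/01/21 MOD:2004/06/24
* 0 1D
+ 0 1D
昨年 さくねん * 名詞 時相名詞 * *
* 1 6D
+ 1 6D
末 すえ * 名詞 時相名詞 * *
、 、 * 特殊 読点 * *
* 2 3D
+ 2 3D
今年 ことし * 名詞 時相名詞 * *
の の * 助詞 接続助詞 * *
* 3 4D
+ 3 4D <rel type="時間" target="今年" sid=" " tag="2"/>
干支 えと * 名詞 普通名詞 * *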
12 16 * * * 17 * 4 6D D <rel type=" " target=" " sid=" " tag="3"/> 19 * * * 20 * * * 21 * 5 6D D 23 * * * * 24 * * * 25 * 6-1D D <rel type=" " target=" " sid=" " tag="1"/><rel type=" " target=" " sid =" " tag="4"/><rel type=" " target=" "/ > 27 * 28 * * * 29 EOS XML XML(Extensivle Markup Language) (de facto) (CSJ) (BCCWJ) XML 1 <article id=" " > 2 <sentence id=" " info="knp:97/01/21 MOD:2004/06/24" > 3 <chunk id="0" link="1" rel="d"> 4 <tag id="0" link="1" rel="d"> 5 <tok id="0" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 6 </tag > 7 </chunk > 8 <chunk id="1" link="6" rel="d"> 9 <tag id="1" link="6" rel="d"> 10 <tok id="1" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 11 <tok id="2" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 12 </tag > 13 </chunk > 14 <chunk id="2" link="3" rel="d"> 15 <tag id="2" link="3" rel="d"> 16 <tok id="3" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 17 <tok id="4" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 18 </tag > 19 </chunk > 20 <chunk id="3" link="4" rel="d"> 21 <tag id="3" link="4" rel="d"> 22 <rel type=" " target=" " sid=" " tag="2"/> 23 <tok id="5" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 24 <tok id="6" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 25 </tag > 26 </chunk > 27 <chunk id="4" link="6" rel="d"> 28 <tag id="4" link="6" rel="d"> 29 <rel type=" " target=" " sid=" " tag="3"/> 30 <tok id="7" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 31 <tok id="8" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 32 </tag > 33 </chunk > 34 <chunk id="5" link="6" rel="d"> 35 <tag id="5" link="6" rel="d"> 36 <tok id="9" read=" " base=" " pos=" " ctype="*" cform="*"> </ tok > 37 <tok id="10" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 38 </tag > 39 </chunk > 40 <chunk id="6" link="-1" rel="d"> 41 <tag id="6" link="-1" rel="d"> 42 <rel type=" " target=" " sid=" " tag="1"/> 43 <rel type=" " target=" " sid=" " 
tag="4"/> 44 <rel type=" " target=" "/ > 12
13 45 <tok id="11" read=" " base=" " pos=" " ctype=" " cform=" " > </ tok > 46 <tok id="12" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 47 </tag > 48 </chunk > 49 </sentence > RDB RDBMS(RDB ) Microsoft SQL Server MySQL SQLite RDB (BCCWJ) BCCWJ ChaKi (NAIST) RDB MySQL SQLite RDBMS MSSQL MySQL SQLite RDBMS(SQLite) (CSJ) XML SQLite (CSJ-RDB) ChaKi *3 *4 SQLite ChaKi *5 RDB NoSQL(RDBMS ) Brown Corpus LOB Brown Corpus 10 Brown Corpus 100 BNC British National Corpus CLAWS ( ) ANC American National Corpus BNC (2003 ) 1100 ( 1 ) Web Open ANC 1400 Penn Treebank 1993 Wall Street Journal WSJ Brown Corpus *3 MeCab *4 CaboCha *5 KNB BCCWJ BCCWJ 13
14 PropBank Proposition Bank 2005 PTB Levin(1993) VerbNet FrameNet Fillmore(1982) BNC ANC WSJ OntoNote PTB PropBank PDTB Penn Discourse Tree Bank 2008 PTB PropBank WSJ EDR (EDR) ( ) RWC ( ) 3 7 ( 91 ) NAIST (CSJ) (BCCWJ) () (12 ) (Windows ) 14
15 (#) (*) (+) 4 1 # S-ID: KNP:97/01/21 MOD:2004/06/24 2 * 0 1D 3 * 0 1D 4 * * * JUMAN ( ) 6 1 * (D) (P) (I) (A) 4 + () ( ) 1 <tag id="6" link="-1" rel="d"> 2 <rel type=" " target=" " sid=" " tag="1"/> 3 <rel type=" " target=" " sid=" " tag="4"/> 4 <rel type=" " target=" "/ > 5 <tok id="11" read=" " base=" " pos=" " ctype=" " cform=" " > </ tok > 6 <tok id="12" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 7 </tag > 15
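The line types of the KNP format (# for the sentence header, * for a bunsetsu, + for a tag unit, any other line for a word, EOS for the end of a sentence) are simple enough to read with a short script. The following is only a sketch under stated assumptions, not the reader used by KNP or by the corpus tools: `parse_knp` and the layout of the returned hashes are hypothetical, the sample is a shortened, made-up two-bunsetsu sentence with a placeholder S-ID, and word lines are assumed to be space-separated with seven fields.

```ruby
# Hypothetical helper: parse KNP-format text into an array of sentences.
# Each sentence is { header:, chunks: [ { id:, head:, type:, tags:, words: } ] }.
def parse_knp(text)
  sentences = []
  current = { header: nil, chunks: [] }
  text.each_line do |raw|
    line = raw.chomp
    if line.start_with?("#")                                      # sentence header
      current[:header] = line.sub(/\A#\s*/, "")
    elsif line == "EOS"                                           # end of sentence
      sentences << current
      current = { header: nil, chunks: [] }
    elsif (m = line.match(/\A\* (\d+) (-?\d+)([DPIA])\z/))        # bunsetsu line
      current[:chunks] << { id: m[1].to_i, head: m[2].to_i, type: m[3], tags: [], words: [] }
    elsif (m = line.match(/\A\+ (\d+) (-?\d+)([DPIA])(.*)\z/))    # tag-unit line
      current[:chunks].last[:tags] << { id: m[1].to_i, head: m[2].to_i, type: m[3], rels: m[4].strip }
    elsif !line.empty?                                            # word line
      surface, reading, _base, pos, subpos, _ctype, _cform = line.split(" ")
      current[:chunks].last[:words] << { surface: surface, reading: reading, pos: pos, subpos: subpos }
    end
  end
  sentences
end

# Shortened illustrative input (the S-ID value is a placeholder).
sample = <<~KNP
  # S-ID:EXAMPLE KNP:97/01/21 MOD:2004/06/24
  * 0 1D
  + 0 1D
  昨年 さくねん * 名詞 時相名詞 * *
  * 1 -1D
  + 1 -1D
  末 すえ * 名詞 時相名詞 * *
  EOS
KNP

parse_knp(sample).each do |sentence|
  sentence[:chunks].each do |chunk|
    surface = chunk[:words].map { |w| w[:surface] }.join
    puts "#{chunk[:id]} -> #{chunk[:head]} (#{chunk[:type]}): #{surface}"
  end
end
```

A head of -1 marks the root bunsetsu; the letter after the head number is the dependency label (D, P, I, or A).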
16 1 <rel type=" " target=" "/ > 2 <tok id="11" read=" " base=" " pos=" " ctype=" " cform=" " > </ tok > PO CO NO 1 <sentence id=" " info="knp:97/01/21 MOD:2004/06/24" > 2 <chunk id="0" link="2" rel="d"> 3 <tag id="0" link="2" rel="d"> 4 <rel type="=" target=" " sid=" " tag="10"/> 5 <tok id="0" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 6 <tok id="1" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 7 </tag > 8 </chunk > 9 <chunk id="1" link="2" rel="d"> 10 <tag id="1" link="2" rel="d"> 11 <tok id="2" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 12 <tok id="3" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 13 <tok id="4" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 14 </tag > 15 </chunk > 16 <chunk id="2" link="-1" rel="d"> 17 <tag id="2" link="-1" rel="d"> 18 <rel type=" " target=" " sid=" " tag="0"/> 19 <memo >CO</memo > 20 <rel type=" " target=" " sid=" " tag="1"/> 21 <tok id="5" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 22 <tok id="6" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 23 </tag > 24 </chunk > 25 </sentence > = 1 <rel type="=" target=" " sid=" " tag="10"/> 2 <tok id="0" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > DVD18 Disk1 Disk2 Disk CSJ (752 ) () (50 ) ( ) Praat WaveSurfer 16
17 1 % ID:A01M % 3 %<SOT > L: 5 (F ) & (F ) 6 & <H> L: 8 & L: 10 (F ) & (F ) 11 & 12 & <H> 13 & Excel A D E F G H 10 A D E F G H Q R 11 XML XML IPU(Inter-Pausal Unit: 200ms ) 1 <?xml version ="1.0" encoding="utf -8"?> 2 <Talk SpeakerBirthGeneration ="55to59" SpeakerBirthPlace=" " SpeakerID ="47" SpeakerSex=" " TalkID=" A01M0007"> 3 <TalkComment > 4 <Comment CommentStrings=" ID:A01M0007"/> 5 <Comment CommentStrings=""/> 6 <Comment CommentStrings=""/> 7 </TalkComment > 8 <IPU Channel="L" IPUEndTime =" " IPUID ="0001" IPUStartTime =" " > 9 <LUW IsNewLine ="1" LUWDictionaryForm=" " LUWID="1" LUWLemma=" " LUWPOS=" " LineID="001"> 10 <SUW ClauseUnitID ="0" ColumnID ="001" Dep_BunsetsuUnitID ="0" OrthographicTranscription ="(F )" PhoneticTranscription ="(F )" PlainOrthographicTranscription=" " SUWDictionaryForm =" " SUWID="1" SUWLemma=" " SUWPOS=" " > 11 <TransSUW TagFillerEnd ="1" TagFillerStart ="1" TransSUWID="1"> 17
18 12 <Mora MoraEntity=" " MoraID="1"> 13 <Phoneme PhonemeEntity="e" PhonemeID="1"> 14 <Phone PhoneID ="1" PhoneEntity="e" PhoneClass="vowel" PhoneStartTime =" " PhoneEndTime =" " EndTimeUncertain ="1"/> 15 </Phoneme > XML 1 <?xml version ="1.0" encoding="utf -8"?> XML 2 <Talk SpeakerBirthGeneration ="55to59" SpeakerBirthPlace=" " SpeakerID ="47" SpeakerSex=" " TalkID=" A01M0007"> 3 <ClauseUnit ClauseUnitID ="0" IPUID =" " IPUStartTime =" " IPUEndTime =" " > 4 <Bunsetsu Dep_BunsetsuUnitID="0"> 5 <LUW IsNewLine ="1" LUWDictionaryForm=" " LUWID="1" LUWLemma=" " LUWPOS=" " LineID ="001"> 6 <SUW ClauseUnitID ="0" ColumnID ="001" Dep_BunsetsuUnitID ="0" OrthographicTranscription ="(F )" PhoneticTranscription ="(F )" PlainOrthographicTranscription=" " SUWDictionaryForm=" " SUWID="1" SUWLemma=" " SUWPOS=" " PhoneStartTime =" " PhoneEndTime ="0.687"/ > 7 </LUW > 8 </Bunsetsu > 9 <Bunsetsu Dep_BunsetsuUnitID ="1" Dep_ModifieeBunsetsuUnitID ="2"> 10 <LUW IsNewLine ="1" LUWDictionaryForm=" " LUWID="2" LUWLemma=" " LUWPOS=" " LineID="002"> 11 <SUW ColumnID ="001" Dep_BunsetsuUnitID ="1" Dep_ModifieeBunsetsuUnitID ="2" OrthographicTranscription =" " PhoneticTranscription=" & lt;h>" PlainOrthographicTranscription=" " SUWDictionaryForm=" " SUWID="1" SUWLemma=" " SUWPOS=" " ClauseUnitID ="0" PhoneStartTime ="0.687" PhoneEndTime =" " IPUBoundary ="1"/> 12 </LUW > 13 </Bunsetsu > ()SC ,500 ()SC ,000 SC Yahoo!Yahoo! 3, , C-XML Character-base XML 1 <?xml version ="1.0" encoding="utf -8"?> 18
19 2 <sample sampleid="oc01_00001" type="chiebukuro" version="1.0"> 3 <OCQuestion > 4 <webline > 5 <sentence > </ sentence > 6 </webline > 7 <br type=" logicalline_original" /> 8 <webline > 9 <sentence > </ sentence > 10 </webline > 11 <br type=" logicalline_original" /> 12 <br type=" logicalline_original" /> 13 <webline > 14 <sentence > </ sentence > 15 </webline > 16 <br type=" logicalline_original" /> 17 <br type=" logicalline_original" /> 18 <webline > 19 <sentence type="quasi"> </ sentence > 20 </webline > 21 </OCQuestion > 22 <OCAnswer > 23 <webline > 24 <sentence > </ sentence > 25 </webline > 26 <br type=" logicalline_original" /> 27 <br type=" logicalline_original" /> 28 <rejectedblock type="figure" /> 29 </OCAnswer > 30 </sample > M-XML Morphology-base XML 1 <?xml version ="1.0" encoding="utf -8"?> 2 <mergedsample sampleid="oc01_00001" type="bccwj -MorphXML" version="1.0"> 3 <article articleid="oc01_ Question"> 4 <webline > 5 <sentence ><LUW B="S" SL="v" l_lemma=" " l_lform=" " l_wtype=" " l_pos=" - - " l_formbase=" " > < SUW orderid ="10" lemmaid ="24777" lemma=" " lform=" " wtype=" " pos=" - " ctype=" - " cform=" - " formbase=" " orthbase=" " kana=" " pron=" " start="10" end="30"> </ SUW ><SUW orderid ="20" lemmaid ="16629" lemma=" " lform=" " wtype=" " pos=" - - " formbase=" " pron=" " start ="30" end="50"> </ SUW ></LUW ><LUW SL="v" l_lemma=" " l_lform=" " l_wtype=" " l_pos=" - " l_formbase=" " > < SUW orderid ="30" lemmaid ="28989" lemma=" " lform=" " wtype=" " pos=" - " formbase=" " pron=" " start="50" end="60"> </ SUW ></LUW ><LUW B="B" SL="v" l_lemma=" " l_lform=" " l_wtype=" " l_pos=" - - " l_formbase=" " > < SUW orderid ="40" lemmaid ="34867" lemma=" " lform=" " wtype=" " pos=" - - " formbase=" " pron=" " start ="60" end="70"> </ SUW ></LUW ><LUW SL="v" l_lemma=" " l_lform=" " l_wtype=" " l_pos=" - " l_formbase=" " > < SUW orderid ="50" lemmaid ="41407" lemma=" " lform=" " wtype=" " pos=" - " formbase=" " pron=" " start ="70" end="80"> </ SUW 
></LUW ><LUW B="B" SL="v" l_lemma=" " l_lform=" " l_wtype=" " l_pos=" - " l_ctype=" - " l_cform=" - " l_formbase=" " l_orthbase=" " > < SUW orderid ="60" lemmaid ="5951" lemma=" " lform=" " wtype=" " pos=" - " ctype=" - " cform=" - " formbase=" " orthbase=" " kana=" " pron=" " start ="80" end="100"> </ SUW ></LUW >< LUW SL="v" l_lemma=" " l_lform=" " l_wtype=" " l_pos=" - " l_formbase=" " > < SUW orderid ="70" lemmaid ="24874" lemma=" " lform=" " wtype=" " pos=" - " formbase=" " pron=" " start ="100" end="110"> </ SUW ></LUW ><LUW B="B" SL="v" l_lemma=" " l_lform=" " l_wtype=" " l_pos =" - " l_ctype=" " l_cform=" - " l_formbase=" " l_orthbase=" " > < SUW orderid ="80" lemmaid ="10518" lemma=" " lform=" " wtype=" " pos=" - " ctype=" " cform =" - " formbase=" " orthbase=" " kana=" " pron=" " start ="110" end="120"> </ SUW ></LUW ><LUW SL="v" l_lemma=" " l_lform=" " l_wtype=" " l_pos=" " l_ctype=" - " l_cform=" - " l_formbase=" " l_orthbase=" " > < SUW orderid ="90" lemmaid ="35697" lemma=" " lform=" " wtype=" " pos=" " ctype=" - " cform=" - " formbase=" " orthbase=" " kana=" " pron=" " start ="120" end="140"> </ SUW ></LUW ><LUW SL="v" l_lemma=" " l_lform=" " l_wtype=" " l_pos=" " l_ctype=" - " l_cform=" - " l_formbase=" " l_orthbase=" " > < SUW orderid ="100" lemmaid ="21642" lemma=" " lform=" " wtype=" " pos=" " ctype=" - " cform =" - " formbase=" " orthbase=" " pron=" " start ="140" end="150"> </ SUW ></LUW ><LUW SL="v" l_lemma=" " l_lform="" l_wtype=" " l_pos=" - " > < SUW orderid ="110" lemmaid ="25" lemma=" " lform="" wtype=" " pos=" - " formbase="" pron="" start ="150" end="160"> </ SUW ></LUW ></ sentence > 6 </webline > 19
20 7 <br type=" logicalline_original" /> LUW ( ) 1 OC OC01_ B B 2 OC OC01_ I 3 OC OC01_ B I 4 OC OC01_ I 5 OC OC01_ B I 6 OC OC01_ I 7 OC OC01_ B I 8 OC OC01_ I 9 OC OC01_ I 10 OC OC01_ I LUW ( ) 1 OC OC01_ B OC OC01_ I OC OC01_ I OC OC01_ I OC OC01_ I OC OC01_ I OC OC01_ I OC OC01_ I OC OC01_ I OC OC01_ I OC OC01_ I (2013), [ ](2013) 1 ( 4 ),. (2006). (2011)
21 ( ) [,, ] 3.2 [, ] [] [ ] [, ] () ( ) [, ]? 0 1? [, ] * 0 * [,,,...] [,,,...] 1 wagahaiwa_nekodearu.txt [] [] [ ] [,, ] [ˆ] [ˆ ] [,, ] [x-y] [A-C] [ A, B, C]
Shift JIS UTF-8
\n newline, \t tab, \r carriage return, \v vertical tab, \f form feed
\W, \S, \D are the negations of \w, \s, \d.
.    any single character
\w   word character ([0-9A-Za-z_])
\s   whitespace character ([ \n\t\r\v\f])
\d   decimal digit ([0-9])
Unicode properties
\p{Ll}   \p{Lowercase_Letter}
\p{Lu}   \p{Uppercase_Letter}
\p{Lo}   \p{Other_Letter}
\p{L}    \p{Letter}
\p{Zs}   \p{Space_Separator}
\p{Sm}   \p{Math_Symbol}            +, ...
\p{Sc}   \p{Currency_Symbol}
\p{Pd}   \p{Dash_Punctuation}
\p{Ps}   \p{Open_Punctuation}
\p{Pe}   \p{Close_Punctuation}
\p{Pi}   \p{Initial_Punctuation}
\p{Pf}   \p{Final_Punctuation}
\p{Pc}   \p{Connector_Punctuation}
\p{Po}   \p{Other_Punctuation}      !, &, :, ...
\p{Cc}   \p{Control}
\p{Hira} \p{Hiragana}
\p{Kana} \p{Katakana}
\p{Han}
\p{^...}, \P{...}   negation of a property
Set operations
(NOT)  [^x]     any character other than x
(OR)   [xy]     x or y (union), e.g. [\p{Hira}\p{Kana}]
(AND)  [x&&y]   both x and y (intersection), e.g. [\p{Han}&&[^ ]]
3.4
?       0 or 1 time
*       0 or more times
+       1 or more times
{n,m}   at least n and at most m times, e.g. {2,4}
{n,}    n or more times, e.g. {2,}
{n}     exactly n times, e.g. {2}
8 3? +
?       ??       ?+
*       *?       *+
+       +?       ++
{n,m}   {n,m}?   {n,m}+
{n,}???+ + ( ) +? ( ) *? *
3.5
^    beginning of line
$    end of line
\b   word boundary; \bcat: category, cat
\B   non-word boundary; \Bcat: cat ( )
\1, \2, \3, ...   backreferences; ( ( )\1, ( ( ))\1\2
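\1, \2, \3, ... \0 [ ] \ \1, \2, \3, ...
()          capturing group; ( ( )) \1
(?:)        non-capturing group; (?: ( )) \1
(?=)        positive lookahead; (?= )
(?!)        negative lookahead; (?! )
(?<=)       positive lookbehind; (?<= )
(?<!)       negative lookbehind; (?<! )
(?<name>)   named group, referenced with \k<name>; (?<neko>[ ]) \k<neko> [, ]
\1 and \k<name> match the same text that the group matched, whereas \g<name> applies the group's pattern again.
\g<name>    subexpression call; (?<neko>[ ]) \g<neko> [,,, ]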
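Several of the constructs listed above (Unicode script properties, set intersection with &&, lazy quantifiers, lookbehind, and the difference between \k<name> and \g<name>) can be tried directly in Ruby, whose regular expression engine accepts this notation. The snippet below is an illustrative sketch: the sample sentence is the well-known opening of 吾輩は猫である, chosen here as an assumption about what the practice file contains, and the variable and group names are made up for the example.

```ruby
text = "吾輩は猫である。名前はまだ無い。"

# Unicode script properties: runs of kanji and runs of hiragana.
kanji    = text.scan(/\p{Han}+/)        # ["吾輩", "猫", "名前", "無"]
hiragana = text.scan(/\p{Hiragana}+/)   # ["は", "である", "はまだ", "い"]

# Set intersection (&&): kanji, but not the character 猫.
not_neko = text.scan(/[\p{Han}&&[^猫]]+/)   # ["吾輩", "名前", "無"]

# Greedy vs. lazy quantifier on the same input.
greedy = "<a><b>"[/<.+>/]    # "<a><b>"
lazy   = "<a><b>"[/<.+?>/]   # "<a>"

# Positive and negative lookbehind: 猫 preceded / not preceded by は.
after_wa     = text =~ /(?<=は)猫/      # 3 (character index of 猫)
not_after_wa = "黒猫" =~ /(?<!は)猫/    # 1

# \k<name> must repeat the same text; \g<name> re-applies the pattern.
same_text    = "猫と犬"[/(?<pet>[猫犬])と\k<pet>/]   # nil
same_pattern = "猫と犬"[/(?<pet>[猫犬])と\g<pet>/]   # "猫と犬"

p kanji, hiragana, not_neko, greedy, lazy, after_wa, not_after_wa, same_text, same_pattern
```

Engines differ in which of these constructs they accept, so the patterns may need adjusting in a text editor or another language.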
Chomsky (1956): each type-n class of grammars includes the type-(n+1) class.
Type 0: ψ → φ
Type 1: αAβ → αφβ (context-sensitive)
Type 2: A → φ (context-free)
Type 3: A → φ of type 2, restricted to the form A → aB or A → Ba (regular)
Type-3 grammars for regular expressions
ab
S1 → a + S2
S2 → b
a*b
S1 → a + S1
S1 → b
a+b
S1 → a + S2
S2 → a + S2
S2 → b
14 a?b
Chomsky, Noam (1956). Three models for the description of language. IRE Transactions on Information Theory (2).
Friedl, Jeffrey E. F. (2006). Mastering Regular Expressions, 3rd Edition. O'Reilly & Associates Inc. ( [ ], 2008, 3.)
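The correspondence between the type-3 grammars above and the regular expressions ab, a*b, and a+b can be checked mechanically. The sketch below encodes each grammar as a table of right-linear rules and compares what it derives with what the regular expression matches. The data layout and the name `derives?` are assumptions made for illustration; they are not part of the original notes.

```ruby
# Each grammar maps a nonterminal to its rules. A rule is [terminal, next nonterminal];
# nil as the next nonterminal stands for a rule of the form A -> a.
grammars = {
  "ab"  => { "S1" => [["a", "S2"]], "S2" => [["b", nil]] },
  "a*b" => { "S1" => [["a", "S1"], ["b", nil]] },
  "a+b" => { "S1" => [["a", "S2"]], "S2" => [["a", "S2"], ["b", nil]] }
}

# True if `str` can be derived from `symbol` using the right-linear rules.
def derives?(rules, str, symbol = "S1")
  return false if str.empty?
  head = str[0]
  rest = str[1..-1]
  rules.fetch(symbol, []).any? do |terminal, nxt|
    next false unless terminal == head
    nxt.nil? ? rest.empty? : derives?(rules, rest, nxt)
  end
end

samples = ["", "b", "ab", "aab", "ba", "abb"]
grammars.each do |pattern, rules|
  regex = /\A(?:#{pattern})\z/   # anchor so the whole string must match
  samples.each do |s|
    puts format("%-4s %-6s grammar=%-5s regex=%s", pattern, s.inspect, derives?(rules, s), regex.match?(s))
  end
end
```

The same table format can be used to try the a?b exercise: write the rules, add them to `grammars`, and compare the two columns of the output.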
28 ( ) JUMAN JUMAN ChaSen NAIST( ) IPADIC MeCab NTT ( NAIST ) IPADIC JUMAN JUMAN MeCab IPADIC ChaSen MeCab (IPA) IPA (THiMCO97) UniDic (NINJAL) ChaSen MeCab JUMAN IPADIC UniDic JUMAN - - ChaSen - MeCab 4.3 (wagahaiwa_nekodearu.txt) Shift JIS wagahaiwa_nekodearu.txt Shift JIS Windows MeCab (IPADIC) Shift JIS 28
29 3. a b c : ˆ( ) $( ) \n( ) JIS X 0213 U n i c o d e U R L ( 16 ) 4.4 MeCab MeCab MeCab MeCab 29
30 4.4.1 PC MeCab Windows ( mecab exe) MeCab *6 *7 > C:\ Program Files ( x86)\ MeCab\ bin\ mecab. exe mecab C:\Program Files (x86)\mecab\bin\mecab C:\Program Files (x86)\mecab\bin\ mecab Windows mecab 1. (Windows 7 ) Path ;C:\Program Files (x86)\mecab\bin mecab.exe cd Windows waganeko Windows waganeko cmd 12 Z:\ *6 cmd *7 mecab.exe mecab CTRL+C 30
cd z:\desktop\waganeko
( ) ( )
mecab waganeko_sjis.txt waganeko_sjis_ipadic.txt
> mecab waganeko_sjis.txt > waganeko_sjis_ipadic.txt
MeCab cmd MeCab PC launch.bat waganeko
set PATH=%PATH%;C:\Program Files (x86)\MeCab\bin
cmd /k "cd Z:\Desktop\waganeko"
mecab waganeko_sjis.txt > waganeko_sjis_ipadic.txt
4.5 MeCab IPADIC UniDic
Windows UniDic MeCab UniDic Windows MeCab UniDic UTF-8 UTF-8 ( UTF-8 UTF-8 )
5 wagahaiwa_nekodearu.txt waganeko_utf8_unidic.mecab
6 waganeko_utf8_unidic.mecab waganeko_sjis_ipadic.mecab
32 4.6 MeCab Excel waganeko_utf8_unidic.mecab Excel 32
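Before a MeCab result is opened in a spreadsheet, it helps to split the comma-separated feature string into separate columns. The following sketch assumes MeCab's default output layout with IPADIC (surface form, a tab, then nine comma-separated features: part of speech, three subcategories, conjugation type, conjugated form, base form, reading, pronunciation). The three sample lines are illustrative rather than copied from an actual run, and `mecab_to_rows` is a hypothetical helper. UniDic output has a different set of columns, so the header names would need to change.

```ruby
# Illustrative MeCab (IPADIC) output; \t is the tab between surface and features.
mecab_output = <<~OUT
  吾輩\t名詞,代名詞,一般,*,*,*,吾輩,ワガハイ,ワガハイ
  は\t助詞,係助詞,*,*,*,*,は,ハ,ワ
  猫\t名詞,一般,*,*,*,*,猫,ネコ,ネコ
  EOS
OUT

# Hypothetical helper: one array per token, [surface, feature1, feature2, ...].
def mecab_to_rows(output)
  lines = output.each_line.map(&:chomp).reject { |l| l.empty? || l == "EOS" }
  lines.map do |line|
    surface, features = line.split("\t", 2)
    [surface, *features.to_s.split(",")]
  end
end

rows = mecab_to_rows(mecab_output)

# Tab-separated text pastes cleanly into spreadsheet columns.
tsv = rows.map { |row| row.join("\t") }.join("\n")
puts tsv
```

Unknown words may come back with fewer feature fields, so a real script should not assume that every row has ten columns.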
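5
5.1 NINJAL-LWP
NINJAL-LWP BCCWJ *8 Lago BCCWJ UniDic NLB NLT MeCab+IPADIC BCCWJ (NLB) (NLT) (TWC) ( ) 11 (BCCWJ 1 11 () 100 )
Frequency
MI (Mutual Information)
LD (LogDice)
Dice coefficient of two sets X and Y:
\[ \mathrm{Dice} = \frac{2\,|X \cap Y|}{|X| + |Y|} \]
LogDice as used in NINJAL-LWP and SketchEngine:
\[ \mathrm{logDice} = 14 + \log_2 \frac{2\,|X \cap Y|}{|X| + |Y|} \]
|X ∩ Y| = 1,213 ( )
|X| = 6,007 ( )
|Y| = 31,037 ( )
\[ \mathrm{Dice} = \frac{2 \times 1{,}213}{6{,}007 + 31{,}037} \approx 0.0655 \]
\[ \mathrm{logDice} = 14 + \log_2 \frac{2 \times 1{,}213}{6{,}007 + 31{,}037} = 14 + \log_2 0.0655 = 14 + (-3.93) = 10.07 \]
Dice logdice 2 14 Dice logdice (= log 2 )
*8 BCCWJ NINJAL-LWP Lago NLB NLB BCCWJ UniDic NLB NLT MeCab+IPADIC
MI as used in NINJAL-LWP and SketchEngine:
\[ \mathrm{MI} = \log_2 \frac{f_{AB}\, N}{f_A\, f_B} \]
f_A, f_B: frequencies of the two words; f_AB: their co-occurrence frequency; N: corpus size
f_A = 6,007, f_B = 31,037, f_AB = 1,213, N = 1.1 billion
\[ p_A = \frac{f_A}{N}, \qquad p_B = \frac{f_B}{N}, \qquad p_{AB} = \frac{f_{AB}}{N} \]
Expected co-occurrence probability and frequency under independence:
\[ p^{e}_{AB} = p_A\, p_B = \frac{f_A\, f_B}{N^2}, \qquad f^{e}_{AB} = N\, p^{e}_{AB} = \frac{f_A\, f_B}{N} \]
\[ \frac{f_{AB}}{f^{e}_{AB}} = \frac{f_{AB}\, N}{f_A\, f_B} = \frac{1{,}213 \times 1.1 \times 10^{9}}{6{,}007 \times 31{,}037} \approx 7{,}157 \]
\[ \mathrm{MI} = \log_2 \frac{f_{AB}}{f^{e}_{AB}} = \log_2 \frac{f_{AB}\, N}{f_A\, f_B} = 12.8 \]
MI 2 0 MI MI ( / ) MI LD
5.2 ( ) BCCWJ
5.3 ( ) BCCWJ
5.4 ChaKi
NAIST ( ) mecab cabocha &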
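The logDice and MI figures in the worked example of Section 5.1 can be reproduced with a few lines of Ruby. This is a sketch for checking the arithmetic, not code taken from NINJAL-LWP or SketchEngine. The function names are made up, and the corpus size N is assumed to be 1.1 billion tokens, which is the value that yields the MI of 12.8 given in the text.

```ruby
# logDice = 14 + log2( 2 * f_xy / (f_x + f_y) )
def log_dice(f_xy, f_x, f_y)
  14 + Math.log2(2.0 * f_xy / (f_x + f_y))
end

# MI = log2( f_ab * N / (f_a * f_b) ), observed over expected co-occurrence
def mutual_information(f_ab, f_a, f_b, n)
  Math.log2(f_ab.to_f * n / (f_a.to_f * f_b))
end

f_ab = 1_213           # co-occurrence frequency
f_a  = 6_007           # frequency of the first word
f_b  = 31_037          # frequency of the second word
n    = 1_100_000_000   # assumed corpus size (1.1 billion tokens)

dice = 2.0 * f_ab / (f_a + f_b)
ld   = log_dice(f_ab, f_a, f_b)
mi   = mutual_information(f_ab, f_a, f_b, n)

puts format("Dice    = %.4f", dice)
puts format("logDice = %.2f", ld)
puts format("MI      = %.2f", mi)
```

Because Dice never exceeds 1, its logarithm is never positive, so logDice is bounded above by 14 regardless of corpus size; MI has no such bound and grows with N.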
36 6 6.1 Ruby MSXML (B) XML XML Ruby XML 1 <?xml version ="1.0" encoding="utf -8"?> 2 <?xml -model href=" kyoto_corpus.rnc" type="application/relax -ng-compact -syntax"?> 3 <?xml -stylesheet type="text/xsl" href="kyoto_corpus.xsl"?> 4 <document id="950101" > 5 <article id=" " > 6 <sentence id=" " info="knp:96/10/27 MOD:2005/03/08" > 7 <chunk id="0" link="26" rel="d"> 8 <tag id="0" link="1" rel="d"> 9 <tok id="0" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 10 <tok id="1" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 11 </tag > 12 <tag id="1" link="37" rel="d"> 13 <rel type="=" target=" " sid=" " tag="0"/> 14 <tok id="2" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 15 <tok id="3" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 16 </tag > 17 </chunk > 18 <chunk id="1" link="2" rel="d"> 19 <tag id="2" link="3" rel="d"> 20 <tok id="4" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 21 <tok id="5" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 22 </tag > 23 </chunk > 24 <chunk id="2" link="6" rel="d"> 25 <tag id="3" link="10" rel="d"> 26 <rel type=" " target=" " sid=" " tag="2"/> 27 <rel type=" " target=" : "/ > 28 <tok id="6" read=" " base=" " pos=" " ctype=" " cform=" " > </ tok > 29 </tag > 30 </chunk > 36
37 6.4 NP 1 NP 2 NP 1 NP 2 1 <article id=" " > 2 <sentence id=" " info="knp:96/10/27 MOD:2002/04/22" > 3 <chunk id="0" link="1" rel="d"> 4 <tag id="0" link="1" rel="d"> 5 <tok id="0" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 6 <tok id="1" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 7 <tok id="2" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 8 </tag > 9 </chunk > 10 <chunk id="1" link="2" rel="d"> 11 <tag id="1" link="2" rel="d" subj="true" subj_cat="noun"> 12 <rel type=" " target=" " sid=" " tag="0"/> 13 <tok id="3" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 14 <tok id="4" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 15 </tag > 16 </chunk > 17 <chunk id="2" link="-1" rel="d"> 18 <tag id="2" link="-1" rel="d" copula="omitted" pred_cat="noun" pred_range ="5:5"> 19 <rel type=" " target=" " sid=" " tag="1" subj_cat="noun" subj_range ="3:3"/> 20 <memo >CO</memo > 21 <tok id="5" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 22 <tok id="6" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 23 </tag > 24 </chunk > 25 </sentence > <tag> 1. (etc.) 2. (<memo>co</memo>) ( ) copula pred_range pred_cat obvious omitted ( or ) () (tok ID) () (noun adv adj verb other) <sentence id=" " info="knp:96/11/01 MOD:2003/12/29" > 37
38 2 <chunk id="0" link="1" rel="d"> 3 <tag id="0" link="1" rel="d" subj="true" subj_cat="noun"> 4 <rel type=" " target=" " sid=" " tag="8"/> 5 <tok id="0" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 6 <tok id="1" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 7 </tag > 8 </chunk > 9 <chunk id="1" link="-1" rel="d"> 10 <tag id="1" link="-1" rel="d" copula="omitted" pred_cat="noun" pred_range ="2:2"> 11 <memo >CO</memo > 12 <rel type=" " target=" " sid=" " tag="0" subj_cat="noun" subj_range ="0:0"/> 13 <rel type=" " target=" " sid=" " tag="7"/> 14 <tok id="2" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 15 <tok id="3" read=" " base=" " pos=" - " ctype="*" cform="*"> </ tok > 16 </tag > 17 </chunk > 18 </sentence > (<tag>) (<rel>) subj subj_range subj_cat true ( ) () (tok ID) () (noun adv adj verb other) SUMO(Suggested Upper Merged Ontology) Process ContentBearingProcess ContentBearingPhysical ContentBearingObject Physical Artifact Substance Object Region CorpuscularObject OrganicObject AnatomicalStructure Entity SelfConnectedObject Agent Organism Human Attribute Collection Group Organization Abstract Relation SocialRole Proposition Quantity TimeMeasure 38
39 a i. - Human b i. WordNet ii. WordNet SUMO iii. SUMO 2. a / WordNet SUMO WordNet WordNet SUMO WordNet word 猫 sense synset ネコ sense synlink feline xlink にゃんにゃん sense sense true_cat synlink synlink house_cat xlink SUMO cat wildcat xlink Feline xlink (synset) (synlink) / 39
Hype Hypernym
Hypo Hyponym
Mprt Meronyms Part ( )
Hprt Holonyms Part ( )
WordNet SQLite XML SQLite Ruby SQLite sqlite3-ruby ActiveRecord
WordNet (word) (synset) ( ) WordNet (sense) (freq) WordNet 13 WordNet (Bond et al., 2009)
SUMO
SUMO (Merge.kif) (Economy.kif, Food.kif, Human.kif, ...) (MILO.kif)
SUO-KIF (Standard Upper Ontology - Knowledge Interchange Format)

(subclass Physical Entity)
(partition Physical Object Process)
(documentation Physical EnglishLanguage "An entity that has a location in space-time.
Note that locations are themselves understood to have a location in
space-time.")

(<=>
  (instance ?PHYS Physical)
  (exists (?LOC ?TIME)
    (and
      (located ?PHYS ?LOC)
      (time ?PHYS ?TIME))))

( ) SQLite WordNet Ruby
Cohen's κ
Cohen's κ is computed from the observed agreement Pr(a) and the agreement expected by chance Pr(e):
\[ \kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)} \]
Example with two annotators, A and B:

            B: Yes   B: No   Total
A: Yes        20        5      25
A: No         10       15      25
Total         30       20      50

Pr(a) = (20 + 15)/50 = 35/50 = 0.7
A answers Yes 25 times, so the probability that A answers Yes is 25/50 = 0.5.
B answers Yes 30 times, so the probability that B answers Yes is 30/50 = 0.6.
The probability that A and B both answer Yes by chance is 0.5 × 0.6 = 0.3.
The probability that A and B both answer No by chance is 0.5 × 0.4 = 0.2.
Pr(e) = 0.3 + 0.2 = 0.5
\[ \kappa = \frac{0.7 - 0.5}{1 - 0.5} = 0.4 \]
κ is 1 when the two annotators agree completely and 0 when their agreement is no better than chance.
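The κ calculation above generalizes to any square agreement table. The following Ruby sketch recomputes the worked example. The function name `cohen_kappa` is made up for illustration, and the cell counts 5, 10, and 15 are derived from the figures stated in the text (20 joint Yes answers, 25 Yes answers from A, 30 from B, 50 items, and Pr(a) = 35/50).

```ruby
# Rows: annotator A (Yes, No). Columns: annotator B (Yes, No).
table = [
  [20, 5],
  [10, 15]
]

# Cohen's kappa for a square table of agreement counts.
def cohen_kappa(table)
  total = table.flatten.sum.to_f
  # Observed agreement: share of items on the diagonal.
  observed = table.each_index.sum { |i| table[i][i] } / total
  # Marginal probabilities for each annotator.
  row_marginals = table.map { |row| row.sum / total }
  col_marginals = table.transpose.map { |col| col.sum / total }
  # Chance agreement: both annotators pick the same category independently.
  expected = row_marginals.zip(col_marginals).sum { |r, c| r * c }
  (observed - expected) / (1 - expected)
end

puts format("kappa = %.2f", cohen_kappa(table))
```

The same function accepts tables with more than two categories, which is the situation when annotators choose among several relation types.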
42 14 κ (N 1 N 2 ) (N 1 N 2 ) 2 7 is-a value-of equal-to property-of attribute-of correspond-to participant-of WordNet SUMO κ (N 1 N 2 ) (N 1 N 2 ) (taxonomic) (non-taxonomic) (cleft) % 42
6.7 vs variant( ) value( ) a b ( ) 2. a b 3. a (= be(n 1, N 2 )) b (= (, 53 )) 4. a ( )
(1997) 3, pp (1975) 103:1 17. (2011) : 1: (1992) I. (2003). (1996)., 1992,,. [ ], [ ], 2012,,.
Bond, F., H. Isahara, S. Fujita, K. Uchimoto, T. Kuribayashi and K. Kanzaki. 2009. Enhancing the Japanese WordNet. In The 7th Workshop on Asian Language Resources, in conjunction with ACL-IJCNLP 2009.
Niles, I., and A. Pease. 2001. Towards a Standard Upper Ontology. In Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), Chris Welty and Barry Smith, eds, Ogunquit, Maine, October 17-19, 2001.
Pustejovsky, J., A. Rumshisky, J. L. Moszkowicz, and O. Batiukova. 2009. GLML: annotating argument selection and coercion. In IWCS-8 '09: Proceedings of the Eighth International Conference on Computational Semantics.
IPSJ SIG Technical Report Vol.2010-NL-199 No /11/ treebank ( ) KWIC /MeCab / Morphological and Dependency Structure Annotated Corp
1. 1 1 1 2 treebank ( ) KWIC /MeCab / Morphological and Dependency Structure Annotated Corpus Management Tool: ChaKi Yuji Matsumoto, 1 Masayuki Asahara, 1 Masakazu Iwatate 1 and Toshio Morita 2 This paper
More information¥ì¥·¥Ô¤Î¸À¸ì½èÍý¤Î¸½¾õ
2013 8 18 Table of Contents = + 1. 2. 3. 4. 5. etc. 1. ( + + ( )) 2. :,,,,,, (MUC 1 ) 3. 4. (subj: person, i-obj: org. ) 1 Message Understanding Conference ( ) UGC 2 ( ) : : 2 User-Generated Content [
More informationUniDic version
UniDic version 1.3.9 2008 7 UniDic version 1.3.9 Users Manual Yasuharu Den, Atsushi Yamada, Hideki Ogura, Hanae Koiso, and Toshinobu Ogiso Copyright c 2007 2008 The UniDic consortium. All rights reserved.
More informationA Japanese Word Dependency Corpus ÆüËܸì¤Îñ¸ì·¸¤ê¼õ¤±¥³¡¼¥Ñ¥¹
A Japanese Word Dependency Corpus 2015 3 18 Special thanks to NTT CS, 1 /27 Bunsetsu? What is it? ( ) Cf. CoNLL Multilingual Dependency Parsing [Buchholz+ 2006] (, Penn Treebank [Marcus 93]) 2 /27 1. 2.
More informationMicrosoft Word - 07kondo.docx
1 2 6 1873 7 18748 18751 43 16 155 3 4 5 2005 XML 2005 1 kondo@ninjal.ac.jp 2 mtanaka@ninjal.ac.jp 3 1999-2009 1998 4 2004 2004 5 6 43 (1) (2) (3)(4) article span JIS X 0213 (1)(2)UCS (3)CJK g (1)(2)(3)
More information<mergedsample sampleid=" サンプル ID" type="bccwj MorphXML" version="1.1" NumTrans="true"> M-XML_NT のファイルであっても 対象となる数字列が存在せず NumTrans 処理がなされていないものについてはこの属
第 9 章形態論情報付き統合形式 XML(M-XML) 小木曽智信間淵洋子前川喜久雄 9.1 M-XML の概要形態論情報付き統合形式 XML(Morphology-base XML 以下 M-XML と略記する ) は 文字ベースの XML(C-XML) フォーマットをもとにして 固定長 可変長サンプルを統合し 言語構造を一定程度反映させた XML フォーマットである 短単位 長単位の形態論情報を
More informationCorrected Version NICT /11/15, 1 Thursday, May 7,
Corrected Version NICT 26 2008/11/15, 1 1 Word Sketch Engine (Kilgarriff & Tugwell 01; Srdanovic, et al. 08) 2 2 3 3 ( ) I-Language Grammar is Grammar and Usage is Usage (Newmeyer 03) 4 4 (is-a ) ( ) (
More information1 1.1 PC PC PC PC PC workstation PC hardsoft PC PC CPU 1 Gustavb, Wikimedia Commons.
1 PC PC 1 PC PC 1 PC PC PC PC 1 1 1 1.1 PC PC PC PC PC workstation PC 1.1.1 hardsoft 1.1.2 PC PC 1.1 1 1. 2. 3. CPU 1 Gustavb, Wikimedia Commons.http://en.wikipedia.org/wiki/Image:Personal_computer,_exploded_5.svg
More information( )
NAIST-IS-MT1051071 2012 3 16 ( ) Pustejovsky 2 2,,,,,,, NAIST-IS- MT1051071, 2012 3 16. i Automatic Acquisition of Qualia Structure of Generative Lexicon in Japanese Using Learning to Rank Takahiro Tsuneyoshi
More information2016
2016 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69
More informationels08ws-kuroda-slides.key
NICT 26 2008/11/15, Word Sketch Engine (Kilgarriff & Tugwell 01; Srdanovic, et al. 08) ( ) I-Language Grammar is Grammar and Usage is Usage (Newmeyer 03) (is-a ) ( )?? () // () ()???? ? ( )?? ( ) Web ??
More information2015 9
JAIST Reposi https://dspace.j Title ウェブページからのサイト情報 作成者情報の抽出 Author(s) 堀, 達也 Citation Issue Date 2015-09 Type Thesis or Dissertation Text version author URL http://hdl.handle.net/10119/12932 Rights Description
More informationcsj-report.pdf
527 9 CSJ CSJ CSJ 1 8 XML CSJ XML Browser (MonoForC) CSJ 1.7 CSJ CSJ CSJ 9.1 GREP GREP Unix Windows Windows (http://www.vector.co.jp/) Trn Windows Trn > > grep *.trn 528 9 CSJ A01F0132.trn:& A01M0097.trn:&
More information11/27/2003 ( ) 1 UC Berkely FrameNet (FN) ( Frame Semantics (FS) Lexical Unit (LU) Commercial Transaction Fram
11/27/2003 ( ) 1 UC Berkely FrameNet (FN) (http://www.icsi.berkeley.edu/~framenet/) Frame Semantics (FS) Lexical Unit (LU) Commercial Transaction Frame Japanese FrameNet (JFN) FS 2 フレームネットとは何か 狭義にはフレーム意味論(後述)に基づく電子辞書
More informationbook
Bibliotheca21 Personal 3020-7-245-30 P-26D3-A114 Bibliotheca21 Personal 01-30 OS Windows 2000 Windows Server(R) 2003 Windows XP Windows Server(R) 2008 Windows Vista(R) Windows 7 Adobe Adobe Systems Incorporated
More information3 4 26 1980 1 WWW 26! 3, ii 4 7!! 4 2010 8 1. 1.1... 1 1.2... 2 1.3... 3 1.4... 7 1.5... 9... 9 2. 2.1... 10 2.2... 13 2.3... 16 2.4... 18... 21 3. 3.1... 22 3.2... 24 3.3... 33... 38 iv 4. 4.1... 39 4.2...
More informationXML XML (Extensible Markup Language) ISO SGML (Standard Generalized Markup Language) W3C (World Wide Web Consortium) XML 1.0
XML 2-1 XML XML (Extensible Markup Language) ISO SGML (Standard Generalized Markup Language) W3C (World Wide Web Consortium) XML 1.0 http://www.w3.org/tr/rec-xml http://www.fxis.co.jp/xmlcafe/tmp/rec-xml.html
More informationuntitled
580 26 5 SP-G 2011 AI An Automatic Question Generation Method for a Local Councilor Search System Yasutomo KIMURA Hideyuki SHIBUKI Keiichi TAKAMARU Hokuto Ototake Tetsuro KOBAYASHI Tatsunori MORI Otaru
More informationuntitled
Version 1.1... 2 X-JToBI... 2... 3... 3... 3 BI... 3... 3... 4... 5... 5... 5... 6... 7... 7... 8... 9... 9 BI... 9... 9... 10... 12... 15 BI... 17... 17... 18... 20... 21 *... 21 p.1 CSJ Venditti (1997)
More information20 Covert Channel
20 Covert Channel 200602824 1 4 2 6 2.1 Covert Channel..................... 6 2.1.1................. 6 2.1.2 Covert Channel........ 7 2.2...................... 7 2.3.................... 8 2.4..................
More informationNEEDS Yahoo! Finance Yahoo! NEEDS MT EDINET XBRL Magnetic Tape NEEDS MT Mac OS X Server, Linux, Windows Operating System: OS MySQL Web Apache MySQL PHP Web ODBC MT Web ODBC LAMP ODBC NEEDS MT PHP: Hypertext
More information(2008) JUMAN *1 (, 2000) google MeCab *2 KH coder TinyTextMiner KNP(, 2000) google cabocha(, 2001) JUMAN MeCab *1 *2 h
The Society for Economic Studies The University of Kitakyushu Working Paper Series No. 2011-12 (accepted in March 30, 2012) () (2009b) 19 (2003) 1980 PC 1990 (, 2009) (2001) (2004) KH coder (2009) TinyTextMiner
More informationcorpus.indd
22 JC-D-10-02 23 2 c 2011 21 1 I BCCWJ 3 1 BCCWJ 5 1.1 BCCWJ 3..................... 5 1.2 BCCWJ 2...................... 6 2 3 SC 7 2.1 SC SC............. 7 2.1.1 SC SC................... 7 2.1.2......................
More information2 2.1 NPCMJ ( (Santorini, 2010) (NPCMJ, 2016) (1) (, 2016) (1) (2) (1) ( (IP-MAT (CONJ ) (PP (NP (D ) (N )) (P )) (NP-SBJ *
More informationModal Phrase MP because but 2 IP Inflection Phrase IP as long as if IP 3 VP Verb Phrase VP while before [ MP MP [ IP IP [ VP VP ]]] [ MP [ IP [ VP ]]]
30 4 2016 3 pp.195-209. 2014 N=23 (S)AdvOV (S)OAdvV 2 N=17 (S)OAdvV 2014 3, 2008 Koizumi 1993 3 MP IP VP 1 MP 2006 2002 195 Modal Phrase MP because but 2 IP Inflection Phrase IP as long as if IP 3 VP Verb
More informationindex.dvi
1 1 7 1.1 EXTRA for Windows Version 4............... 7 1.1.1 OS................................. 7 1.1.2 MSAA............................... 8 1.1.3........................... 8 1.2 EXTRA for Windows Version
More information2013 Future University Hakodate 2013 System Information Science Practice Group Report biblive : Project Name biblive : Recording and sharing experienc
2013 Future University Hakodate 2013 System Information Science Practice Group Report biblive : Project Name B biblive stream Group Name GroupB biblive stream /Project No. 12-B /Project Leader 1011063
More informationkut-paper-template.dvi
14 Application of Automatic Text Summarization for Question Answering System 1030260 2003 2 12 Prassie Posum Prassie Prassie i Abstract Application of Automatic Text Summarization for Question Answering
More information17 18 2
17 18 2 18 2 8 17 4 1 8 1 2 16 16 4 1 17 3 31 16 2 1 2 3 17 6 16 18 1 11 4 1 5 21 26 2 6 37 43 11 58 69 5 252 28 3 1 1 3 1 3 2 3 3 4 4 4 5 5 6 5 2 6 1 6 2 16 28 3 29 3 30 30 1 30 2 32 3 36 4 38 5 43 6
More information自然言語処理24_705
nwjc2vec: word2vec nwjc2vec nwjc2vec nwjc2vec 2 nwjc2vec 7 nwjc2vec word2vec nwjc2vec: Word Embedding Data Constructed from NINJAL Web Japanese Corpus Hiroyuki Shinnou, Masayuki Asahara, Kanako Komiya
More informationN-gram Language Models for Speech Recognition
N-gram Language Models for Speech Recognition Yasutaka SHINDOH ver.2011.01.22 1. 2. 3. 4. N-gram 5. N-gram0 6. N-gram 7. 2-gram vs. 3-gram vs. 4-gram 8. 9. (1) name twitter id @y_shindoh web site http://quruli.ivory.ne.jp/document/
More informationnull element [...] An element which, in some particular description, is posited as existing at a certain point in a structure even though there is no
null element [...] An element which, in some particular description, is posited as existing at a certain point in a structure even though there is no overt phonetic material present to represent it. Trask
More information_0212_68<5A66><4EBA><79D1>_<6821><4E86><FF08><30C8><30F3><30DC><306A><3057><FF09>.pdf
More information
レポート作成スキルUP.doc
( ) UP Contents... 3 UP... 4 1.1... 4 1.2... 5 1.3... 5 1.4... 6 1.5... 7 1.5.1... 7 1.5.2... 7 1.6... 8 1.7... 9 1.8... 9 1.8.1... 9 1.8.2... 10 1.8.3... 11 1.8.4... 11 1.9... 11 1.10... 12 1.11... 13
More informationuntitled
FutureNet Microsoft Corporation Microsoft Windows Windows 95 Windows 98 Windows NT4.0 Windows 2000, Windows XP, Microsoft Internet Exproler (1) (2) (3) COM. (4) (5) ii ... 1 1.1... 1 1.2... 3 1.3... 6...
More information…l…b…g…‘†[…N…v…“…O…›…~…fi…OfiÁŸ_
13 : Web : RDB (MySQL ) DB (memcached ) 1: MySQL ( ) 2: : /, 3: : Google, 1 / 23 testmysql.rb: mysql ruby testmem.rb: memcached ruby 2 / 23 ? Web / 3 ( ) Web s ( ) MySQL PostgreSQL SQLite MariaDB (MySQL
More information計量国語学 アーカイブ ID KK 種別 特集 招待論文 A タイトル Webコーパスの概念と種類, 利用価値 語史研究の情報源としてのWebコーパス Title The Concept, Types and Utility of Web Corpora: Web Corpora as
計量国語学 アーカイブ ID KK300601 種別 特集 招待論文 A タイトル Webコーパスの概念と種類, 利用価値 語史研究の情報源としてのWebコーパス Title The Concept, Types and Utility of Web Corpora: Web Corpora as a Source of Information for Etymological Studies 著者
More informationMacintosh HD:Users:ks91:Documents:lect:nm2002s:nm2002s03.dvi
3 ks91@sfc.wide.ad.jp April 22, 2002 1 2 1. over IP ( : Voice over IP; IP Internet Protocol ) over IP??? : 2002/4/20 23:59 JST : http://www.soi.wide.ad.jp/report/ 3 32 11 (4/22 ) 4 () 3 2 1? 4 ...... A.C.
More information2014_Apr_FSLP_A4
NPO FILEMAKER FileMaker Pro Advanced Version 13 April 2014 FileMaker ipad iphone Windows Mac Web 5 38,000 1 1 * Starter Solution Excel PDF Web CSV, Excel, XML, Bento, ODBC ODBC / JDBC ** SQL FileMaker
More information[1], B0TB2053, 20014 3 31. i
B0TB2053 20014 3 31 [1], B0TB2053, 20014 3 31. i 1 1 2 3 2.1........................ 3 2.2........................... 3 2.3............................. 4 2.3.1..................... 4 2.3.2....................
More information21 Pitman-Yor Pitman- Yor [7] n -gram W w n-gram G Pitman-Yor P Y (d, θ, G 0 ) (1) G P Y (d, θ, G 0 ) (1) Pitman-Yor d, θ, G 0 d 0 d 1 θ Pitman-Yor G
ol2013-nl-214 No6 1,a) 2,b) n-gram 1 M [1] (TG: Tree ubstitution Grammar) [2], [3] TG TG 1 2 a) ohno@ilabdoshishaacjp b) khatano@maildoshishaacjp [4], [5] [6] 2 Pitman-Yor 3 Pitman-Yor 1 21 Pitman-Yor
More informationMicrosoft PowerPoint - gijutsuenshu04_061024_2.ppt
情報技術演習 第 4 回 情報抽出と自然言語処理 2006/10/24 久保田秀和文学部 / 情報学研究科 kubota@ii.ist.i.kyoto-u.ac.jp http://www.ii.ist.i.kyoto-u.ac.jp/~kubota/ 本日の講義 演習 プログラミングの基礎 ( 復習 ) 前回提出されたレポートを題材に 計算機上の身近な情報へのアプローチ (CGUI) 情報抽出と自然言語処理
More information卒論 提出用ファイル.doc
11 13 1LT99097W (i) (ii) 0. 0....1 1....3 1.1....3 1.2....4 2....7 2.1....7 2.2....8 2.2.1....8 2.2.2....9 2.2.3.... 10 2.3.... 12 3.... 15 Appendix... 17 1.... 17 2.... 19 3.... 20... 22 (1) a. b. c.
More informationJournal04-03&04.PMD
Japan Translation Journal No.210 Japan Translation Federation Report 1 2 3 4 Honrenso No.86 5 No.87 6 Information JTF 8 10 PR 12 News 13 14 15 JTF 16 16 104-0032 2-8-1 3F TEL 03-3555-6365 FAX 03-3552-1784
More information情報化社会に関する全国調査中間報告書
9 1 1990 1998 25.2% 2000 38.6% 2001 50.1% 2002 3 57.2% 2001 12 60.5% 2002 3 49.5% 2001 12 44.0% 2002 1 1992 0 2 1993 1 2 1994 84 37 1995 467 283 1996 1411 1080 1997 1621 1057 1998 1700 1098 1999 3036 1666
More information(2 Linux Mozilla [ ] [ ] [ ] [ ] URL 2 qkc, nkc ~/.cshrc (emacs 2 set path=($path /usr/meiji/pub/linux/bin tcsh b
II 5 (1 2005 5 26 http://www.math.meiji.ac.jp/~mk/syori2-2005/ UNIX (Linux Linux 1 : 2005 http://www.math.meiji.ac.jp/~mk/syori2-2005/jouhousyori2-2005-00/node2. html ( (Linux 1 2 ( ( http://www.meiji.ac.jp/mind/tool/internet-license/
More informationVer.5.02 2 1 1 2. 1 3 1 3.1 3.2 3.3 3.4 3.5 3.6 3.7 4 2 4.1 4.2 4.3 4.4 4.5 4.6 4.7 5 3 5.1 5.1.1 5.1.2 5.1.3 5.2 5.2.1 5.2.2 1 5.2.3 6 4 6.1 6.1.1 6.1.2 6.1.3 6.2 6.3 6.3.1 6.3.2 6.4 6.4.1 6.4.2 7 13
More informationInstallation and New Features Guide for FileMaker Pro 10 and FileMaker Pro 10 Advanced
FileMaker FileMaker Pro 10 and FileMaker Pro 10 Advanced 2007-2009 FileMaker, Inc. All rights reserved. FileMaker, Inc. 5201 Patrick Henry Drive Santa Clara, California 95054 FileMaker Bento Bento FileMaker,
More informationDEIM Forum 2009 C8-4 QA NTT QA QA QA 2 QA Abstract Questions Recomme
DEIM Forum 2009 C8-4 QA NTT 239 0847 1 1 E-mail: {kabutoya.yutaka,kawashima.harumi,fujimura.ko}@lab.ntt.co.jp QA QA QA 2 QA Abstract Questions Recommendation Based on Evolution Patterns of a QA Community
More informationp.14 p.14 p.17 1 p レッテル貼り文 2015: PC 20 p : PC 4
18 13 4 2017.10.1 キーワード 要 旨 1 はじめに 1 1988 K 2000 1-2 p.163 1 2000: 161 2 2000: 161 1 2 19 22015 p.14 p.14 p.17 1 p.17 1 2 2 3 2 レッテル貼り文 2015: 17 3 4 3 PC 20 p.40 2015: 21 28 3 PC 4 20 4 1 p.193 3 3 4 3
More information授受補助動詞の使用制限に与える敬語化の影響について : 「くださる」「いただく」を用いた感謝表現を中心に
Title 授受補助動詞の使用制限に与える敬語化の影響について : くださる いただく を用いた感謝表現を中心に Author(s) 山口, 真里子 Citation 国際広報メディア 観光学ジャーナル, 6, 69-89 Issue Date 2008-03-21 Doc URL http://hdl.handle.net/2115/34577 Type bulletin (article) File
More information122.pdf
HironobuUtsugi hironobu-utsugi@exa-corp.co.jp RDB exa review XML HTML W3C(World Wide Web Consortium) XML(Extensible Markup Language) HTML RDB(Relational Database) XML XML DB RDB XML DB XML DB XML * 1 RDB
More information1. [1, 2, 3] (PDF ) [4] API API [5] ( ) PDF Web Web Annotate[6] Digital Library for Earth System Education(DLESE)[7] Web PDF Text, Link, FreeTe
aoyama@info.suzuka-ct.ac.jp yamaji@nii.ac.jp Sharing system of annotation for paper publication Toshihiro AOYAMA Department of Electronic and Information Engineering, Suzuka National College of Technology
More information( 9 1 ) 1 2 1.1................................... 2 1.2................................................. 3 1.3............................................... 4 1.4...........................................
More information