The Current State of Language Processing for Recipes


August 18, 2013

Table of Contents

= + 1. 2. 3. 4. 5. etc.

1. ( + + ( ))
2. :,,,,,, (MUC¹)
3.
4. (subj: person, i-obj: org.)
¹ Message Understanding Conference

( ) UGC² ( ) : :
² User-Generated Content

[ 12] BCCWJ[ 09] 5 ( ) ( ) (F ) twitter 3,680 1,250 500 728 50 11 12 10 90 99.32 96.75 97.25 96.70 96.52 98.98 97.70 97.05 97.17

( ) :,,,,,, :,,,,,,,

[Momouchi 80] [Hamada 00] [ 07]

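1. Word segmentation [Neubig, Mori, et al. 11]: KyTea (cf. MeCab, JUMAN, ...)
2. Recipe named entity recognition: food (F), quantity (Q), tool (T), duration (D), state of food (Sf), state of tool (St), action by chef (Ac), action by food (Af)

3. Dependency parsing [Flannery, Mori, et al.]: EDA (cf. CaboCha, KNP, ...)
4. Predicate-argument structure analysis [Yoshino, Mori, et al.]: e.g., an Ac predicate with F and T arguments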

[ 12]

: BCCWJ 53,899 1,834,784 11,700 197,941 136,109 9,023 398,569 254,402 BCCWJ: [ 09] : 242 7,023 1,523 724 19,966 3,797 12,426

Step1. ( ) : : - - - - - - - - - - - - - : -:

,, 1. ( 1.4 2.0 ) 2. 10 15 ( :, ) ( :, ) 3.

( ) ( ) (, etc.)

1. : [ 09] 2. ( ) - - - - - - - - - - - - -

( ) + (,, ) ( : - v.s. - ) ( ) ( ) ( ) ( ) ) - - - ( )

: Cf. [ 09] ) - = - - ( ) (MeCab, JUMAN )

(KyTea [Neubig 11]) Binary classification with an SVM: given the characters x_{i-2} x_{i-1} x_i x_{i+1} x_{i+2} x_{i+3}, predict the tag t_i of the boundary between x_i and x_{i+1}.
Char (type) 1-gram features: -3/ (K), -2/ (H), -1/ (K), 1/ (K), 2/ (H), 3/ (S)
Char (type) 2-gram features: -3/ (KH), -2/ (HK), -1/ (KK), 1/ (KH), 2/ (HS)
Char (type) 3-gram features: -3/ (KHK), -2/ (HKK), -1/ (KKH), 1/ (KHS)
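The window-3, n-gram-up-to-3 feature templates above can be sketched in Python as follows. This is a minimal illustration, not KyTea's implementation. Several details are assumptions: the character-type scheme (K kanji, H hiragana, T katakana, S anything else), the padding symbol `_`, and the feature-string format. The resulting features would still need to be fed to a binary classifier such as an SVM.

```python
from typing import List

PAD = "_"  # hypothetical padding symbol for positions outside the sentence
OFFSETS = [-3, -2, -1, 1, 2, 3]  # labels: x_{i-2}..x_i -> -3..-1, x_{i+1}..x_{i+3} -> 1..3


def char_type(c: str) -> str:
    """Coarse character type (assumed scheme): H hiragana, T katakana, K kanji, S other."""
    cp = ord(c)
    if 0x3040 <= cp <= 0x309F:
        return "H"
    if 0x30A0 <= cp <= 0x30FF:
        return "T"
    if 0x4E00 <= cp <= 0x9FFF:
        return "K"
    return "S"


def boundary_features(text: str, i: int, max_n: int = 3) -> List[str]:
    """Character and character-type n-gram features for the boundary t_i
    between text[i] and text[i + 1] (0-based), using 3 characters on each side."""
    chars = [text[j] if 0 <= j < len(text) else PAD for j in range(i - 2, i + 4)]
    types = [PAD if c == PAD else char_type(c) for c in chars]
    feats = []
    for n in range(1, max_n + 1):
        for s in range(len(chars) - n + 1):
            label = OFFSETS[s]  # an n-gram is named by the offset of its first character
            feats.append(f"c{n}:{label}/{''.join(chars[s:s + n])}")
            feats.append(f"t{n}:{label}/{''.join(types[s:s + n])}")
    return feats


if __name__ == "__main__":
    # boundary between ぎ and を in 玉ねぎを切る ("cut the onion")
    for f in boundary_features("玉ねぎを切る", 2):
        print(f)
```

With a window of 3 and n up to 3, each boundary yields 6 + 5 + 4 n-gram positions, each emitted once as characters and once as character types. This matches the 1-, 2- and 3-gram lists on the slide.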

1. [Mori 96] ( ) 2. # ( =1362) - - - - - - # ( =1338) - - - - - - - - -

Web (Yahoo! ) http://www.phontron.com/kytea/dictionary-addition.html (2011/11/25)
(F ) 95.54% ( ) 96.75% ( ) 97.15%
75–80% 20–25%

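: BCCWJ, UniDic, : 8
Evaluation: F-measure, with precision = LCS / (number of words in the system output) and recall = LCS / (number of words in the reference), where LCS is the length of the longest common subsequence of the two word sequences.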
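A minimal Python sketch of this LCS-based evaluation follows. It assumes both segmentations are given as lists of word strings for the same sentence. The helper names `lcs_length` and `segmentation_prf` are hypothetical and do not come from any particular toolkit.

```python
from typing import List, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two word lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def segmentation_prf(reference: List[str], system: List[str]) -> Tuple[float, float, float]:
    """Precision = LCS/|system|, recall = LCS/|reference|, F = harmonic mean."""
    n_lcs = lcs_length(reference, system)
    precision = n_lcs / len(system)
    recall = n_lcs / len(reference)
    f = 2 * precision * recall / (precision + recall) if n_lcs else 0.0
    return precision, recall, f


if __name__ == "__main__":
    ref = ["玉ねぎ", "を", "みじん", "切り", "に", "する"]
    out = ["玉ねぎ", "を", "みじん切り", "に", "する"]
    print(segmentation_prf(ref, out))  # P = 4/5, R = 4/6, F = 8/11
```

Word-level LCS could in principle align identical words that sit at different character offsets. Evaluations that compare character spans avoid this, so it is worth stating which convention a reported score uses.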

[Figure: F-measure vs. work time (hours, 0 to 8)] ( : 99% )

Step 2. Named entity recognition
Named entity types as in MUC, e.g., date, person, org.
BIO2 tags (Begin, Inside, Other): /B-Dat /I-Dat /I-Dat /I-Dat /B-Per /I-Per /O /B-Org /O /O /O /O
Sequence labeling (HMM, CRF) with the tag set {B, I} × NE-Type ∪ {O}
: 80%–90% (1 )
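The tag set {B, I} × NE-Type ∪ {O} and the mapping from a BIO2 tag sequence back to entity spans can be sketched as follows. This is an illustrative helper, and the function names are hypothetical. It leniently treats an I-X tag with no preceding B-X or I-X as the start of a new span, which is one convention among several.

```python
from itertools import product
from typing import List, Tuple


def bio2_tagset(ne_types: List[str]) -> List[str]:
    """{B, I} x NE-Type, plus the single outside tag O."""
    return [f"{p}-{t}" for p, t in product("BI", ne_types)] + ["O"]


def bio2_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO2 tags to (type, start, end) spans over word indices, end exclusive."""
    spans = []
    cur_type, start = None, 0
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel O closes a trailing span
        continues = cur_type is not None and tag == f"I-{cur_type}"
        if cur_type is not None and not continues:
            spans.append((cur_type, start, i))
            cur_type = None
        if tag.startswith(("B-", "I-")) and not continues:
            cur_type, start = tag[2:], i  # B-X (or a stray I-X) opens a new span
    return spans


if __name__ == "__main__":
    tags = ["B-Dat", "I-Dat", "I-Dat", "I-Dat", "B-Per", "I-Per", "O",
            "B-Org", "O", "O", "O", "O"]
    print(bio2_to_spans(tags))
```

Applied to the tag sequence from the slide, `bio2_to_spans` yields a 4-word date, a 2-word person and a 1-word organization. For the eight recipe entity types, the tag set has 2 × 8 + 1 = 17 tags.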

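Recipe named entity types: food (F), quantity (Q), tool (T), duration (D), state of food (Sf), state of tool (St), action by chef (Ac), action by food (Af)
Example: F Q T Ac Af F Ac Ac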

1. BIO2 tags (1 1 ): /B-F /B-Q /I-Q /O /B-T /O /B-Ac /O /B-Af /I-Af /O /B-F /I-F /I-F /I-F /O /B-Ac /O /O /B-Ac /O /O
2. (KyTea -solver 6 ) cf. CRF


3. Tag probabilities P(y | w) for each word w:
   y      w1    w2    w3    w4
   B-F   0.62  0.00  0.00  0.00
   I-F   0.37  0.00  0.00  0.00
   B-Q   0.00  0.82  0.01  0.00
   I-Q   0.00  0.17  0.99  0.00
   B-T   0.00  0.00  0.00  0.00
   ...
   O     0.01  0.01  0.00  1.00
4. : F-I Q-I
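The sketch below turns per-word probabilities P(y | w), like those in the table, into a single tag sequence that respects BIO2: an I-X tag may only follow B-X or I-X. It uses Viterbi search over log-probabilities. Two points are assumptions of this sketch rather than necessarily the slide's exact procedure: the per-word probabilities are treated as independent, and the BIO2 constraint is applied as a hard constraint. The function names are hypothetical.

```python
import math
from typing import Dict, List

NEG_INF = float("-inf")


def allowed(prev: str, cur: str) -> bool:
    """BIO2 constraint: I-X may only follow B-X or I-X."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], "I-" + cur[2:])
    return True


def log_p(p: float) -> float:
    return math.log(p) if p > 0 else NEG_INF


def best_bio2_sequence(probs: List[Dict[str, float]]) -> List[str]:
    """Most probable BIO2-consistent tag sequence; probs[k] maps tag y to P(y | w_k)."""
    tags = sorted({t for dist in probs for t in dist})
    # a sentence cannot start with an I- tag
    score = {t: NEG_INF if t.startswith("I-") else log_p(probs[0].get(t, 0.0)) for t in tags}
    backpointers = []
    for dist in probs[1:]:
        new_score, ptr = {}, {}
        for t in tags:
            best_prev, best = None, NEG_INF
            for s in tags:
                if allowed(s, t) and score[s] > best:
                    best_prev, best = s, score[s]
            new_score[t] = best + log_p(dist.get(t, 0.0)) if best_prev is not None else NEG_INF
            ptr[t] = best_prev
        backpointers.append(ptr)
        score = new_score
    last = max(tags, key=lambda t: score[t])
    if score[last] == NEG_INF:
        raise ValueError("no BIO2-consistent tag sequence has nonzero probability")
    path = [last]
    for ptr in reversed(backpointers):  # follow back-pointers from the last word
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    probs = [
        {"B-F": 0.62, "I-F": 0.37, "O": 0.01},
        {"B-Q": 0.82, "I-Q": 0.17, "O": 0.01},
        {"B-Q": 0.01, "I-Q": 0.99},
        {"O": 1.00},
    ]
    print(best_bio2_sequence(probs))  # ['B-F', 'B-Q', 'I-Q', 'O']
```

For the table's columns, the per-word argmax is already consistent. The constraint matters when it is not. For example, with P(O) = 0.6, P(B-F) = 0.4 for the first word and P(I-F) = 0.9 for the second, the per-word argmax O I-F is invalid, and the search returns B-F I-F instead.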

(242 ) (5 ) : 1/10 : 2/10 10/10

[Figure: F-measure vs. training corpus size] (ex. = 11,000 83.1%, 1,038,986 90.0%)

[Figure: F-measure vs. training corpus size] ex. 5 (243 ) 250 (12,150 )

Step 3. Dependency parsing (cf. CaboCha, KNP)

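EDA [Flannery 11]: dependency parsing based on a maximum spanning tree (MST)
1. Estimate $\sigma(i, d_i, w)$, the score for word $w_i$ depending on word $w_{d_i}$ in sentence $w$.
2. Find the MST: $\hat{d} = \arg\max_{d \in D} \sum_{i=1}^{n} \sigma(i, d_i, w)$, where $d = (d_1, \dots, d_n)$ ranges over the set $D$ of candidate dependency trees for the sentence.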
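The argmax over trees can be illustrated by exhaustive search over all head assignments that form a tree. This is only a sketch for tiny inputs, since it enumerates n^n candidates. An MST parser such as EDA would use an efficient spanning-tree algorithm instead. In this sketch, head 0 denotes the root, the sentence w is folded into the `sigma` callback, and the scores in the example are made-up numbers rather than learned σ values.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple


def is_tree(heads: Sequence[int]) -> bool:
    """heads[i - 1] is the head d_i of word i (0 = root); valid if every word reaches the root."""
    for i in range(1, len(heads) + 1):
        seen, node = set(), i
        while node != 0:
            if node in seen:
                return False  # following heads loops: a cycle
            seen.add(node)
            node = heads[node - 1]
    return True


def best_tree(n: int, sigma: Callable[[int, int], float]) -> Tuple[List[int], float]:
    """Exhaustive argmax over d in D of sum_i sigma(i, d_i); only feasible for tiny n."""
    choices = [[h for h in range(n + 1) if h != i] for i in range(1, n + 1)]
    best, best_score = None, float("-inf")
    for heads in product(*choices):
        if not is_tree(heads):
            continue
        score = sum(sigma(i, heads[i - 1]) for i in range(1, n + 1))
        if score > best_score:
            best, best_score = list(heads), score
    return best, best_score


if __name__ == "__main__":
    # made-up arc scores sigma(i, d_i); unlisted arcs score -1.0
    S = {(1, 2): 2.0, (1, 3): 1.0, (2, 3): 3.0, (3, 0): 1.0, (3, 2): 4.5}
    print(best_tree(3, lambda i, h: S.get((i, h), -1.0)))  # ([2, 3, 0], 6.0)
```

Picking each word's best head independently would give 1→2, 2→3, 3→2, which contains a cycle. Restricting the argmax to trees is what rules that out and yields 1→2→3→root.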

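Context windows: $w_{i-3}\, w_{i-2}\, w_{i-1}\, w_i\, w_{i+1}\, w_{i+2}\, w_{i+3}$ around the dependent and $w_{d_i-3}\, w_{d_i-2}\, w_{d_i-1}\, w_{d_i}\, w_{d_i+1}\, w_{d_i+2}\, w_{d_i+3}$ around the candidate head.
F1: $w_i$, $w_{d_i}$; F2: $w_i$, $w_{d_i}$; F3: $w_i$, $w_{d_i}$; F4: $w_i$, $w_{d_i}$, 3; F5: $w_i$, $w_{d_i}$, 3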
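The two context windows (three words on each side of the dependent w_i and of the candidate head w_{d_i}) could be turned into sparse features and scored linearly, as sketched below. The exact F1 to F5 templates are not reproduced here. The offset-tagged feature strings, the `<pad>` and `<root>` symbols, and the weight dictionary are illustrative assumptions.

```python
from typing import Dict, List


def window(words: List[str], center: int, prefix: str, size: int = 3) -> List[str]:
    """Offset-tagged words w_{center-size} .. w_{center+size}; words are 1-indexed."""
    feats = []
    for k in range(-size, size + 1):
        j = center + k
        w = words[j - 1] if 1 <= j <= len(words) else "<pad>"
        feats.append(f"{prefix}{k:+d}/{w}")
    return feats


def arc_features(words: List[str], i: int, d: int) -> List[str]:
    """Features for the arc from dependent w_i to head w_d (d = 0 is the root)."""
    head = window(words, d, "head") if d > 0 else ["head/<root>"]
    return window(words, i, "dep") + head


def sigma(weights: Dict[str, float], words: List[str], i: int, d: int) -> float:
    """Linear arc score: sum of the weights of the active features (unknown features weigh 0)."""
    return sum(weights.get(f, 0.0) for f in arc_features(words, i, d))


if __name__ == "__main__":
    words = ["玉ねぎ", "を", "切る"]
    print(arc_features(words, 1, 3))
    print(sigma({"dep+0/玉ねぎ": 0.5, "head+0/切る": 1.0}, words, 1, 3))  # 1.5
```

In a trained parser, the weights would be learned from annotated (possibly partially annotated) sentences. Here they are hand-set only to show how σ(i, d_i, w) becomes a number.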

: 2 : 11,700, 145,925 : 9,023, 263,425 : 1. 2.... 3. 8

[Figure: Accuracy vs. work time (hours, 0 to 8)] (96.83%)

Step 4. Predicate-argument structure analysis [Yoshino, Mori, et al.]
1. Ac(Chef, F Q, T)
2. - Af (Food), 1 1 2
3. Ac (Chef, F, F) 2 3
4. - Ac (Chef, F) 4

[Yoshino, Mori, et al.]

1. : 100 724 19,966 3,797 12,426 2. : (BCCWJ + etc.) + : 1/10 + 9/10 ( ) : ( + ) + :

( )

Step 1. : 95.46% (8 h) : 95.84%
Step 2. : 53.42% (5 h) : 67.02%
Step 3. : 92.58% (8 h) : 93.02%
[Figures: F-measure vs. work time (hours); F-measure vs. training corpus size; accuracy vs. work time (hours)]

1. ( ) :, : -,, : F: 42.01% (8 h + 5 h + 8 h) 28.0%! : 58.27% F (21 h) (67.02% 90%)
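Assume the 28.0% denotes the relative reduction in errors when the F-measure rises from 42.01% to 58.27%; this is an interpretation, not stated in the surviving text. Under that reading the arithmetic is consistent: (58.27 - 42.01) / (100 - 42.01) = 16.26 / 57.99 ≈ 0.280. The hypothetical helper below makes the formula explicit.

```python
def error_reduction(before: float, after: float) -> float:
    """Relative error reduction when a percentage score rises from `before` to `after`.

    The error rate is taken to be 100 - score, so the result is the fraction
    of the original errors that were removed.
    """
    return (after - before) / (100.0 - before)


print(f"{error_reduction(42.01, 58.27):.1%}")  # prints 28.0%
```

Reporting error reduction rather than the absolute gain (16.26 points) makes improvements comparable across steps whose baselines differ widely, such as 53.42% versus 95.46%.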

(or ) : = = =... ( ) : - : ( ) ( ) : Mix = {,,...} : : =??g

1. 2. ( : )

1. 2. 3. 4. PNAT ( 1 3 ) ( )? or?

References
Flannery, D., Miyao, Y., Neubig, G., and Mori, S.: Training Dependency Parsers from Partially Annotated Corpora, in Proceedings of the Fifth International Joint Conference on Natural Language Processing (2011)
Hamada, R., Ide, I., Sakai, S., and Tanaka, H.: Structural Analysis of Cooking Preparation Steps in Japanese, in Proceedings of the Fifth International Workshop on Information Retrieval with Asian Languages (IRAL '00), pp. 157–164 (2000)
Momouchi, Y.: Control Structures for Actions in Procedural Texts and PT-Chart, in Proceedings of the Eighth International Conference on Computational Linguistics, pp. 108–114 (1980)
Mori, S. and Nagao, M.: Word Extraction from Corpora and Its Part-of-Speech Estimation Using Distributional Analysis, in Proceedings of the 16th International Conference on Computational Linguistics (1996)
Neubig, G., Nakata, Y., and Mori, S.: Pointwise Prediction for Robust, Adaptable Japanese Morphological Analysis, in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (2011)
Yoshino, K., Mori, S., and Kawahara, T.: Predicate Argument Structure Analysis using Partially Annotated Corpora, in Proceedings of the Sixth International Joint Conference on Natural Language Processing (2013)

Vol. J90-DII, No. 10, pp. 2817–2829 (2007)
(2009)
Vol. 27, No. 4 (2012)
Vol. 24, No. 5, pp. 616–622 (2009)