2013-08-18
Table of Contents
= + 1. 2. 3. 4. 5. etc.
1. ( + + ( ))
2. :,,,,,, (MUC [1])
3.
4. (subj: person, i-obj: org. )
[1] MUC: Message Understanding Conference
( ) UGC [2] ( ) : :
[2] UGC: User-Generated Content
[ 12] BCCWJ[ 09] 5 ( ) ( ) (F ) twitter 3,680 1,250 500 728 50 11 12 10 90 99.32 96.75 97.25 96.70 96.52 98.98 97.70 97.05 97.17
( ) :,,,,,, :,,,,,,,
[Momouchi 80] [Hamada 00] [ 07]
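+
1. Word segmentation: KyTea [Neubig, Mori, et al. 11] (Cf. MeCab, JUMAN, ...)
2. Recipe named entity recognition: Food (F), Quantity (Q), Tool (T), Duration (D), State of food (Sf), State of tool (St), Action by chef (Ac), Action by food (Af) ( )
3. Dependency parsing: EDA [Flannery, Mori, et al.] (Cf. CaboCha, KNP, ...)
4. Predicate-argument structure analysis [Yoshino, Mori, et al.]: Ac( : - F, : T ) 1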
[ 12]
: BCCWJ 53,899 1,834,784 11,700 197,941 136,109 9,023 398,569 254,402 BCCWJ: [ 09] : 242 7,023 1,523 724 19,966 3,797 12,426
Step1. ( ) : : - - - - - - - - - - - - - : -:
,, 1. ( 1.4 2.0 ) 2. 10 15 ( :, ) ( :, ) 3.
( ) ( ) (, etc.)
1. : [ 09] 2. ( ) - - - - - - - - - - - - -
( ) + (,, ) ( : - v.s. - ) ( ) ( ) ( ) ( ) ) - - - ( )
: Cf. [ 09] ) - = - - ( ) (MeCab, JUMAN )
Pointwise word segmentation (KyTea [Neubig 11])
A binary (2-class) SVM decides, for each position i, the boundary tag t_i between x_i and x_{i+1}.
Character window: x_{i-2} x_{i-1} x_i | x_{i+1} x_{i+2} x_{i+3}
Char (type) 1-gram feature: -3/ (K), -2/ (H), -1/ (K), 1/ (K), 2/ (H), 3/ (S)
Char (type) 2-gram feature: -3/ (KH), -2/ (HK), -1/ (KK), 1/ (KH), 2/ (HS)
Char (type) 3-gram feature: -3/ (KHK), -2/ (HKK), -1/ (KKH), 1/ (KHS)
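The character-type n-gram features listed above can be enumerated mechanically from a six-symbol window around the candidate boundary. The following is a minimal sketch, not KyTea's actual implementation; the function name `window_ngrams` is hypothetical, and the type string "KHKKHS" is read off the 1-gram feature list on the slide.

```python
def window_ngrams(window, n):
    """Return (position, n-gram) pairs over a 6-symbol window around a boundary.

    Positions -3..-1 lie to the left of the candidate boundary and 1..3 to
    the right, following the labels used on the slide. Each n-gram is labeled
    with the position of its first symbol.
    """
    positions = [-3, -2, -1, 1, 2, 3]
    if len(window) != len(positions):
        raise ValueError("expected a window of exactly 6 symbols")
    return [(positions[j], window[j:j + n]) for j in range(len(window) - n + 1)]


if __name__ == "__main__":
    types = "KHKKHS"  # character types of x_{i-2} .. x_{i+3}, taken from the slide
    for n in (1, 2, 3):
        print(n, window_ngrams(types, n))
```

Running the same function on the characters themselves, rather than their types, would give the plain character n-gram features.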
1. [Mori 96] ( ) 2. # ( =1362) - - - - - - # ( =1338) - - - - - - - - -
Web (Yahoo! )
http://www.phontron.com/kytea/dictionary-addition.html (2011-11-25 )
(F ) 95.54% ( ) 96.75% ( ) 97.15%
75 80% 20 25%
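: BCCWJ, UniDic,
: 8
Evaluation: F-measure (harmonic mean of precision and recall)
  Precision = LCS / (number of words in the system output)
  Recall = LCS / (number of words in the reference)
  LCS: length of the longest common subsequence of the two word sequences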
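An LCS-based F-measure of this kind can be computed as follows. This is a minimal sketch assuming that the two ratios on the slide are precision (LCS over the number of system words) and recall (LCS over the number of reference words); the function names and the toy segmentations are illustrative only.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def segmentation_f(system, reference):
    """F-measure between two word sequences, based on their LCS length."""
    lcs = lcs_length(system, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(system)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example: the system merged "b" and "c" into one word.
    print(segmentation_f(["a", "bc", "d"], ["a", "b", "c", "d"]))
```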
[Figure: F-measure (95.0 to 96.0) vs. work time [hour] (0 to 8)] ( : 99% )
Step 2. Named entity (NE) recognition
:,,,,,, (MUC)
date person org.
BIO2 (Begin, Intermediate, Other) tagging:
/B-Dat /I-Dat /I-Dat /I-Dat /B-Per /I-Per /O /B-Org /O /O /O /O
Sequence labeling (HMM, CRF); tag set = {B, I} × NE-Type ∪ {O}
: 80% 90% (1 )
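A BIO2 tag sequence such as the one above maps back to entity spans by grouping each B-X tag with the I-X tags that directly follow it. The sketch below is a generic decoder, not code from the presentation; `bio2_spans` is a hypothetical name, and spans are reported as (type, start, end) with an exclusive end index.

```python
def bio2_spans(tags):
    """Convert a BIO2 tag sequence into (type, start, end) spans, end exclusive."""
    spans = []
    start, kind = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == kind
        if start is not None and not continues:
            spans.append((kind, start, i))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


if __name__ == "__main__":
    tags = ["B-Dat", "I-Dat", "I-Dat", "I-Dat", "B-Per", "I-Per",
            "O", "B-Org", "O", "O", "O", "O"]
    print(bio2_spans(tags))
```

An I-X tag that does not continue an open entity of the same type is silently ignored here; a stricter decoder could raise an error instead.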
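...
Recipe NE types: Food (F), Quantity (Q), Tool (T), Duration (D), State of food (Sf), State of tool (St), Action by chef (Ac), Action by food (Af)
F Q T Ac Af F Ac Ac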
1. BIO2 (1 1 )
/B-F /B-Q /I-Q /O /B-T /O /B-Ac /O /B-Af /I-Af /O /B-F /I-F /I-F /I-F /O /B-Ac /O /O /B-Ac /O /O
2. (KyTea -solver 6 ) Cf. CRF
( )
3. Tag probabilities P(y | w), one column per word w:

  y      P(y | w)
  B-F    0.62  0.00  0.00  0.00
  I-F    0.37  0.00  0.00  0.00
  B-Q    0.00  0.82  0.01  0.00
  I-Q    0.00  0.17  0.99  0.00
  B-T    0.00  0.00  0.00  0.00
  ...
  O      0.01  0.01  0.00  1.00

4. : F-I Q-I
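One way to turn pointwise tag probabilities like those in the table into a consistent tag sequence is a Viterbi-style search that forbids invalid BIO2 transitions. The sketch below assumes the usual BIO2 constraint (an I-X tag may only follow B-X or I-X); this constraint and the function name `constrained_viterbi` are assumptions, since the text of step 4 did not survive. The probabilities are the four columns of the table.

```python
import math


def allowed(prev, cur):
    """Assumed BIO2 constraint: I-X may only follow B-X or I-X of the same type."""
    if cur.startswith("I-"):
        return prev is not None and prev[0] in "BI" and prev[2:] == cur[2:]
    return True


def constrained_viterbi(columns):
    """columns: one dict per word mapping tag -> P(tag | word).

    Returns the highest-probability tag sequence that satisfies `allowed`.
    """
    # best maps the last tag to (log-probability, path) of the best valid path.
    best = {t: (math.log(p), [t])
            for t, p in columns[0].items() if p > 0 and allowed(None, t)}
    for col in columns[1:]:
        nxt = {}
        for t, p in col.items():
            if p <= 0:
                continue
            cands = [(s + math.log(p), path + [t])
                     for prev, (s, path) in best.items() if allowed(prev, t)]
            if cands:
                nxt[t] = max(cands, key=lambda c: c[0])
        best = nxt
    return max(best.values(), key=lambda c: c[0])[1]


if __name__ == "__main__":
    columns = [
        {"B-F": 0.62, "I-F": 0.37, "O": 0.01},
        {"B-Q": 0.82, "I-Q": 0.17, "O": 0.01},
        {"B-Q": 0.01, "I-Q": 0.99},
        {"O": 1.00},
    ]
    print(constrained_viterbi(columns))
```

Because each word's probability is estimated independently, the constraint only matters when the pointwise best tag is an I-X tag with no matching predecessor.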
(242 ) (5 ) : 1/10 : 2/10 10/10
[Figure: F-measure (52 to 68) vs. training corpus size]
ex. = 11,000 83.1%, 1,038,986 90.0%)
[Figure: F-measure (52 to 68) vs. training corpus size]
ex. 5 (243 ) 250 (12,150 )
Step 3. Dependency parsing (Cf. CaboCha, KNP)
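Dependency parsing (EDA) [Flannery 11]: maximum spanning tree (MST)
1. Score $\sigma(\langle i, d_i \rangle, \boldsymbol{w})$ for the dependency between word $w_i$ and its head $w_{d_i}$
2. Maximum spanning tree (MST) search:
   $\hat{\boldsymbol{d}} = \operatorname*{argmax}_{\boldsymbol{d} \in D} \sum_{i=1}^{n} \sigma(\langle i, d_i \rangle, \boldsymbol{w})$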
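The argmax over dependency trees can be illustrated on a very short sentence by brute force: enumerate every head assignment, keep those that form a tree, and pick the one with the highest total score. This is only a sketch of the objective, not the search algorithm EDA uses; the single-root assumption, the function names, and the toy score table are all hypothetical.

```python
from itertools import product


def is_tree(heads):
    """heads[i - 1] is the head of word i (1-based); 0 denotes the root.

    Assumes exactly one word attaches to the root and no cycles exist.
    """
    if sum(1 for h in heads if h == 0) != 1:
        return False
    for i in range(1, len(heads) + 1):
        seen = set()
        j = i
        while j != 0:  # follow head links until the root is reached
            if j in seen:
                return False  # cycle, including self-loops
            seen.add(j)
            j = heads[j - 1]
    return True


def best_tree(score, n):
    """Exhaustively maximize the sum of score(i, d_i) over valid trees."""
    best_total, best_heads = float("-inf"), None
    for heads in product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        total = sum(score(i + 1, h) for i, h in enumerate(heads))
        if total > best_total:
            best_total, best_heads = total, heads
    return best_heads, best_total


if __name__ == "__main__":
    # Toy scores (hypothetical): word 1 -> head 2, word 2 -> root, word 3 -> head 2.
    table = {(1, 2): 5.0, (2, 0): 4.0, (3, 2): 3.0}
    print(best_tree(lambda i, h: table.get((i, h), 0.0), 3))
```

The enumeration grows as (n + 1)^n, so it is usable only for a handful of words; practical parsers use a polynomial-time MST algorithm instead.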
( )
w_{i-3} w_{i-2} w_{i-1} w_i w_{i+1} w_{i+2} w_{i+3}
w_{d_i-3} w_{d_i-2} w_{d_i-1} w_{d_i} w_{d_i+1} w_{d_i+2} w_{d_i+3}
F1 w_i w_{d_i}
F2 w_i w_{d_i}
F3 w_i w_{d_i}
F4 w_i w_{d_i} 3
F5 w_i w_{d_i} 3
: 2 : 11,700, 145,925 : 9,023, 263,425 : 1. 2.... 3. 8
[Figure: Accuracy (92.2 to 93.2) vs. work time [hour] (0 to 8)] (96.83%)
Step 4. [Yoshino, Mori, et al.] 1. Ac(Chef, F Q, T ) 2. - Af (Food), 1 1 2 3. Ac (Chef, F, F ) 2 3 4. - Ac (Chef, F ) 4
[Yoshino, Mori, et al.]!!
1. : 100 724 19,966 3,797 12,426 2. : (BCCWJ + etc.) + : 1/10 + 9/10 ( ) : ( + ) + :
( )
Step 1. (F-measure) : 95.46% (8 ) : 95.84%
Step 2. (F-measure) : 53.42% (5 ) : 67.02%
Step 3. (Accuracy) : 92.58% (8 ) : 93.02%
[Figures: Step 1, F-measure vs. work time [hour]; Step 2, F-measure vs. training corpus size; Step 3, accuracy vs. work time [hour]]
1. ( ) :, : -,, : F : 42.01% (8 + 5 + 8 ) 28.0%! : 58.27% F (21 ) (67.02% 90%)!!
(or ) : = = =... ( ) : - : ( ) ( ) : Mix = {,,...} : : =??g
1. 2. ( : )
1. 2. 3. 4. PNAT ( 1 3 ) ( )? or?
References
Flannery, D., Miyao, Y., Neubig, G., and Mori, S.: Training Dependency Parsers from Partially Annotated Corpora, in Proceedings of the Fifth International Joint Conference on Natural Language Processing (2011)
Hamada, R., Ide, I., Sakai, S., and Tanaka, H.: Structural Analysis of Cooking Preparation Steps in Japanese, in Proceedings of the Fifth International Workshop on Information Retrieval with Asian Languages, No. 8 in IRAL '00, pp. 157-164 (2000)
Momouchi, Y.: Control Structures for Actions in Procedural Texts and PT-Chart, in Proceedings of the Eighth International Conference on Computational Linguistics, pp. 108-114 (1980)
Mori, S. and Nagao, M.: Word Extraction from Corpora and Its Part-of-Speech Estimation Using Distributional Analysis, in Proceedings of the 16th International Conference on Computational Linguistics (1996)
Neubig, G., Nakata, Y., and Mori, S.: Pointwise Prediction for Robust, Adaptable Japanese Morphological Analysis, in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (2011)
Yoshino, K., Mori, S., and Kawahara, T.: Predicate Argument Structure Analysis using Partially Annotated Corpora, in Proceedings of the Sixth International Joint Conference on Natural Language Processing (2013)
,,,, Vol. J90-DII, No. 10, pp. 2817-2829 (2007)
,,,, (2009)
,, Vol. 27, No. 4 (2012)
,, Vol. 24, No. 5, pp. 616-622 (2009)