Development of Acceptability Rating Data for Japanese (ARDJ): An Initial Report
Kow KURODA (Kyorin U.), Hikaru YOKONO (Fujitsu Lab), Keiga ABE (Gifu Shotoku U.), Tomoyuki TSUCHIYA (Kyushu U.), Yoshihiko ASAO (NICT), Yuichiro KOBAYASHI (Toyo U.), Toshiyuki KANAMARU (Kyoto U.) and Takumi TAGAWA (Tsukuba U.)
The 24th Annual Meeting of NLP, 2018-03-11, Okayama Convention Center
Presentation is delivered in Japanese. Kow Kuroda, et al. 2
Acceptability Rating Data of Japanese (ARDJ): acceptability
ARDJ (Acceptability Rating Data of Japanese): acceptable/unacceptable
1: Unbiasedness
Related notion: confirmation bias (Schütze 1996).
Schütze, Carson T. (1996). The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. University of Chicago Press.

2: Fine-grainedness
ARDJ uses a 4-level rating scale (levels 0, 1, 2 and 3; the label "incomprehensible" appears beside level 3) rather than a Yes/No forced choice task.

3: Large(r)-scaledness
(Kahneman 2012) (Harford 2012)
Tim Harford (2016). Adapt: Why Success Always Starts with Failure. Abacus Software.
Daniel Kahneman (2012). Thinking, Fast and Slow. Penguin.
Outline
1. Data preparation
2. Experiment
3. Analysis/Results and Discussion
( ) ( ) 1 20 I. / 1 II. III. Kow Kuroda, et al. 9
Data preparation
2/2
1. Seeds and the originals O (the slide mentions DNA)
2. The originals O and their mutations M
3. M

O:
1. NINJAL-LWP (for BCCWJ): 10, 9
2. 4
3. 4 M, 3 M
O: 10 verbs (the figure 9 also appears on the slide), with their IDs in NINJAL-LWP
  22: [go]
  26: [know]
  44: [feel]
  110: [search, look for]
  116: [answer]
  326: [be(come) silent]
  338: [lose]
  377: [spread, travel, get through]
  1147: [know each other] (compound)
  1197: [acquire, contract, catch, develop (a disease)] (compound)
FCA, BCCWJ: 194, with 6 features
  [effect is physical]
  [effect is mental]
  [effect is social]
  [event is interactive]
  [effect is interactional]
  [effect is intended]
O: 4 patterns (P1, P2, P3, P4)
P1: Nominative + Instrument/Locative + Goal/Place + Comitative/Manner + V
P2: Nominative + Instrument/Locative + Goal/Place + Object/Result + V
P3: Nominative + Instrument/Locative + Object/Result + Goal/Place + V
P4: Nominative + Instrument/Locative + Source/Material + Object/Result + V
Example sentence IDs shown on the slide: s111, s197, s151, s71
Kinds of mutation: A. lexical; B. positional
Mutation types: n-type, v-type, p-type (POS), s-type
Note: Python, GitHub
200 stimuli: 33 of o-type and 167 of the other types

Type   Count   Ratio
o         33   0.165
n         49   0.245
v         29   0.145
p         36   0.180
s         53   0.265
sum      200   1.000
Experiment
The 200 stimuli are divided into 10 groups: gr0, gr1, ..., gr9. Each group comes in 4 variants, A, B, C and D, which gives 40 sets: gr0-a, gr0-b, gr0-c, gr0-d, gr1-a, ..., gr9-a, gr9-b, gr9-c, gr9-d.
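The grouping above can be sketched as follows. This is only an illustration under stated assumptions: the slide does not say how the variants a to d differ, so the sketch assumes they are four presentation orders of the same 20 stimuli, and the stimulus IDs s1 to s200 and the function name `make_response_sets` are hypothetical.

```python
import random


def make_response_sets(stimulus_ids, n_groups=10, variants="abcd", seed=0):
    """Split stimuli into equal groups and build one set per variant.

    Assumption (not stated in the slides): each variant is a different
    random presentation order of the same group of stimuli.
    """
    ids = list(stimulus_ids)
    if len(ids) % n_groups != 0:
        raise ValueError("stimuli must divide evenly into groups")
    rng = random.Random(seed)  # fixed seed so the sets are reproducible
    rng.shuffle(ids)
    size = len(ids) // n_groups
    sets = {}
    for g in range(n_groups):
        group = ids[g * size:(g + 1) * size]
        for v in variants:
            order = group[:]      # copy, then reorder for this variant
            rng.shuffle(order)
            sets[f"gr{g}-{v}"] = order
    return sets


# Hypothetical stimulus IDs s1 ... s200
sets = make_response_sets([f"s{i}" for i in range(1, 201)])
print(len(sets), len(sets["gr0-a"]))
```

With 200 stimuli this yields 40 sets of 20 sentences, named gr0-a through gr9-d as on the slide.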
Part 1: 10 questions
Part 2: 20 questions (4 options each)
Response coding: option 1 => 0, option 2 => 1, option 3 => 2, option 4 => 3
[Questionnaire form, Part 2: items 2.1 to 2.20, each answered with one of the options 1, 2, 3, 4]
[Questionnaire form, Part 1: 10 items, 1) to 10); items 1.4 to 1.8 are shown with their numbered options]
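Respondents came from 3 sources: 93, 109 and 49, or 251 in total. Of these, 216 are used and 35 are not. Each respondent rated 1 set of 20 sentences.

Respondents per group (the headers of the last three columns are lost; they are shown as src1 to src3):

gr.id   ratio   sum   src1   src2   src3
0       0.11     24      8     12      4
1       0.09     20      4     11      5
2       0.07     16      3      9      4
3       0.08     18      3     10      5
4       0.07     16      1     12      3
5       0.09     20      5     10      5
6       0.12     26     13      8      5
7       0.13     29     15      9      5
8       0.12     25      9     11      5
9       0.10     22     10      9      3
sum     1.00    216     71    101     44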
Analysis, Results and Discussion
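Encoding, 1/2: the responses to each stimulus are counted over the intervals [0,1], (1,2], (2,3] and (3,4]. Each stimulus thus becomes a vector of 4 proportions, e.g., [0.1, 0.3, 0.2, 0.4] or [0.4, 0.3, 0.2, 0.1]. The vectors are clustered with k-means (k = 4, 5, 6, 7).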
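A minimal sketch of this encoding, assuming each rating is a number between 0 and 4 and that the intervals are closed on the right, as the slide's notation suggests. The function names `bin_index` and `encode_stimulus` and the sample ratings are hypothetical; the slides do not show the authors' own code.

```python
import math

BIN_LABELS = ["[0,1]", "(1,2]", "(2,3]", "(3,4]"]


def bin_index(value):
    """Return the index of the right-closed interval that contains value."""
    if not 0 <= value <= 4:
        raise ValueError("rating must lie in [0, 4]")
    # ceil(value) - 1 maps (0,1] -> 0, (1,2] -> 1, ...; 0 itself joins bin 0
    return max(math.ceil(value) - 1, 0)


def encode_stimulus(ratings):
    """Turn the ratings of one stimulus into 4 response proportions."""
    if not ratings:
        raise ValueError("need at least one rating")
    counts = [0] * len(BIN_LABELS)
    for r in ratings:
        counts[bin_index(r)] += 1
    return [c / len(ratings) for c in counts]


# Ten hypothetical ratings for one stimulus
vector = encode_stimulus([1, 2, 2, 2, 3, 3, 4, 4, 4, 4])
print(dict(zip(BIN_LABELS, vector)))
```

Because every vector sums to 1, stimuli rated by different numbers of respondents remain comparable.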
Encoding, 2/2: k-means (k = 4, 5, 6, 7); k = 7. Gap statistics: k = 3, k = 7.
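For readers unfamiliar with k-means, the following is a plain-Python sketch of Lloyd's algorithm applied to 4-dimensional proportion vectors. It is not the authors' implementation (the slides call R's `kmeans`); the initialization by random sampling, the function names and the toy data are assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no change means convergence
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = centroid(members)
    return labels, centroids


# Toy data: two stimuli rated mostly in [0,1], two mostly in (3,4]
points = [
    [0.9, 0.1, 0.0, 0.0],
    [0.8, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.2, 0.8],
]
labels, centroids = kmeans(points, k=2)
print(labels)
```

In practice one would run several random restarts and compare candidate values of k, which is the role the gap statistic plays in the slides.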
[Figure slide; surviving labels: n=20, 200, 20]
[Figure 1/5: o-type and n-type]
[Figure 2/5: o-type and v-type]
[Figure 3/5: o-type and p-type]
[Figure 4/6: o-type and s-type]
Q1. Q2.
Hierarchical clustering (200 stimuli): Ward's method (Euclidean distance); 3, or 4, 5, 6, 7 clusters
k-means with k = 4, 5, 6, 7: k = 6
Gap statistics: k = 3, k = 7; k = 6

Gap statistics
clusGap(x = data.resp, FUNcluster = kmeans, K.max [...] 300, verbose = interactive())
[Figure: Gap statistic (y-axis, 0.32 to 0.44) plotted against k (x-axis, 2 to 12); the points k = 1, 3, 7 and 8 are marked]
[Figure: clustering with k=4; labelled clusters: C1, C3, C4]
[Figure: clustering with k=5; labelled clusters: C2, C4, C5]
[Figure: clustering with k=6; labelled clusters: C5, C4, C2 and C3, C6, C1]
[Figure: clustering with k=7, with reference to k=6]
k=6
Overview of Cluster 6.1, ..., Cluster 6.6
KL-divergence between the clusters

        C6.1    C6.2    C6.3    C6.4    C6.5    C6.6
C6.1    0.000   0.070   0.062   0.156   0.106   0.101
C6.2    0.086   0.000   0.264   0.200   0.389   0.103
C6.3    0.062   0.228   0.000   0.245   0.102   0.129
C6.4    0.137   0.200   0.196   0.000   0.327   0.091
C6.5    0.100   0.281   0.133   0.336   0.000   0.368
C6.6    0.102   0.096   0.160   0.099   0.402   0.000
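The table is asymmetric, as expected of KL-divergence. A minimal sketch of how such a matrix can be computed from cluster centroids follows. The slides do not state the logarithm base, the smoothing, or whether rows or columns play the role of the reference distribution, so the natural logarithm, the `eps` floor and the row-to-column direction used here are assumptions, and the two centroids are invented for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """D(p || q) in nats for two discrete distributions over the same bins.

    Assumption: zero entries of q are floored at eps to avoid division by zero.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:  # terms with pi == 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total


def kl_matrix(centroids):
    """Matrix whose entry [i][j] is D(centroid_i || centroid_j)."""
    return [[kl_divergence(p, q) for q in centroids] for p in centroids]


# Two hypothetical centroids over the bins [0,1], (1,2], (2,3], (3,4]
centroids = [
    [0.40, 0.30, 0.20, 0.10],
    [0.10, 0.30, 0.20, 0.40],
]
matrix = kl_matrix(centroids)
for row in matrix:
    print(["%.3f" % v for v in row])
```

The diagonal is always 0, and D(p || q) generally differs from D(q || p), which is why the upper and lower halves of the table do not mirror each other.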
Response profiles of C6.1, ..., C6.6
C6.1: (1,2] > [0,1] > (2,3], (3,4]   (mild deviance 1)
C6.2: (1,2] > [0,1], (2,3], (3,4]   (strong deviance 1)
C6.3: [0,1], (1,2] > (2,3] > (3,4]   (slight deviance)
C6.4: (3,4], (1,2] > (2,3] > [0,1]   (strong deviance 2)
C6.5: [0,1] > (1,2] > (3,4] > (2,3]   (no deviance)
C6.6: (1,2], (2,3] > [0,1], (3,4]   (strong deviance 3)
2/2: C6.2 (type I), C6.4 (type II), C6.6 (types I and II) => C6.2, C6.4
The slide also sets the stimulus types (o, n, v, p, s) against C6.1, C6.2, C6.3 and C6.5; their assignment is not recoverable from the extracted text.
Cluster 6.5: [0,1] = 3
Cluster 6.5: o-type, n-type, s-type; s-type; v-type, p-type
Cluster 6.2: Type 1
C6.2: o-type, n-type, v-type, s-type; s-type; p-type
Cluster 6.4
C6.4: o-type, n-type, s-type, v-type; o-type; p-type
Cluster 6.1 (cf. C6.5, C6.2)
C6.1: o-type, s-type, n-type, v-type; s-type; p-type
Cluster 6.3 (cf. C6.5, C6.4)
C6.3: o-type, v-type, s-type, n-type; o-type, v-type, p-type
Cluster 6.6 (cf. C6.2, C6.4)
C6.6: o-type, s-type, v-type, p-type; v-type, n-type
C6.5: Normal
C6.2: Deviant Type 1
C6.4: Deviant Type 2
[Diagram relating the clusters: C6.6 with C6.2 and C6.4; C6.1 with C6.2 and C6.5; C6.3 with C6.4 and C6.5]
By pattern (P1, ...) and by verb (ID=0022, ..., ID=1197)

The 4 patterns
P1: Nominative + Instrument/Locative + Goal/Place + Comitative/Manner + V
P2: Nominative + Instrument/Locative + Goal/Place + Object/Result + V
P3: Nominative + Instrument/Locative + Object/Result + Goal/Place + V
P4: Nominative + Instrument/Locative + Source/Material + Object/Result + V
Example sentence IDs shown on the slide: s111, s197, s151, s71
C6.1-C6.6 against P1, P2, P3, P4
C6.1, C6.2, C6.4: 4
C6.3: P1, P2
C6.5: P3, P4
C6.6: P1, P2; P3, P4
[Figures by pattern: P1 (2 slides), P2 (1 slide), P3 (2 slides), P4 (2 slides)]
[Figures by verb, one slide each unless noted]
ID=0022 (V = [go])
ID=0026 (V = [know])
ID=0040 (V = [teach])
ID=0044 (V = [feel])
ID=0116 (V = [answer])
ID=0131 (V = [search, feel out])
ID=0326 (V = [be(come) silent])
ID=0338 (V = [lose])
ID=0377 (V = [spread, travel, get transmitted])
ID=1147 (V = [know each other, acquaint]) (2 slides)
ID=1197 (V = [contract, catch])
[Closing slides; the only surviving term is "mixed effects model"]
References
Tim Harford (2016). Adapt: Why Success Always Starts with Failure. Abacus Software.
Daniel Kahneman (2012). Thinking, Fast and Slow. Penguin.
Schütze, Carson T. (1996). The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. University of Chicago Press. [Republished by Language Science Press in 2016 under the same title with a new preface.]