NAIST-IS-MT1051071, March 16, 2012
Automatic Acquisition of Qualia Structure of Generative Lexicon in Japanese Using Learning to Rank

Takahiro Tsuneyoshi

Abstract

This thesis proposes a method for automatically acquiring the telic and agentive roles of target nouns. Telic and agentive roles are constituents of the qualia structure of the generative lexicon introduced by Pustejovsky. They are a type of lexical knowledge: the telic role describes the purpose and function of the target concept, and the agentive role describes the event related to its emergence and origin. They are useful resources for semantic interpretation and information retrieval. In previous work, although telic and agentive roles were annotated on a scale similar to the Likert scale, the problem was treated as a binary classification task, and the ranked values were not fully utilized. Furthermore, there has been no work on acquiring qualia structures in Japanese. To make better use of the scale annotation, we propose a method that uses the scale data directly to train a machine-learning-based ranker for acquiring qualia structure in Japanese. In our experiments, we evaluate the acquisition task with a rank correlation and show the effectiveness of learning-to-rank techniques.

Keywords: knowledge acquisition, world knowledge, generative lexicon, qualia structure, telic role, agentive role, learning to rank

Master's Thesis, Department of Information Processing, Graduate School of Information Science, Nara Institute of Science and Technology, NAIST-IS-MT1051071, March 16, 2012.
1. 1.1 WordNet EDR FrameNet Web Wikipedia [1] [2] Pustejovsky [3] 4 4 Pustejovsky 1
A B [4] [3] 1.2 Web [5, 6, 7] [7] [5, 6] / 0 10 2 / 2 3 4 / 5 2
2. Pustejovsky, The Generative Lexicon [3]

Generative mechanisms: type coercion, co-composition, selective binding.

4 levels of representation: argument structure, event structure, qualia structure, lexical inheritance structure.

2.1 Argument structure

Types of argument: true argument, default argument, shadow argument.

2.2 Event structure

Event types: state, process, transition.

2.3 Qualia structure

4 roles: constitutive role, formal role, telic role, agentive role.
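As an illustration of the four qualia roles listed above, the following minimal sketch stores them in a simple record. The class name `QualiaStructure`, its fields, and the method `roles` are hypothetical and not part of the thesis. The entry for "novel" follows the example commonly cited from Pustejovsky's work and is used here only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QualiaStructure:
    """Hypothetical container for the four qualia roles of one noun."""
    noun: str
    constitutive: List[str] = field(default_factory=list)  # parts, material
    formal: List[str] = field(default_factory=list)        # what kind of thing it is
    telic: List[str] = field(default_factory=list)         # purpose and function
    agentive: List[str] = field(default_factory=list)      # how it comes into being

    def roles(self) -> Dict[str, List[str]]:
        """Return the four roles as a dictionary keyed by role name."""
        return {
            "constitutive": self.constitutive,
            "formal": self.formal,
            "telic": self.telic,
            "agentive": self.agentive,
        }


# Illustrative entry: a novel is a book, it is meant to be read, and it is written.
novel = QualiaStructure(
    noun="novel",
    constitutive=["narrative"],
    formal=["book"],
    telic=["read"],
    agentive=["write"],
)
```

The thesis targets only the telic and agentive fields; the other two are kept in the sketch so the record mirrors the full qualia structure.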
= 1 = x : 2 = y : 1 = v : 2 = w : = 1 = e1 : process 2 = e2 : transition = = = (x y) (y, x) = (e1, v, x) = (e2, w, x) x y x y v x e1 w x e2 2.4 5
2.5 A B A B A B A B [4] A B B x 1. x A B x A 2. x A B = [ ] 1 = x : = [ ] = x 6
= 1 = x : 2 = y : 1 = v : 2 = w : = 1 = e1 : process 2 = e2 : transition = = (x y) (y, x) = (e1, v, x) = (e2, w, x) v v = 1 = x : 2 = y : 1 = v : 2 = w : = 1 = e1 : process 2 = e2 : transition = = (x y) (y, x) = (e1, v, x) = (e2, w, x) w w 7
= = = 1 = x : 2 = y : 1 = v : 2 = w : 1 = e1 : process 2 = e2 : transition = (x y) (y, x) = (e1, v, x) = (e2, w, x) A B 4 WordNet 1 1 http://nlpwww.nict.go.jp/wn-ja/ 8
3. 1. Web (Wenderoth 2005, 2007)[5, 6] 2. (Yamada 2007)[7] Wenderoth Web 10 10 3 0: 1: 2: 3: 0 3 3 2.10 2.16 2.24 2.37 Yamada 30 50 0 10 7 10-0 9
- 3 0.479 0.605 Yamada 0 10 7 10 0 2 10
4. + 4.1 4.1.1 2 1 2010 3 CaboCha (Version: 0.60pre4, : NAIST-jdic-0.6.3) NCV NCN CF NCV 4.1.2 NTT NTT NTT ATL-J/E 1 3,000 12 30 6,000 14,000 2 http://hayashibe.jp/jdc/ 3 http://s-yata.jp/corpus/nwc2010/ 11
1 4.1.3 (Version: 0.902) 4 4,425 7,473 5 940 5 4.2 90,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 4 http://cl.cs.okayama-u.ac.jp/rsc/data/ 12
(NCV) ( ) + Jaccard

\[
\mathrm{Jac} = \frac{|N \cap CV|}{|N \cup CV|} = \frac{|N \cap CV|}{|N| + |CV| - |N \cap CV|}
\]

where N is a noun and CV is a case + verb pair.

50 90 50 ( ) +

4.3 + 90 50 + 0 10 1 2 3
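The Jaccard coefficient used in Section 4.2 to score noun and case + verb pairs can be sketched as follows. This is a minimal sketch, not the thesis's implementation: the function names, the romanized candidate strings, and all counts are hypothetical, and it assumes that |N|, |CV|, and |N ∩ CV| are corpus frequencies of the noun, of the case + verb pair, and of their co-occurrence.

```python
from typing import Dict, List, Tuple


def jaccard(freq_n: int, freq_cv: int, freq_n_cv: int) -> float:
    """Jaccard coefficient from the three frequencies (assumed to be counts)."""
    union = freq_n + freq_cv - freq_n_cv
    return freq_n_cv / union if union > 0 else 0.0


def rank_candidates(
    freq_n: int,
    freq_cv: Dict[str, int],
    cooc: Dict[str, int],
    k: int = 50,
) -> List[Tuple[str, float]]:
    """Return the k case + verb candidates with the highest Jaccard score."""
    scored = [(cv, jaccard(freq_n, freq_cv[cv], c)) for cv, c in cooc.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical counts for one noun and two candidate case + verb pairs.
ranked = rank_candidates(
    freq_n=10,
    freq_cv={"wo yomu": 20, "wo kaku": 40},
    cooc={"wo yomu": 5, "wo kaku": 4},
    k=2,
)
```

The default `k=50` mirrors the 50 candidates per noun mentioned in the text; the toy call uses `k=2` only because it has two candidates.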
[Figure 2: telic role (x-axis: Score, 0 to 10; y-axis: Frequency, 0 to 1000)]

[Figure 3: agentive role (x-axis: Score, 0 to 10; y-axis: Frequency, 0 to 4000)]
1 10 9 8 7 6 5 4 3 2 1 0 4.4 / + + / 15
2 5 - - ( )( ) - - SVMrank 5 1. 90 2. 50 + 3. + 0-10 4. 5. + +

5 SVMrank Version:1.00 (http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html)
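As a sketch of how annotated noun and case + verb pairs could be prepared for SVMrank, the following helper writes examples in the SVMrank input format, where each line has the form `<target> qid:<qid> <index>:<value> ...`. The function name, the feature indices, and the example values are hypothetical. It also assumes that each target noun is given its own query id and that the 0-10 annotation score serves as the ranking target.

```python
from typing import Dict, Iterable, List, Tuple

Example = Tuple[float, int, Dict[int, float]]  # (target score, query id, features)


def to_svmrank_lines(examples: Iterable[Example]) -> List[str]:
    """Format examples as SVMrank input lines, grouped by query id."""
    lines = []
    # Examples are sorted so that lines with the same qid are adjacent
    # and qids appear in increasing order.
    for score, qid, feats in sorted(examples, key=lambda e: e[1]):
        # Feature indices must appear in increasing order on each line.
        feat_str = " ".join(f"{i}:{v:g}" for i, v in sorted(feats.items()))
        lines.append(f"{score:g} qid:{qid} {feat_str}")
    return lines


# Hypothetical data: two candidates for noun 1, one candidate for noun 2.
lines = to_svmrank_lines([
    (2, 2, {1: 0.0, 3: 1.0}),
    (8, 1, {2: 0.5, 1: 1.0}),
    (3, 1, {1: 0.2}),
])
```

Because SVMrank only compares examples that share a query id, grouping by noun means candidates are ranked against other candidates for the same noun, not across nouns.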
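5. /

5.1

Yamada [7]

\[
R_s = 1 - \frac{\sum_{x=1}^{m} d_x^2}{E\left(\sum_{x=1}^{m} d_x^2\right)} = 1 - \frac{6 \sum_{x=1}^{m} d_x^2}{m\,(2m^2 - 3nm + 2n^2 - 1)}
\]

where m is the number of top-ranked items being evaluated, n is the total number of items, d_x is the rank difference for the x-th item, and E(x) is the expected value of x.

2 Rs 1 0 Rs 1

5.2 ( + ) 90 + 1 20 4 5 ( + ) 7-10 0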
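The rank correlation Rs of Section 5.1 can be computed with a few lines of code. The sketch below is an assumption-laden illustration and not the thesis's evaluation script: the function name and argument layout are hypothetical, and it assumes that the system and gold ranks of the m evaluated items are given as parallel lists, with n the total number of ranked items.

```python
from typing import Sequence


def rank_correlation(system_ranks: Sequence[int],
                     gold_ranks: Sequence[int],
                     n: int) -> float:
    """Rs = 1 - 6 * sum(d_x^2) / (m * (2m^2 - 3nm + 2n^2 - 1))."""
    if len(system_ranks) != len(gold_ranks):
        raise ValueError("rank lists must have the same length")
    m = len(system_ranks)
    if m == 0 or m > n or n < 2:
        raise ValueError("need 1 <= m <= n and n >= 2")
    # Sum of squared rank differences over the m evaluated items.
    sum_sq = sum((s - g) ** 2 for s, g in zip(system_ranks, gold_ranks))
    denom = m * (2 * m * m - 3 * n * m + 2 * n * n - 1)
    return 1.0 - 6.0 * sum_sq / denom


# With m = n the denominator reduces to m(m^2 - 1), the usual Spearman form.
perfect = rank_correlation([1, 2, 3, 4, 5], [1, 2, 3, 4, 5], n=5)
reverse = rank_correlation([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], n=5)
```

A Top-N evaluation like the one in this chapter would call the function with the first N system ranks and the corresponding gold ranks, for N from 1 to 20.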
LibSVM 6 (-b 1) 4, 5 N (1 N 20) 3 0.789 0.551 0.653 0.516 2 4 3 5 2, 4 SVMrank 3, 5 LibSVM 6 LibSVM Version:3.11 (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) 18
[Figure 4: Rs (y-axis, 0 to 1) against Top-N (x-axis, 2 to 20); series: proposed, baseline]

[Figure 5: Rs (y-axis, 0 to 1) against Top-N (x-axis, 2 to 20); series: proposed, baseline]
Table 2
 1   (6.14)   (5.26)   (4.42)
 2   (5.30)   (4.90)   (4.15)
 3   (5.02)   (4.40)   (4.00)
 4   (4.70)   (3.71)   (3.52)
 5   (4.44)   (3.22)   (3.39)
 6   (4.39)   (3.19)   (3.10)
 7   (4.31)   (3.11)   (3.06)
 8   (3.90)   (3.10)   (2.99)
 9   (3.72)   (3.09)   (2.92)
10   (3.61)   (3.07)   (2.74)

Table 3
 1   (0.895)   (0.885)   (0.858)
 2   (0.893)   (0.858)   (0.851)
 3   (0.870)   (0.855)   (0.833)
 4   (0.867)   (0.851)   (0.820)
 5   (0.863)   (0.823)   (0.817)
 6   (0.856)   (0.804)   (0.814)
 7   (0.850)   (0.804)   (0.811)
 8   (0.842)   (0.801)   (0.808)
 9   (0.839)   (0.776)   (0.807)
10   (0.830)   (0.773)   (0.806)

Table 4
 1   (2.74)   (2.76)   (2.45)
 2   (2.39)   (2.56)   (1.32)
 3   (2.02)   (2.34)   (1.20)
 4   (1.93)   (2.12)   (0.90)
 5   (1.87)   (2.09)   (0.78)
 6   (1.84)   (1.71)   (0.67)
 7   (1.50)   (1.64)   (0.65)
 8   (1.46)   (1.63)   (0.63)
 9   (1.38)   (1.52)   (0.54)
10   (1.32)   (1.49)   (0.44)

Table 5
 1   (0.994)   (0.991)   (0.137)
 2   (0.982)   (0.961)   (0.099)
 3   (0.981)   (0.916)   (0.078)
 4   (0.980)   (0.900)   (0.073)
 5   (0.968)   (0.564)   (0.067)
 6   (0.887)   (0.298)   (0.065)
 7   (0.878)   (0.265)   (0.063)
 8   (0.577)   (0.250)   (0.059)
 9   (0.436)   (0.242)   (0.056)
10   (0.415)   (0.229)   (0.055)
5.3 2 1-20 6 7

ALL: 4.4
ALL-VERB: ALL
ALL-NOUN: ALL
ALL-VERB&NOUN: ALL

3

ALL 0.789, ALL-VERB 0.762, ALL-NOUN 0.726, ALL-VERB&NOUN 0.548

ALL 0.653, ALL-VERB 0.275, ALL-NOUN 0.673, ALL-VERB&NOUN 0.257

6, 7 (ALL-VERB&NOUN) (ALL-VERB, ALL-NOUN) (ALL-VERB&NOUN) (ALL-NOUN) (ALL-VERB)
[Figure 6: Rs (y-axis, 0 to 1) against Top-N (x-axis, 2 to 20); series: ALL, ALL-VERB, ALL-NOUN, ALL-VERB&NOUN]

[Figure 7: Rs (y-axis, 0 to 1) against Top-N (x-axis, 2 to 20); series: ALL, ALL-VERB, ALL-NOUN, ALL-VERB&NOUN]
5.4 /

/ + N SVMrank 2 N Top-N method SVMrank Threshold method N 7 7 8 7 9 recall = 7 7 precision = 7

Method             Setting            Recall   Precision   F
Top-N method       N = 20             0.758    0.549       0.637
Top-N method       N = 6              0.337    0.528       0.411
Threshold method   threshold = 1.83   0.700    0.621       0.658
Threshold method   threshold = 1.92   0.457    0.547       0.498

8, 9 Threshold method Top-N method SVMrank
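The two ways of turning ranker output into a binary decision, the Top-N method and the Threshold method, can be contrasted with a small sketch. Everything in it is illustrative: the function names, candidate labels, and scores are hypothetical. The rule that a candidate counts as correct when its average annotation score is 7 or higher is an assumption based on the "7-10" range mentioned in the evaluation setup.

```python
from typing import Dict, Set, Tuple


def select_top_n(scores: Dict[str, float], n: int) -> Set[str]:
    """Top-N method: accept the n candidates with the highest ranker score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])


def select_by_threshold(scores: Dict[str, float], threshold: float) -> Set[str]:
    """Threshold method: accept every candidate scoring at or above the threshold."""
    return {cand for cand, score in scores.items() if score >= threshold}


def precision_recall_f(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and their harmonic mean for one set of predictions."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f


# Hypothetical ranker scores and average annotation scores for one noun.
ranker_scores = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.5}
annotation = {"a": 8.3, "b": 4.0, "c": 7.0, "d": 1.7}
gold = {cand for cand, avg in annotation.items() if avg >= 7}  # assumed cut-off

top_n_result = precision_recall_f(select_top_n(ranker_scores, 2), gold)
threshold_result = precision_recall_f(select_by_threshold(ranker_scores, 1.0), gold)
```

The Top-N method always returns the same number of candidates per noun, while the Threshold method lets that number vary with the scores.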
[Figure 8: Precision (y-axis, 0 to 1) against Recall (x-axis, 0 to 1); series: Top-N method, Threshold method]

[Figure 9: Precision (y-axis, 0 to 1) against Recall (x-axis, 0 to 1); series: Top-N method, Threshold method]
5.5 27
6. 2 2 N SVMrank A B 28
[1] ,, and. Wikipedia., 16(3):3-24, 2009.
[2] and.., 12(2):109-131, 2005.
[3] James Pustejovsky. The Generative Lexicon. MIT Press, 1998.
[4] . A B.,, 2005.
[5] Philipp Cimiano and Johanna Wenderoth. Automatically learning qualia structures from the web. In Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition, pages 28-37, Ann Arbor, Michigan, 2005.
[6] Johanna Wenderoth. Automatic acquisition of ranked qualia structures from the web. In Proceedings of the ACL, pages 888-895, 2007.
[7] Yamada Ichiro, Baldwin Timothy, Sumiyoshi Hideki, Shibata Masahiro, and Yagi Nobuyuki. Automatic acquisition of qualia structure from corpus data. IEICE Transactions on Information and Systems, 90(10):1534-1541, 2007.