Corrected Version NICT 26 2008/11/15
Word Sketch Engine (Kilgarriff & Tugwell 01; Srdanovic et al. 08)
I-Language; "Grammar is Grammar and Usage is Usage" (Newmeyer 03)
(is-a)
Web
(examples) vs. (anti-examples); anti-examples: programming/object-oriented design (cf. anti-matter)
Fast Mapping (… & Smith 07, 08) (e.g., Reinforcement Learning (Sutton & Barto 98), Memory-based Learning (Daelemans & van den Bosch 05); (… & … 08))
Word Sketch Engine (Kilgarriff & Tugwell 01; Srdanovic et al. 08; … 08)
Sketch Engine
(Word) Sketch Engine (SkE); JpWaC: 409,384,405 (4 x BNC); Web; http://nl.ijs.si/et/talks/cojas 7/Ikaho.ppt
Word Sketch
Sketch Engine (items 1-6)
Sketch Engine 1/5: Word Sketch; [pronoun]*; pronoun
Sketch Engine 2/5: 5% 25% => (ChaSen); Coord(ination)
Sketch Engine 3/5: NLP
Sketch Engine 4, 5/5
9 (V), (X); Th (e.g., off, away, up)
X / V: Word Sketch [noun] [pronoun]* (10 50); V, X: (V, X); Metaphoric = {1, 0.5, 0}; Th = {1, 0}; Loc = {1, 0}
freq, salience: Sketch Engine. URL: http://clsl.hi.h.kyoto-u.ac.jp/~kkuroda/data/object-typology-of-cleaningverbs.xls
PoS tagging, (tree) parses, and/or
A. … B. …? C. Th? D. (e.g., off) E. Loc/Th?
A. Salience = Mutual Information * Log Frequency; Word Sketch; X
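The salience formula on this slide can be sketched in a few lines. This is a minimal illustration, not the Sketch Engine implementation: the slide gives only "Mutual Information * Log Frequency", so the pointwise MI variant, the log bases, the function names, and all counts below are assumptions made for the example.

```python
import math

def mutual_information(pair_count, word_count, collocate_count, total):
    """Pointwise MI (base 2) of a (word, collocate) pair from raw counts.
    Assumption: the slide does not say which MI variant or log base is used."""
    return math.log2(pair_count * total / (word_count * collocate_count))

def salience(pair_count, word_count, collocate_count, total):
    """Salience = MI * log(pair frequency), following the slide's formula.
    Assumption: natural log for the frequency term."""
    mi = mutual_information(pair_count, word_count, collocate_count, total)
    return mi * math.log(pair_count)

# Hypothetical counts for one verb V and three object nouns X:
# noun -> (count of the pair (V, X), count of X in the corpus)
VERB_COUNT = 5_000
CORPUS_SIZE = 10_000_000
counts = {"dust": (40, 1_000), "window": (25, 8_000), "thing": (60, 90_000)}

# Rank the objects by salience, highest first.
ranked = sorted(
    counts,
    key=lambda x: salience(counts[x][0], VERB_COUNT, counts[x][1], CORPUS_SIZE),
    reverse=True,
)
print(ranked)
```

With these made-up counts the most frequent object ("thing") ranks last, because its pair frequency is close to what chance predicts; weighting MI by log frequency is what keeps rare but strongly associated pairs from being swamped by frequent ones.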
B. Loc, Th + X; N; M = Metaphoric 1, 0.5: X / N
V: M
0.793   ++   Yes
0.733   +    Yes
0.483   ++   Yes
0.224   ++   Yes
0.272*  +    Yes*
0.02    +    No
0.02    ++   No?
0            Yes
0            Yes
C. Loc, Th; Loc = Loc 1, 0.5: X / N; Th = Th 1, 0.5: X / N; Th Loc Loc Th M
1.0     0       0.793   No
0       1.0     0.733   Yes
0.501   0.649   0.508   Yes
1.0     0       0.25*   No?
0.221   0.826   0.221   Yes
0.978   0.022   0.022   Yes
0.96    0.04    0.02    Yes
0.833   0.167   0       Yes
0.783   0.217   0       Yes
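A per-verb score of the form "… 1, 0.5: X / N" could be computed as below. This is a sketch under a stated assumption: the slide's definition did not survive extraction intact, and the code reads it as "the number of objects X rated 1 or 0.5, divided by the number N of objects". The function name, the dictionary keys, and the annotations are hypothetical.

```python
def share_rated(ratings, accepted=(1, 0.5)):
    """Share of objects whose rating is one of `accepted`.
    Assumption: one possible reading of the slide's 'X / N' definition."""
    if not ratings:
        return 0.0
    return sum(1 for r in ratings if r in accepted) / len(ratings)

# Hypothetical annotations for one verb: one record per object noun X.
objects = [
    {"metaphoric": 0,   "th": 1, "loc": 0},
    {"metaphoric": 0.5, "th": 1, "loc": 0},
    {"metaphoric": 1,   "th": 0, "loc": 1},
    {"metaphoric": 0,   "th": 1, "loc": 1},
]

# One score per annotation dimension.
scores = {
    key: share_rated([obj[key] for obj in objects])
    for key in ("metaphoric", "th", "loc")
}
print(scores)
```

Because one object can be rated as both Th and Loc, the two shares need not sum to 1, which would be consistent with rows such as 0.501 and 0.649 in the table.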
D. - - - - -, ++ 0 0 0* ++ + ++ 0 + 0 +, + ++ 0 + ++ ++, 0 + ++ +, 0 0 + 0 + + 0 0 + ++ + {, } 0 0 + (, ) ++ + + (e.g., ) 0 {++, +, 0, }; off, away?, up; Loc vs Th
Talmy (Talmy 75, 76, 85, 03): Satellite-framed L(anguage) (SfL), Verb-framed L(anguage) (VfL); SfL 10, VfL 100; SfL, VfL (Matsumoto 03)
V, X; V (e.g., …); X: <th(ing to be removed)>, <th loc(ation)>. (1) (<loc: >) <th: > (2) <loc: > [cf. *<loc: > <th: >] (3) a. (<loc: > { ; }) <th: >; b. *<loc: > (4) a. (<loc: > { ; }) <th: >; b. #<loc: >. th: Theme (e.g., (1), (3a), (4a))
OVERALL STRUCTURE. S: simplification through profiling; P: presupposition. R1: CLEAN(A, C): C is primary affectum (thing or location); A is agent. R2: REMOVE(A, B): B is primary affectum (thing only); A is agent. Alternate R3: DETACH-FROM(B, C): B is theme; C is source. R1 primary/foreground: X = C. R2 primary/foreground: X = B. X (B, C): primary affectum?
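The three relations and the rule "the direct object X is the primary affectum of the foregrounded relation" can be written down as a small lookup. This is only an illustrative encoding of the labels on the slide; the class, field, and function names are hypothetical, and R3 is given no primary affectum because the slide names only a theme and a source for it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Relation:
    label: str
    predicate: str
    args: Tuple[str, ...]
    primary_affectum: Optional[str]  # participant realized as the object X

# Participants: A = agent, B = theme (thing removed), C = source / location.
RELATIONS = {
    "R1": Relation("R1", "CLEAN", ("A", "C"), "C"),
    "R2": Relation("R2", "REMOVE", ("A", "B"), "B"),
    "R3": Relation("R3", "DETACH-FROM", ("B", "C"), None),
}

def object_referent(foregrounded: str) -> Optional[str]:
    """Participant denoted by the direct object X when the given
    relation is primary/foregrounded."""
    return RELATIONS[foregrounded].primary_affectum

print(object_referent("R1"), object_referent("R2"))
```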
E. Loc/Th?: X, Loc, Th; (product)? R3: A clean C primary
X: Th, Loc. F1: <A remove B from C>; F2: <A clean C (of B)> (is-a <A improve C>); F3: <B detach-from C> (implied <B disappear-from C>). {F1, F2, F3}: F1, F2 primary; Loc/Th: F1: B; F2: C; F3: Theme, Loc (= Theme, Source). X: C=Th, B=Loc; Affectum; Intended Result; Product
1. R (e.g., …): P = {p1, p2, ...} 2. P: G(P) = {g1, g2, ...} 3. G: N = {n1, n2, ...}
form key; parsed corpus; sense key; X way; sense-tagged; role-tagged
References [1]
Daelemans, W. and van den Bosch, A. (2005). Memory-Based Language Processing. Cambridge University Press.
Kilgarriff, A. and Tugwell, D. (2001). WORD SKETCH: Extraction and Display of Significant Collocations for Lexicography. Information Technology Research Institute Technical Report ITRI-01-12.
Matsumoto, Y. (2003). Typologies of lexicalization patterns and event integration: Clarifications and reformulations. In S. Chiba, et al. (eds.), Empirical and Theoretical Investigations into Language: A Festschrift for Masaru Kajita (pp. 403-418), Tokyo: Kaitakusha.
Newmeyer, F. J. (2003). Grammar is Grammar and Usage is Usage. Language 79 (4): 682-707.
References [2]
Srdanovic Erjavec, I., Erjavec, T., and Kilgarriff, A. (2008). A web corpus and word sketches for Japanese. Journal of Natural Language Processing 15 (2).
Sutton, R. S. and Barto, A. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.
Talmy, L. (1975). Semantics and the syntax of motion. In J. Kimball (ed.), Syntax and Semantics 4 (pp. 181-238), Academic Press.
Talmy, L. (1976). Semantic causative types. In M. Shibatani (ed.), Syntax and Semantics 6: The Grammar of Causative Constructions (pp. 43-116), Academic Press.
Talmy, L. (1985). Lexicalization patterns: Semantic structure in lexical forms. In T. Shopen (ed.), Language Typology and Syntactic Description III: Grammatical Categories and the Lexicon (pp. 57-149), Academic Press.
Talmy, L. (1991). Path to realization. BLS 17, 480-519.
References [3]
… Smith, L. B. (2008). …. 25.
… (2008). …. 25.
… (2008). … Sketch Engine …. 24: 59-80.