
Computational Semantics

1. Category specificity: Warrington (1975); Warrington & Shallice (1979, 1984)
2. Basic-level superiority
3. Superordinate category preservation

Analogy in vector space

Figure 1: Mikolov, Yih, & Zweig (2013)

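Example of word2vec

[Figure 2: two-dimensional plot (both axes from -2 to 2) of word2vec vectors for countries (China, Russia, Japan, Turkey, Poland, Germany, France, Italy, Spain, Greece, Portugal) and their capitals (Beijing, Moscow, Tokyo, Ankara, Warsaw, Berlin, Paris, Rome, Athens, Madrid, Lisbon). Source: Mikolov, Sutskever, Chen, Corrado, & Dean (2013)]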
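The analogy behaviour in Figures 1 and 2 is usually queried with vector arithmetic: the answer d to "a is to b as c is to ?" is the word whose vector is most cosine-similar to v_b - v_a + v_c. The sketch below is a minimal illustration, assuming hypothetical 2-D toy vectors (not real word2vec output) arranged so that each capital sits at roughly the same offset from its country.

```python
import math

# Hypothetical 2-D toy vectors (NOT trained word2vec output): each capital
# is placed at roughly the same offset from its country.
vectors = {
    "Japan":  (1.0, 0.0),
    "Tokyo":  (1.0, 1.0),
    "France": (-1.0, 0.2),
    "Paris":  (-1.0, 1.2),
    "Italy":  (0.0, -1.0),
    "Rome":   (0.1, 0.0),
}

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c, vocab):
    """Solve a : b = c : d by maximising cos(v_d, v_b - v_a + v_c)."""
    target = tuple(vb - va + vc for va, vb, vc in zip(vocab[a], vocab[b], vocab[c]))
    candidates = (w for w in vocab if w not in {a, b, c})  # exclude the query words
    return max(candidates, key=lambda w: cosine(vocab[w], target))

print(analogy("Japan", "Tokyo", "France", vectors))  # Paris
```

Excluding the three query words from the candidates follows common practice; without it the nearest neighbour is often one of the inputs.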

word2vec

Figure 3: word2vec model architecture (CBOW); Joulin et al. (2017)

For a word w with N sets of context vectors {c(w)_i}, where set i holds the W vectors of the words found in one context window of w (W is the window size), the empirical variance (a covariance matrix) is:

$$\Sigma_w = \frac{1}{NW}\sum_{i=1}^{N}\sum_{j=1}^{W}\left(c(w)_{ij} - w\right)\left(c(w)_{ij} - w\right)^{\top} \qquad (1)$$

This is an estimator of the covariance of a distribution under the assumption that its mean is fixed at w. In practice, a small ridge term δ > 0 must also be added to the diagonal of the matrix (i.e., Σ_w + δI is used) to regularize it and avoid numerical problems when inverting it.
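Eq. (1) plus the ridge term can be computed directly. This is a minimal pure-Python sketch, assuming vectors are plain lists of floats, that `contexts` holds N windows of exactly W vectors each, and that the default `delta` is a hypothetical choice rather than a value from the source.

```python
def context_covariance(w, contexts, delta=1e-3):
    """Return Sigma_w + delta * I, with Sigma_w from Eq. (1) (mean fixed at w).

    w        : word vector (list of floats)
    contexts : N windows, each a list of W context vectors c(w)_ij
    delta    : ridge term for the diagonal (hypothetical default)
    """
    d, N, W = len(w), len(contexts), len(contexts[0])
    sigma = [[0.0] * d for _ in range(d)]
    for window in contexts:
        if len(window) != W:
            raise ValueError("every context window must hold W vectors")
        for c in window:
            diff = [ci - wi for ci, wi in zip(c, w)]  # c(w)_ij - w
            for a in range(d):
                for b in range(d):
                    sigma[a][b] += diff[a] * diff[b]  # outer product
    for a in range(d):
        for b in range(d):
            sigma[a][b] /= N * W  # the 1/(NW) factor
        sigma[a][a] += delta      # ridge term on the diagonal only
    return sigma

# Hypothetical example: N = 2 windows, W = 2 vectors each, w at the origin.
w = [0.0, 0.0]
contexts = [[[1.0, 0.0], [-1.0, 0.0]],
            [[0.0, 2.0], [0.0, -2.0]]]
print(context_covariance(w, contexts, delta=0.1))
```

With `delta=0.1` this gives approximately [[0.6, 0.0], [0.0, 2.1]]: the per-axis spread of the context vectors around w, shifted by δ so the matrix stays safely invertible.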

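Objective function of word2vec

Skip-gram:

$$J = \sum_{w \in D}\sum_{c \in C}\log P(c \mid w) \qquad (2)$$

CBOW:

$$J = \sum_{w \in D}\sum_{c \in C}\log P(w \mid c) \qquad (3)$$

where D is the training corpus, C is the context of w (the words within ±h positions of w), and P(c | w) is the probability that c appears in the context C of w.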
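The sums in Eqs. (2) and (3) run over every word w in D and every c in its ±h window. A minimal sketch of enumerating those terms, assuming a whitespace-tokenized toy sentence; the function names are hypothetical. Note that the original word2vec CBOW combines the whole window to predict w, whereas the pairs below follow Eq. (3) as written here, one term per individual c.

```python
def context_windows(tokens, h):
    """Yield (w, C): each token w with its context C, the tokens within ±h positions."""
    for i, w in enumerate(tokens):
        left = tokens[max(0, i - h):i]
        right = tokens[i + 1:i + 1 + h]
        yield w, left + right

def skipgram_pairs(tokens, h):
    """(w, c) pairs: one per log P(c | w) term of Eq. (2)."""
    return [(w, c) for w, C in context_windows(tokens, h) for c in C]

def cbow_pairs(tokens, h):
    """(c, w) pairs: one per log P(w | c) term of Eq. (3) as written on this slide."""
    return [(c, w) for w, C in context_windows(tokens, h) for c in C]

tokens = "the cat sat on the mat".split()  # hypothetical toy corpus D
for w, C in context_windows(tokens, h=1):
    print(w, C)
```

Windows are clipped at the sentence edges, so the first and last tokens contribute fewer terms than interior ones.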

Figure 4: Relations between NTT-DB and word2vec (2017)

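Negative Sampling

Computing P(c | w) exactly requires the softmax function over the whole vocabulary V:

$$P(c \mid w) = \frac{\exp\left(v_w^{\top}\tilde{v}_c\right)}{\sum_{w' \in V}\exp\left(v_w^{\top}\tilde{v}_{w'}\right)} \qquad (4)$$

Mikolov et al. (2013) approximate it by negative sampling:

$$\log P(c \mid w) \approx \log\sigma\left(v_w^{\top}\tilde{v}_c\right) + \kappa\,\mathbb{E}_{r \sim P_n}\left[\log\sigma\left(-v_w^{\top}\tilde{v}_r\right)\right] \qquad (5)$$

where P_n is the noise distribution, r is a negative sample, κ is the number of negative samples, and $\sigma(x) = \left(1 + \exp(-x)\right)^{-1}$.

Following Goldberg & Levy (2014) and Levy & Goldberg (2014a), word2vec can be related to a shifted PMI matrix.¹

¹ $\mathrm{pmi}(x, y) \equiv \log\frac{p(x, y)}{p(x)\,p(y)} = \log\frac{p(x \mid y)}{p(x)} = \log\frac{p(y \mid x)}{p(y)}$; see https://en.wikipedia.org/wiki/pointwise_mutual_information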
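To contrast Eq. (4) with Eq. (5), the sketch below computes the exact log-softmax probability and the negative-sampling term for one (w, c) pair. It is a minimal illustration under stated assumptions: the toy vectors are hypothetical, and the negatives are passed in explicitly rather than drawn from P_n, so κ E[·] is estimated as the plain sum over the κ = len(negatives) samples.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def log_sigmoid(x):
    """Numerically stable log σ(x), with σ(x) = (1 + exp(-x))^-1."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def log_softmax_prob(v_w, ctx, c):
    """Eq. (4): exact log P(c | w), normalising over every context vector in ctx."""
    scores = {r: dot(v_w, u) for r, u in ctx.items()}
    m = max(scores.values())  # subtract the max before exp for stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return scores[c] - log_z

def neg_sampling_term(v_w, ctx, c, negatives):
    """Eq. (5): κ E[...] estimated by summing over κ = len(negatives) samples."""
    positive = log_sigmoid(dot(v_w, ctx[c]))
    negative = sum(log_sigmoid(-dot(v_w, ctx[r])) for r in negatives)
    return positive + negative

# Hypothetical toy vectors: v_w for the centre word, ṽ_r for each context word.
v_w = [2.0, 0.0]
ctx = {"a": [1.0, 0.0], "b": [0.0, 1.0], "z": [-1.0, 0.0]}
print(log_softmax_prob(v_w, ctx, "a"))
print(neg_sampling_term(v_w, ctx, "a", negatives=["b", "z"]))
```

The softmax cost grows with |V| because of the normaliser, while the negative-sampling term touches only 1 + κ context vectors; that cost difference is the motivation for Eq. (5).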

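Shifted PMI

$$M_{ij} = \mathrm{PMI}(w_i, c_j) - \log\kappa \qquad (6)$$

where row i of M corresponds to word w_i and column j to context c_j.

Following Levy & Goldberg (2014b), let n(w, c) be the number of times the pair (w, c) is observed and n(w) = Σ_c n(w, c). The SGNS (skip-gram with negative sampling) objective is

$$J = \sum_{w \in D}\sum_{c \in C}\left\{\log\sigma\left(v_w^{\top}\tilde{v}_c\right) + \kappa\,\mathbb{E}_{r \sim P_n}\left[\log\sigma\left(-v_w^{\top}\tilde{v}_r\right)\right]\right\} \qquad (7)$$

$$= \sum_{w}\sum_{c} n(w, c)\log\sigma\left(v_w^{\top}\tilde{v}_c\right) + \sum_{w} n(w)\,\kappa\,\mathbb{E}_{r \sim P_n}\left[\log\sigma\left(-v_w^{\top}\tilde{v}_r\right)\right] \qquad (8)$$

where the sums in (8) run over distinct words w and contexts c, so identical pairs are grouped by their counts.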
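Eq. (6) can be built directly from pair counts. A minimal sketch, assuming hypothetical toy counts and estimating p(w, c) = n(w, c)/|D|, p(w) = n(w)/|D|, p(c) = n(c)/|D| with |D| the total number of observed pairs; unobserved pairs (PMI = -∞) are simply omitted from the returned dict.

```python
import math
from collections import Counter

def shifted_pmi(pair_counts, kappa):
    """Eq. (6): M[(w, c)] = PMI(w, c) - log κ for every observed pair.

    pair_counts : dict mapping (w, c) -> n(w, c)   (hypothetical input format)
    kappa       : number of negative samples κ
    """
    n_w, n_c = Counter(), Counter()
    for (w, c), n in pair_counts.items():
        n_w[w] += n  # n(w) = Σ_c n(w, c)
        n_c[c] += n  # n(c) = Σ_w n(w, c)
    D = sum(pair_counts.values())  # |D|
    return {
        (w, c): math.log(n * D / (n_w[w] * n_c[c])) - math.log(kappa)
        for (w, c), n in pair_counts.items() if n > 0
    }

counts = {("a", "x"): 2, ("a", "y"): 2, ("b", "x"): 4}  # hypothetical n(w, c)
print(shifted_pmi(counts, kappa=1))
```

With κ = 1 the shift vanishes and the entries are plain PMI values, e.g. PMI(a, y) = log(2 · 8 / (4 · 2)) = log 2.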

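Taking the noise distribution as the unigram distribution P_n(r) = n(r)/|D| over the context vocabulary V_C, the expectation splits into the term for c and the remaining terms:

$$\mathbb{E}_{r \sim P_n}\left[\log\sigma\left(-v_w^{\top}\tilde{v}_r\right)\right] = \sum_{r \in V_C}\frac{n(r)}{|D|}\log\sigma\left(-v_w^{\top}\tilde{v}_r\right) \qquad (9)$$

$$= \frac{n(c)}{|D|}\log\sigma\left(-v_w^{\top}\tilde{v}_c\right) + \sum_{r \in V_C \setminus \{c\}}\frac{n(r)}{|D|}\log\sigma\left(-v_w^{\top}\tilde{v}_r\right) \qquad (10)$$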
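Eqs. (9) and (10) are the same sum, and both equal what sampled negatives estimate on average. A minimal sketch, assuming hypothetical toy vectors and unigram counts n(r); the Monte Carlo function uses a fixed seed so it is reproducible.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def log_sigmoid(x):
    """Numerically stable log σ(x)."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def expectation_eq9(v_w, ctx, counts):
    """Eq. (9): exact expectation under P_n(r) = n(r) / |D|."""
    D = sum(counts.values())
    return sum(n / D * log_sigmoid(-dot(v_w, ctx[r])) for r, n in counts.items())

def expectation_eq10(v_w, ctx, counts, c):
    """Eq. (10): the same sum with the r = c term written separately."""
    D = sum(counts.values())
    own = counts[c] / D * log_sigmoid(-dot(v_w, ctx[c]))
    rest = sum(n / D * log_sigmoid(-dot(v_w, ctx[r]))
               for r, n in counts.items() if r != c)
    return own + rest

def expectation_sampled(v_w, ctx, counts, samples=100_000, seed=0):
    """Monte Carlo estimate: draw r ~ P_n and average log σ(-v_w·ṽ_r)."""
    rng = random.Random(seed)
    words = list(counts)
    draws = rng.choices(words, weights=[counts[r] for r in words], k=samples)
    return sum(log_sigmoid(-dot(v_w, ctx[r])) for r in draws) / samples

# Hypothetical toy vectors and unigram counts n(r).
v_w = [1.0, 0.0]
ctx = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [-1.0, 0.0]}
counts = {"x": 5, "y": 3, "z": 2}
print(expectation_eq9(v_w, ctx, counts), expectation_sampled(v_w, ctx, counts))
```

Pulling the r = c term out, as in Eq. (10), is what lets the derivation isolate every occurrence of v_w^⊤ṽ_c in the objective.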

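Collecting the terms of (8) and (10) that involve a specific pair (w, c) gives the local objective

$$\ell(w, c) = n(w, c)\log\sigma\left(v_w^{\top}\tilde{v}_c\right) + \kappa\,n(w)\,\frac{n(c)}{|D|}\log\sigma\left(-v_w^{\top}\tilde{v}_c\right) \qquad (11)$$

Setting x = v_w^⊤ṽ_c and requiring ∂ℓ(w, c)/∂x = 0 (using σ(-x) = 1 - σ(x)):

$$\frac{\partial \ell(w, c)}{\partial x} = n(w, c)\,\sigma(-x) - \kappa\,n(w)\,\frac{n(c)}{|D|}\,\sigma(x) \qquad (12)$$

$$= -\left[n(w, c)\left\{\sigma(x) - 1\right\} + \kappa\,n(w)\,\frac{n(c)}{|D|}\,\sigma(x)\right] \qquad (13)$$

$$= 0 \qquad (14)$$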
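The derivative in Eq. (12) can be checked numerically against Eq. (11) with a central finite difference. A minimal sketch, assuming hypothetical counts n(w, c) = 3, n(w) = 10, n(c) = 20, |D| = 1000 and κ = 5.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def local_objective(x, n_wc, n_w, n_c, D, kappa):
    """Eq. (11): ℓ(w, c) as a function of x = v_w·ṽ_c."""
    return (n_wc * math.log(sigmoid(x))
            + kappa * n_w * n_c / D * math.log(sigmoid(-x)))

def local_gradient(x, n_wc, n_w, n_c, D, kappa):
    """Eq. (12): ∂ℓ/∂x = n(w,c) σ(-x) - κ n(w) n(c)/|D| σ(x)."""
    return n_wc * sigmoid(-x) - kappa * n_w * n_c / D * sigmoid(x)

def central_difference(f, x, eps=1e-6):
    """Numerical derivative (f(x + eps) - f(x - eps)) / (2 eps)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

args = (3, 10, 20, 1000, 5)  # hypothetical n(w,c), n(w), n(c), |D|, κ
for x in (-2.0, 0.0, 1.5):
    numeric = central_difference(lambda t: local_objective(t, *args), x)
    print(x, local_gradient(x, *args), numeric)
```

The two columns agree to roughly six decimal places, which is the precision a central difference with eps = 1e-6 can deliver.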

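$$\left\{1 + \frac{\kappa\,n(w)\,n(c)}{|D|\,n(w, c)}\right\}\sigma(x) = 1 \;\Longleftrightarrow\; \exp(-x) = \frac{\kappa\,n(w)\,n(c)}{|D|\,n(w, c)} \qquad (15)$$

$$x = v_w^{\top}\tilde{v}_c \qquad (16)$$

$$= \log\frac{|D|\,n(w, c)}{\kappa\,n(w)\,n(c)} \qquad (17)$$

$$= \log\frac{|D|\,n(w, c)}{n(w)\,n(c)} - \log\kappa \qquad (18)$$

$$= \mathrm{PMI}(w, c) - \log\kappa \qquad (19)$$

At the optimum, therefore, each inner product v_w^⊤ṽ_c equals the corresponding entry of the shifted PMI matrix M in (6).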
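The closed form in Eq. (19) can be confirmed by solving Eq. (12) = 0 numerically. Since ∂ℓ/∂x is strictly decreasing in x, bisection finds its unique root. A minimal sketch, assuming n(w, c) > 0 (otherwise the optimum is at -∞), the same hypothetical counts as above, and a search interval of [-50, 50] chosen for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def optimum_by_bisection(n_wc, n_w, n_c, D, kappa, lo=-50.0, hi=50.0, iters=200):
    """Root of Eq. (12); the gradient is strictly decreasing, so bisection applies."""
    def grad(x):
        return n_wc * sigmoid(-x) - kappa * n_w * n_c / D * sigmoid(x)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0:
            lo = mid  # ℓ still increasing: the optimum lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def closed_form(n_wc, n_w, n_c, D, kappa):
    """Eq. (19): PMI(w, c) - log κ, with PMI = log(|D| n(w,c) / (n(w) n(c)))."""
    return math.log(D * n_wc / (n_w * n_c)) - math.log(kappa)

args = (3, 10, 20, 1000, 5)  # hypothetical n(w,c), n(w), n(c), |D|, κ
print(optimum_by_bisection(*args), closed_form(*args))
```

For these counts PMI(w, c) = log 15 and log κ = log 5, so both values should be close to log 3 ≈ 1.0986.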

References

Goldberg, Y., & Levy, O. (2014). word2vec explained: Deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722.
Joulin, A., Grave, E., Bojanowski, P., Douze, M., Jégou, H., & Mikolov, T. (2017). FASTTEXT.ZIP: Compressing text classification models. In Y. Bengio & Y. LeCun (Eds.), The proceedings of International Conference on Learning Representations (ICLR). Toulon, France.
(2017). wikipedia word2vec 80.
Levy, O., & Goldberg, Y. (2014a). Dependency-based word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers) (pp. 302-308). Baltimore, Maryland, USA.
Levy, O., & Goldberg, Y. (2014b). Neural word embedding as implicit matrix factorization. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (Vol. 27, pp. 2177-2185). Montréal, Canada: Curran Associates, Inc.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Weinberger (Eds.), Advances in neural information processing systems 26 (pp. 3111-3119). Curran Associates, Inc.
Mikolov, T., Yih, W.-t., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL). Atlanta, GA, USA.
Warrington, E. K. (1975). The selective impairment of semantic memory. Quarterly Journal of Experimental Psychology, 27, 635-657.
Warrington, E. K., & Shallice, T. (1979). Semantic access dyslexia. Brain, 102, 43-63.
Warrington, E. K., & Shallice, T. (1984). Category specific semantic impairment. Brain, 107, 829-854.