一般社団法人電子情報通信学会 THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS

信学技報 IEICE Technical Report SP2019-12 (2019-08)

Improvement of Generalization Performance of Non-task-oriented Dialogue System by Use of WordNet

Taisei ASO, Ryoichi TAKASHIMA, Tetsuya TAKIGUCHI, and Yasuo ARIKI

Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe-shi, Hyogo, 657-8501 Japan

[Abstract and keywords not recovered. Surviving terms: WordNet, LSTM Encoder-Decoder, Princeton WordNet.]

1.
[Section text not recovered. Surviving terms: IoT, NTT, Apple Siri, Twitter, WordNet.]

2. WordNet
Princeton WordNet [1] groups words into sets of synonyms (Synsets), each identified by a Synset ID. The Japanese WordNet [2] is built on the Synsets of Princeton WordNet (Fig. 1). It contains 57,238 Synsets, 93,834 words, and 158,058 Synset-word pairs.

This article is a technical report without peer review, and its polished and/or extended version may be published elsewhere.
Copyright 2019 by IEICE

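Fig. 1 [Caption not recovered. Example from the Japanese WordNet; relation labels in the figure: Word, Hypernym, Hyponym. Characters lost from the glosses are shown in brackets.]
  04965179-n: orange paint or pigment; in the range between [red] and yellow
  07747607-n: round fruit, yellow to orange, that grows on citrus [trees]; words: 橙, オレンジ, ミカン
  07747055-n: any of the many citrus fruits that have a thick [rind] and juicy flesh
  07749969-n: [large] yellow fruit whose flesh is juicy and slightly sour; word: グレープフルーツ

3.
3.1 Twitter
[Section text not recovered. Surviving terms and values: Twitter, URL, 4, 40, 51, 50, 1, MeCab [3], Fig. 2 (Distinct).]

Fig. 2 Distinct of each part of speech in Twitter dialogue corpus

3.2 Word2Vec
[Section text not recovered. Surviving terms and values: Wikipedia, Word2Vec [4]-[6], Twitter, 3,049,628 (381.7MB).]

4.
4.1
[Section text not recovered. Surviving terms: Fig. 3, LSTM RNN Encoder-Decoder [7], Word2Vec.]

Fig. 3 LSTM Encoder-Decoder baseline model
[Figure residue. Encoder inputs: 3 2 1. Decoder inputs: <SOS> 1 2 3. Decoder outputs: 1 2 3 <EOS>.]

4.2 WordNet
[The opening of this section, which refers to Fig. 2, was not recovered.] For a word w, the proposed method computes a distributed representation V(w) with Eqs. (1) to (6). In Eq. (1), a word that belongs to no Synset in WordNet keeps its Word2Vec vector as V(w); otherwise V(w) is the mean of the Synset vectors SV over the Synsets s that contain w. In the example of Fig. 4 these Synsets are 04167661-n and 00478262-n. In Eq. (2), SV mixes the mean Word2Vec vector of the words in s with the Synset vectors of the hypernyms of s. A Synset that has no words, such as 00467719-n in Fig. 4, takes the mean over its own hypernyms instead (00464651-n in Fig. 4). The parameter ratio takes a value from 0 to 1 and sets the weight given to hypernyms; depth sets how many hypernym levels are followed. With ratio = 0, hypernyms contribute nothing regardless of depth.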
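To make the roles of ratio and depth concrete, the sketch below unrolls the recursion of Eq. (2) for the simplest case. It assumes that every Synset on the path has at least one word and exactly one hypernym, which the report does not require in general; the function name `level_weights` is hypothetical. Under that assumption, level k above the starting Synset receives weight (1 - ratio) * ratio^k, and the last level followed receives ratio^depth.

```python
from typing import List


def level_weights(ratio: float, depth: int) -> List[float]:
    """Weight of each hypernym level when Eq. (2) is unrolled along a
    single hypernym chain (assumption: one hypernym per level, and every
    Synset on the chain has words).

    Index 0 is the Synset itself, index k is the Synset k levels up.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    # Levels 0 .. depth-1 keep a (1 - ratio) share of what reaches them.
    weights = [(1.0 - ratio) * ratio ** k for k in range(depth)]
    # The last level followed keeps everything that reaches it.
    weights.append(ratio ** depth)
    return weights


weights = level_weights(ratio=0.4, depth=2)
print([round(w, 2) for w in weights])  # [0.6, 0.24, 0.16]
```

For ratio = 0.4 and depth = 2 this gives the values 0.6, 0.24, and 0.16 that appear in Fig. 4, and the weights always sum to 1. With ratio = 0 the whole weight stays on the Synset's own words.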

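Fig. 4 Proposed Method (ratio = 0.4, depth = 2)
[Figure residue. The word サッカー belongs to Synsets 04167661-n (サッカー, a woven fabric) and 00478262-n (サッカー, 蹴球, フットボール, ...), and their Synset vectors are averaged (mean). Other Synsets shown: 03309808-n (織り, 服地, 布, 織物, クロス, ...), 00021939-n (アーティファクト, 物, ...), 00468480-n (蹴球, フットボール, ...), 00433458-n (コンタクトスポーツ), 00467719-n (field game, no Japanese words), 00464651-n (アウトドアスポーツ). Edge weights shown: 0.6, 0.24, 0.16, which equal 1 - ratio, ratio(1 - ratio), and ratio^2 for ratio = 0.4. Example vector: [0.01, 0.02, 0.00, -0.23, -0.55, ...].]

\[
V(w) =
\begin{cases}
\mathrm{Word2Vec}(w), & |W2S(w)| = 0 \\[4pt]
\dfrac{\sum_{s \in W2S(w)} SV(s, depth)}{|W2S(w)|}, & \text{otherwise}
\end{cases}
\tag{1}
\]

\[
SV(s, d) =
\begin{cases}
WV(S2W(s)), & |S2H(s)| = 0 \ \text{or} \ d = 0 \\[4pt]
\dfrac{\sum_{h \in S2H(s)} SV(h, d)}{|S2H(s)|}, & |S2W(s)| = 0 \\[4pt]
(1 - ratio)\, WV(S2W(s)) + ratio \cdot \dfrac{\sum_{h \in S2H(s)} SV(h, d - 1)}{|S2H(s)|}, & \text{otherwise}
\end{cases}
\tag{2}
\]

\[
WV(ws) = \frac{\sum_{w \in ws} \mathrm{Word2Vec}(w)}{|ws|}
\tag{3}
\]

\[
W2S(w) := (\text{set of Synsets that contain the word } w)
\tag{4}
\]

\[
S2W(s) := (\text{set of words that belong to the Synset } s)
\tag{5}
\]

\[
S2H(s) := (\text{set of hypernym Synsets of the Synset } s)
\tag{6}
\]

Here Word2Vec(w) denotes the Word2Vec vector of the word w.

5.
5.1
The Word2Vec training parameters are listed in Table 1, the LSTM Encoder-Decoder parameters in Table 2, and the ratio and depth values of the proposed method in Table 3. Four values of ratio are compared.

Table 1 Parameters of Word2Vec training
[Row labels not recovered. Values in order: Skip-gram, 256, 5, 5, 250,908, 10.]

Table 2 Parameters of LSTM Encoder-Decoder
[Row labels not recovered. Values in order: 256, 3, 32,302, Adam [8], 1e-4, 20%, 256, 300, 15.]

Table 3 Parameters of proposed method
ratio   0.1, 0.2, 0.3, 0.4
depth   2

5.2
[Section text not recovered. Surviving terms and values: Word2Vec, 6.09%, 5.08%, Figs. 5 to 9, WordNet.]

Fig. 5 PCA of Word2Vec distributed representation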
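Eqs. (1) to (6) translate fairly directly into a recursive lookup. The following is a minimal sketch under stated assumptions, not the authors' implementation. The class name `SynsetEmbedder`, the toy Synset names, and the one-hot vectors are hypothetical; the toy graph is only loosely modeled on Fig. 4. Two further assumptions are made: every word listed in a Synset has a Word2Vec vector, and the "no words" case of Eq. (2) is tested before the "d = 0" case so that the mean in Eq. (3) is never taken over an empty set.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


class SynsetEmbedder:
    """Sketch of Eqs. (1)-(6) on plain dictionaries (hypothetical API)."""

    def __init__(self, word2vec: Dict[str, Vector],
                 w2s: Dict[str, List[str]],   # Eq. (4): word -> Synsets
                 s2w: Dict[str, List[str]],   # Eq. (5): Synset -> words
                 s2h: Dict[str, List[str]],   # Eq. (6): Synset -> hypernyms
                 ratio: float, depth: int) -> None:
        self.word2vec, self.w2s, self.s2w, self.s2h = word2vec, w2s, s2w, s2h
        self.ratio, self.depth = ratio, depth

    def wv(self, words: List[str]) -> Vector:
        """Eq. (3): mean Word2Vec vector of a non-empty word list."""
        return mean([self.word2vec[w] for w in words])

    def sv(self, synset: str, d: int) -> Vector:
        """Eq. (2): Synset vector with d hypernym levels left to follow."""
        words = self.s2w.get(synset, [])
        hypernyms = self.s2h.get(synset, [])
        if not words:
            # Wordless Synset: pass through to its hypernyms, d unchanged.
            if not hypernyms:
                raise ValueError(f"{synset} has neither words nor hypernyms")
            return mean([self.sv(h, d) for h in hypernyms])
        if not hypernyms or d == 0:
            return self.wv(words)
        own = self.wv(words)
        upper = mean([self.sv(h, d - 1) for h in hypernyms])
        return [(1 - self.ratio) * o + self.ratio * u
                for o, u in zip(own, upper)]

    def v(self, word: str) -> Vector:
        """Eq. (1): word vector; falls back to Word2Vec outside WordNet."""
        synsets = self.w2s.get(word, [])
        if not synsets:
            return self.word2vec[word]
        return mean([self.sv(s, self.depth) for s in synsets])


# Toy data (hypothetical): one-hot vectors make the mixing weights visible.
word2vec = {"soccer": [1.0, 0.0, 0.0, 0.0], "football": [0.0, 1.0, 0.0, 0.0],
            "contact_sport": [0.0, 0.0, 1.0, 0.0],
            "outdoor_sport": [0.0, 0.0, 0.0, 1.0],
            "lol": [0.5, 0.5, 0.0, 0.0]}
w2s = {"soccer": ["soccer.n"]}
s2w = {"soccer.n": ["soccer"], "football.n": ["football"],
       "contact_sport.n": ["contact_sport"], "field_game.n": [],
       "outdoor_sport.n": ["outdoor_sport"]}
s2h = {"soccer.n": ["football.n"],
       "football.n": ["contact_sport.n", "field_game.n"],
       "field_game.n": ["outdoor_sport.n"]}

embedder = SynsetEmbedder(word2vec, w2s, s2w, s2h, ratio=0.4, depth=2)
result = embedder.v("soccer")
print([round(x, 2) for x in result])  # [0.6, 0.24, 0.08, 0.08]
```

In the toy graph the two Synsets two levels up share the weight ratio^2 = 0.16 equally, and the wordless Synset hands its share to its own hypernym. A word with no Synset, such as the toy entry "lol", keeps its Word2Vec vector unchanged.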

Fig. 6 PCA of proposed distributed representation (ratio = 0.1)
Fig. 7 PCA of proposed distributed representation (ratio = 0.2)
Fig. 8 PCA of proposed distributed representation (ratio = 0.3)
Fig. 9 PCA of proposed distributed representation (ratio = 0.4)

5.3
5.3.1 BLEU
Table 4 lists the BLEU [9] scores of each method. All four ratio settings score above the baseline on both BLEU-1 and BLEU-2. BLEU-1 is highest at ratio = 0.2, a gain of 0.00224 (1.74%) over the baseline, and BLEU-2 is highest at ratio = 0.3, a gain of 0.001503 (7.55%). [The remaining discussion, which refers to ratio = 0.1 and to Twitter, was not recovered.]

Table 4 BLEU of each method
Method                    BLEU-1               BLEU-2
Baseline                  0.128396             0.019919
Proposed (ratio = 0.1)    0.129724 (+1.03%)    0.020449 (+2.66%)
Proposed (ratio = 0.2)    0.130636 (+1.74%)    0.021222 (+6.54%)
Proposed (ratio = 0.3)    0.129314 (+0.71%)    0.021422 (+7.55%)
Proposed (ratio = 0.4)    0.128854 (+0.36%)    0.020936 (+5.11%)
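The percentages in Table 4 are relative gains over the baseline score, (score - baseline) / baseline. The short check below recomputes them from the scores in the table; the variable and function names are hypothetical.

```python
# Scores copied from Table 4.
baseline = {"BLEU-1": 0.128396, "BLEU-2": 0.019919}
proposed = {
    0.1: {"BLEU-1": 0.129724, "BLEU-2": 0.020449},
    0.2: {"BLEU-1": 0.130636, "BLEU-2": 0.021222},
    0.3: {"BLEU-1": 0.129314, "BLEU-2": 0.021422},
    0.4: {"BLEU-1": 0.128854, "BLEU-2": 0.020936},
}


def relative_gain(score: float, base: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return (score - base) / base * 100.0


# Relative gain per ratio and metric, rounded as in the table.
gains = {
    ratio: {metric: round(relative_gain(score, baseline[metric]), 2)
            for metric, score in scores.items()}
    for ratio, scores in proposed.items()
}

# Ratio with the highest score for each metric.
best_ratio = {
    metric: max(proposed, key=lambda r: proposed[r][metric])
    for metric in baseline
}

for ratio, row in gains.items():
    print(f"ratio = {ratio}: BLEU-1 +{row['BLEU-1']}%, BLEU-2 +{row['BLEU-2']}%")
print(best_ratio)  # {'BLEU-1': 0.2, 'BLEU-2': 0.3}
```

The recomputed gains agree with the table, and the best settings differ by metric: ratio = 0.2 for BLEU-1 and ratio = 0.3 for BLEU-2.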

5.3.2
[Section text not recovered. Surviving terms: Table 5, Word2Vec, Twitter, BLEU.]

6.
[Section text not recovered. Surviving terms: BLEU, WordNet, AutoExtend [10].]

[Acknowledgment text not recovered. Surviving terms: JSPS, JP17K00236, JP17H01995.]

References
[1] Princeton University, About WordNet, WordNet, Princeton University, 2010, http://wordnet.princeton.edu
[2] Francis Bond et al., Enhancing the Japanese WordNet, Proc. of the 7th Workshop on Asian Language Resources (ALR7), pp. 1-8, Association for Computational Linguistics, 2009.
[3] Taku Kudo, MeCab: Yet another part-of-speech and morphological analyzer, http://mecab.sourceforge.net/, 2005.
[4] Tomas Mikolov et al., Linguistic regularities in continuous space word representations, Proc. of NAACL-HLT 2013, pp. 746-751, 2013.
[5] Tomas Mikolov et al., Efficient estimation of word representations in vector space, arXiv:1301.3781, 2013.
[6] Tomas Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality, In Advances in Neural Information Processing Systems, pp. 3111-3119, 2013.
[7] Ilya Sutskever et al., Sequence to Sequence Learning with Neural Networks, In Advances in Neural Information Processing Systems, pp. 3104-3112, 2014.
[8] Diederik Kingma and Jimmy Ba, Adam: A method for stochastic optimization, In The International Conference on Learning Representations (ICLR), 2015.
[9] George Doddington, Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics, Proc. of the Second International Conference on Human Language Technology Research (HLT '02), pp. 138-145, 2002.
[10] Sascha Rothe and Hinrich Schütze, AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes, Proc. of ACL 2015, pp. 1793-1803.

Table 5 Generation examples
[Example utterances not recovered. The table contains four examples, each with outputs of the proposed method at ratio = 0.1, 0.2, 0.3, and 0.4.]