The Institute of Electronics, Information and Communication Engineers
IEICE Technical Report SP2019-12 (2019-08)

Improvement of Generalization Performance of Non-task-oriented Dialogue System by Use of WordNet

Taisei ASO, Ryoichi TAKASHIMA, Tetsuya TAKIGUCHI, and Yasuo ARIKI
Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe-shi, Hyogo, 657-8501 Japan

Abstract: We use WordNet to improve the generalization performance of a non-task-oriented dialogue system based on an LSTM Encoder-Decoder. Word representations are derived from the Japanese WordNet, which attaches Japanese words to the synsets of Princeton WordNet.

1. Introduction
Dialogue systems have spread together with IoT devices. Examples include systems developed by NTT and Apple's Siri. This report builds a non-task-oriented dialogue system from Twitter conversations and uses WordNet to improve its generalization performance.

2. WordNet
Princeton WordNet [1] groups words into sets of synonyms called synsets. Each synset has an ID and is linked to other synsets by relations such as hypernymy and hyponymy. The Japanese WordNet [2] was built on Princeton WordNet by assigning Japanese words to its synsets (Fig. 1). It contains 57,238 synsets, 93,834 words, and 158,058 word-synset pairs.

This article is a technical report without peer review, and its polished and/or extended version may be published elsewhere.
Copyright 2019 by IEICE
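Fig. 1 Example of synsets in the Japanese WordNet. Synset 07747607-n contains the words 橙, オレンジ and ミカン (gloss: a round yellow-to-orange fruit of citrus trees). Its hypernym is 07747055-n (gloss: any of numerous citrus fruits with thick rind and juicy pulp). 07749969-n (グレープフルーツ, gloss: a large yellow fruit with juicy, somewhat sour pulp) is another hyponym of 07747055-n. The figure also shows synset 04965179-n (gloss: orange paint or pigment; a color in the range between red and yellow).

3. Data

3.1 Twitter dialogue corpus
Dialogue data were collected from Twitter, and URLs were filtered out during preprocessing. The text was segmented into words with MeCab [3]. Fig. 2 shows the distinct counts for each part of speech.

Fig. 2 Distinct of each part of speech in Twitter dialogue corpus

3.2 Word2Vec
Word vectors were trained with Word2Vec [4]-[6] on Wikipedia text (3,049,628; 381.7MB).

4. Dialogue models

4.1 Baseline model
The baseline (Fig. 3) is an RNN Encoder-Decoder [7] built from LSTMs. Its input words are represented by Word2Vec vectors.

Fig. 3 LSTM Encoder-Decoder baseline model (an LSTM Encoder reads the input words; an LSTM Decoder starts from <SOS> and generates the response up to <EOS>)

4.2 Proposed method
The proposed method uses WordNet to compute the distributed representation V(w) of a word w, as defined in Eqs. (1)-(6).

If w has no synset in WordNet, V(w) is its Word2Vec vector. Otherwise, V(w) is the mean of the vectors SV(s, depth) of the synsets s that contain w (Eq. (1)). For example, the word サッカー in Fig. 4 belongs to two synsets: 04167661-n (a woven fabric) and 00478262-n (the sport).

The synset vector SV of Eq. (2) recursively combines two parts:
- the mean Word2Vec vector of the words in s;
- the vectors of the hypernyms of s.

When a synset has no Japanese words, the vector of its hypernym is used instead. In Fig. 4, for instance, 00467719-n (field game) falls back to 00464651-n (アウトドアスポーツ, outdoor sport).

The parameter ratio, between 0 and 1, is the weight given to the hypernyms. The parameter depth is the number of hypernym levels that are followed. With ratio = 0, the hypernyms are ignored and only the synset's own words are used.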
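Fig. 4 Proposed method (ratio = 0.4, depth = 2).

The word サッカー has two synsets:
- 04167661-n: サッカー as a woven fabric;
- 00478262-n: サッカー, 蹴球, フットボール.

The hypernyms shown are:
- on the fabric side: 03309808-n (織り, 服地, 布, 織物, クロス, ...) and 00021939-n (アーティファクト, 物, ...);
- on the sport side: 00468480-n (蹴球, フットボール, ...), 00433458-n (コンタクトスポーツ), 00467719-n (field game, which has no Japanese word) and 00464651-n (アウトドアスポーツ).

The weights 0.6, 0.24 and 0.16 are marked in the figure. The synset vectors are averaged (mean) into the final vector, e.g. [0.01, 0.02, 0.00, -0.23, -0.55, ...].

$$
V(w) = \begin{cases}
\vec{w} \quad (w \in \mathrm{Word2Vec}), & W2S(w) = \emptyset \\[4pt]
\dfrac{1}{|W2S(w)|} \displaystyle\sum_{s \in W2S(w)} SV(s, \mathit{depth}), & \text{otherwise}
\end{cases} \tag{1}
$$

$$
SV(s, d) = \begin{cases}
WV(S2W(s)), & S2H(s) = \emptyset \text{ or } d = 0 \\[4pt]
\dfrac{1}{|S2H(s)|} \displaystyle\sum_{h \in S2H(s)} SV(h, d), & S2W(s) = \emptyset \\[4pt]
(1 - \mathit{ratio})\, WV(S2W(s)) + \mathit{ratio} \cdot \dfrac{1}{|S2H(s)|} \displaystyle\sum_{h \in S2H(s)} SV(h, d-1), & \text{otherwise}
\end{cases} \tag{2}
$$

$$
WV(ws) = \frac{1}{|ws|} \sum_{w \in ws} \vec{w} \quad (w \in \mathrm{Word2Vec}) \tag{3}
$$

$$
\begin{aligned}
W2S(w) &:= \{\text{synsets that contain } w\} \qquad (4) \\
S2W(s) &:= \{\text{words in synset } s\} \qquad (5) \\
S2H(s) &:= \{\text{hypernyms of } s\} \qquad (6)
\end{aligned}
$$

With ratio = 0.4 and depth = 2, as in Fig. 4, Eq. (2) assigns the following weights, which sum to 1:
- the words of the synset itself: 1 − 0.4 = 0.6;
- the first-level hypernyms together: 0.4 × 0.6 = 0.24;
- the second-level hypernyms together: 0.4 × 0.4 = 0.16.

5. Experiments

5.1 Experimental conditions
Table 1 lists the Word2Vec training parameters and Table 2 the LSTM Encoder-Decoder parameters. Table 3 gives the ratio and depth settings of the proposed method; four values of ratio were compared.

Table 1 Parameters of Word2Vec training
Skip-gram | 256 | 5 | 5 | 250,908 | 10

Table 2 Parameters of LSTM Encoder-Decoder
256 | 3 | 32,302 | Adam [8] | 1e-4 | 20% | 256 | 300 | 15

Table 3 Parameters of proposed method
ratio   0.1, 0.2, 0.3, 0.4
depth   2

5.2 Distributed representations
This subsection compares the proposed representation with Word2Vec (reported values: 6.09% and 5.08%). Figs. 5-9 show PCA projections of the Word2Vec representation (Fig. 5) and of the proposed representation for ratio = 0.1 to 0.4 (Figs. 6-9).

Fig. 5 PCA of Word2Vec distributed representation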
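The following minimal Python sketch shows how V(w), SV(s, d) and WV(ws) from Eqs. (1)-(3) could be computed for the proposed representations. It is not the authors' implementation, and it rests on several assumptions:
- Plain dictionaries stand in for Word2Vec and the Japanese WordNet.
- The names WordNetEmbedder, w2v, w2s, s2w and s2h, and the 2-D toy vectors, are hypothetical.
- Eq. (2) does not say which case applies when a synset without words is reached at d = 0. The sketch assumes the S2W(s) = ∅ case takes precedence, as Fig. 4 suggests for 00467719-n.
- Synsets or words for which no vector can be formed are skipped.

```python
from typing import Dict, List, Optional, Set

Vector = List[float]


def mean_vectors(vectors: List[Vector]) -> Optional[Vector]:
    """Element-wise mean of equal-length vectors; None when the list is empty."""
    if not vectors:
        return None
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


class WordNetEmbedder:
    """Hypothetical sketch of Eqs. (1)-(3); dicts stand in for Word2Vec and WordNet."""

    def __init__(self, w2v: Dict[str, Vector], w2s: Dict[str, Set[str]],
                 s2w: Dict[str, Set[str]], s2h: Dict[str, Set[str]],
                 ratio: float, depth: int) -> None:
        self.w2v = w2v      # word -> Word2Vec vector
        self.w2s = w2s      # W2S(w): synsets that contain word w
        self.s2w = s2w      # S2W(s): (Japanese) words in synset s
        self.s2h = s2h      # S2H(s): hypernym synsets of s
        self.ratio = ratio
        self.depth = depth

    def wv(self, words: Set[str]) -> Optional[Vector]:
        """Eq. (3): mean Word2Vec vector of the words that have one."""
        return mean_vectors([self.w2v[w] for w in words if w in self.w2v])

    def _mean_sv(self, synsets: Set[str], d: int) -> Optional[Vector]:
        """Mean of SV(h, d) over the given synsets, skipping undefined vectors."""
        vectors = [self.sv(h, d) for h in synsets]
        return mean_vectors([vec for vec in vectors if vec is not None])

    def sv(self, s: str, d: int) -> Optional[Vector]:
        """Eq. (2): mix the synset's own words with its hypernyms, recursively."""
        words = self.s2w.get(s, set())
        hypernyms = self.s2h.get(s, set())
        if not words:
            # Assumption: a synset without words inherits the mean of its
            # hypernyms at the same depth d (checked before the d == 0 case).
            return self._mean_sv(hypernyms, d)
        own = self.wv(words)
        if d == 0 or not hypernyms:
            return own
        parent = self._mean_sv(hypernyms, d - 1)
        if own is None or parent is None:
            return own if parent is None else parent
        return [(1 - self.ratio) * a + self.ratio * b for a, b in zip(own, parent)]

    def v(self, w: str) -> Optional[Vector]:
        """Eq. (1): mean of SV(s, depth) over W2S(w); plain Word2Vec otherwise."""
        vectors = [self.sv(s, self.depth) for s in self.w2s.get(w, set())]
        synset_vectors = [vec for vec in vectors if vec is not None]
        if not synset_vectors:
            return self.w2v.get(w)
        return mean_vectors(synset_vectors)


# Hypothetical toy data shaped like the sport branch of Fig. 4 (2-D vectors).
w2v = {"soccer": [1.0, 0.0], "football": [0.0, 1.0], "sport": [1.0, 1.0],
       "hello": [0.5, 0.5]}
s2w = {"s_soccer": {"soccer"}, "s_football": {"football"},
       "s_field_game": set(), "s_outdoor_sport": {"sport"}}
s2h = {"s_soccer": {"s_football"}, "s_football": {"s_field_game"},
       "s_field_game": {"s_outdoor_sport"}}
w2s = {"soccer": {"s_soccer"}}

emb = WordNetEmbedder(w2v, w2s, s2w, s2h, ratio=0.4, depth=2)
print(emb.v("soccer"))  # ≈ [0.76, 0.4] = 0.6*soccer + 0.24*football + 0.16*sport
print(emb.v("hello"))   # [0.5, 0.5]: no synset, so the Word2Vec vector is used
```

With ratio = 0.4 and depth = 2, the toy result reproduces the weights 0.6, 0.24 and 0.16 marked in Fig. 4. The word-less synset s_field_game borrows the vector of its hypernym. Setting ratio = 0.0 returns the plain mean Word2Vec vector of the synset's own words.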
Fig. 6 PCA of proposed distributed representation (ratio = 0.1)
Fig. 7 PCA of proposed distributed representation (ratio = 0.2)
Fig. 8 PCA of proposed distributed representation (ratio = 0.3)
Fig. 9 PCA of proposed distributed representation (ratio = 0.4)

5.3 Evaluation

5.3.1 BLEU
Responses generated for Twitter inputs were evaluated with BLEU [9]. Table 4 lists BLEU-1 and BLEU-2 for the baseline and for the proposed method with each of the four ratio values, rather than the usual BLEU over n-grams up to 4.

The proposed method outperformed the baseline for every ratio:
- The largest BLEU-1 gain, 0.00224 (1.74%), was obtained with ratio = 0.2.
- The largest BLEU-2 gain, 0.001503 (7.55%), was obtained with ratio = 0.3.

BLEU already improved at ratio = 0.1. It rose with ratio up to 0.2 (BLEU-1) or 0.3 (BLEU-2), and fell again at ratio = 0.4.

Table 4 BLEU of each method
Method                    BLEU-1               BLEU-2
Baseline                  0.128396             0.019919
Proposed (ratio = 0.1)    0.129724 (+1.03%)    0.020449 (+2.66%)
Proposed (ratio = 0.2)    0.130636 (+1.74%)    0.021222 (+6.54%)
Proposed (ratio = 0.3)    0.129314 (+0.71%)    0.021422 (+7.55%)
Proposed (ratio = 0.4)    0.128854 (+0.36%)    0.020936 (+5.11%)
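The gains in Section 5.3.1 are measured against the baseline row of Table 4. As a small arithmetic check, this Python sketch recomputes them from the table values. The names BASELINE, PROPOSED, relative_gain and best_ratio are illustrative and do not come from the report.

```python
# Table 4 values (BLEU-1, BLEU-2) as reported; only the arithmetic is new here.
BASELINE = (0.128396, 0.019919)
PROPOSED = {
    0.1: (0.129724, 0.020449),
    0.2: (0.130636, 0.021222),
    0.3: (0.129314, 0.021422),
    0.4: (0.128854, 0.020936),
}
METRICS = ("BLEU-1", "BLEU-2")


def relative_gain(score: float, base: float) -> float:
    """Gain over the baseline as a percentage of the baseline score."""
    return (score - base) / base * 100.0


def best_ratio(metric: int) -> float:
    """ratio with the highest score for metric index 0 (BLEU-1) or 1 (BLEU-2)."""
    return max(PROPOSED, key=lambda r: PROPOSED[r][metric])


# Print the absolute and relative gain of every setting over the baseline.
for ratio, scores in PROPOSED.items():
    cells = []
    for i, name in enumerate(METRICS):
        diff = scores[i] - BASELINE[i]
        gain = relative_gain(scores[i], BASELINE[i])
        cells.append(f"{name} {diff:+.6f} ({gain:+.2f}%)")
    print(f"ratio = {ratio}: " + ", ".join(cells))

for i, name in enumerate(METRICS):
    print(f"best {name}: ratio = {best_ratio(i)}")
```

The printed values include BLEU-1 +0.002240 (+1.74%) at ratio = 0.2 and BLEU-2 +0.001503 (+7.55%) at ratio = 0.3. These match the figures quoted in Section 5.3.1.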
5.3.2 Generation examples
Table 5 shows examples of responses generated by the proposed method for each ratio value.

6. Conclusion
We proposed word representations that combine Word2Vec vectors with WordNet synsets and their hypernyms. The representations are used in an LSTM Encoder-Decoder non-task-oriented dialogue system, where the proposed method improved BLEU over the Word2Vec baseline. Future work includes obtaining synset representations with AutoExtend [10], which extends word embeddings to embeddings for synsets and lexemes.

Acknowledgments
This work was supported by JSPS KAKENHI Grant Numbers JP17K00236 and JP17H01995.

References
[1] Princeton University, About WordNet, WordNet, Princeton University, 2010, http://wordnet.princeton.edu
[2] Francis Bond et al., Enhancing the Japanese WordNet, Proc. of the 7th Workshop on Asian Language Resources (ALR7), Association for Computational Linguistics, pp. 1-8, 2009.
[3] Taku Kudo, MeCab: Yet Another Part-of-Speech and Morphological Analyzer, http://mecab.sourceforge.net/, 2005.
[4] Tomas Mikolov et al., Linguistic Regularities in Continuous Space Word Representations, Proc. of NAACL-HLT 2013, pp. 746-751, 2013.
[5] Tomas Mikolov et al., Efficient Estimation of Word Representations in Vector Space, arXiv:1301.3781, 2013.
[6] Tomas Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality, Advances in Neural Information Processing Systems, pp. 3111-3119, 2013.
[7] Ilya Sutskever et al., Sequence to Sequence Learning with Neural Networks, Advances in Neural Information Processing Systems, pp. 3104-3112, 2014.
[8] Diederik Kingma and Jimmy Ba, Adam: A Method for Stochastic Optimization, International Conference on Learning Representations (ICLR), 2015.
[9] George Doddington, Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics, Proc. of the Second International Conference on Human Language Technology Research (HLT '02), pp. 138-145, 2002.
[10] Sascha Rothe and Hinrich Schütze, AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes, Proc. of ACL 2015, pp. 1793-1803, 2015.
Table 5 Generation examples (four examples, each with responses for ratio = 0.1, 0.2, 0.3 and 0.4)