A Study on a Style Control for Dialogue Response Generation

Reina Akama (B3TB2006)

March 31, 2017
Abstract

We propose a new dialogue response generation model that combines sequence-to-sequence learning with transfer learning to control STYLE, the linguistic characteristics that convey the persona of a specific speaker. To generate responses in a consistent style, most previous work collected text pairs by handmade rules or human annotation. However, these costs are too high to make learning response generation models with a consistent style practical. In our method, we first pre-train a model on a large-scale corpus without style restrictions, and then train a response generation model on a small-scale corpus with a style restriction, starting from the pre-trained model. Compared with baseline models without transfer learning, our proposed model generated responses that were both consistent in style and appropriate to the input texts.

Keywords: response generation, dialogue, style control, neural network, transfer learning

Graduation Thesis, Department of Information and Intelligent Systems, Tohoku University, B3TB2006, March 31, 2017.
Contents

1 Introduction
2 Background
  2.1 Sequence to sequence
  2.2 Style control in dialogue response generation
  2.3 Transfer learning
3 Proposed Method
  3.1 Transfer learning for response generation
    3.1.1 Transfer
  3.2 Vocabulary construction for style words
    3.2.1 Transfer+freq
    3.2.2 Transfer+sim
4 Data
  4.1 Pre-training corpus
  4.2 Style corpus
5 Experiments
  5.1 Experimental setup
    5.1.1 Proposed models
    5.1.2 Baseline models
    5.1.3 Training settings
  5.2 Results
    5.2.1 Human evaluation
    5.2.2 Generated examples
6 Conclusion
Acknowledgments
References
List of Figures

1 Overview of the seq2seq model
2 Vocabulary construction for the Transfer model
3 Vocabulary construction for the five compared models
List of Tables

1 Example dialogue with a style-controlled dialogue system
2 Statistics of the pre-training and style corpora
3 Example utterances from the corpora
4 Human evaluation results (first setting)
5 Human evaluation results (second setting)
6 Example dialogue with the Transfer+freq model
7 Example dialogue with the Transfer+freq model
1 Introduction

With the spread of social networking services (SNS) such as Twitter, large volumes of conversational text have become available, and dialogue response generation models based on Recurrent Neural Networks (RNNs), in particular the sequence-to-sequence (seq2seq) model [18], can now be trained on such data. A seq2seq model trained on unrestricted SNS data, however, gives no guarantee about the style of the responses it generates. Table 1 shows an example dialogue in which the system responses 2), 4), and 6) share the consistent style of a specific character; generating such responses requires explicit control of style. Previous studies on stylistic response generation collected style-labeled text with handmade rules or human annotation [19, 20], and persona-based neural models [10] require conversational data for each target speaker; both are costly to scale.

Table 1: Example dialogue with a style-controlled dialogue system, alternating User utterances 1), 3), 5) and System responses 2), 4), 6) (Japanese text not recovered).

In this thesis, we instead combine the seq2seq model with transfer learning: a response generation model is first pre-trained on a large-scale SNS corpus without style restrictions and then fine-tuned on a small-scale corpus in the target style, so that a consistent style can be learned without expensive annotation.
2 Background

2.1 Sequence to sequence

The sequence-to-sequence (seq2seq) model [4, 18] maps an input word sequence to an output word sequence with a pair of recurrent neural networks, as shown in Figure 1.

Figure 1: Overview of the seq2seq model.

Given an input sequence (x_1, ..., x_T), a standard RNN computes a hidden state h_t and an output y_t at each time step t as

h_t = \mathrm{sigm}(W^{hx} x_t + W^{hh} h_{t-1}),   (1)
y_t = W^{yh} h_t,   (2)

where W^{hx}, W^{hh}, and W^{yh} are weight matrices. An encoder RNN reads the whole input sequence and summarizes it into a fixed-dimensional vector v, from which a decoder RNN generates the output sequence (y_1, ..., y_T). Because plain RNNs are difficult to train on long sequences, the seq2seq model is built from Long Short-Term Memory (LSTM) units [7]. Sutskever et al.'s seq2seq model [18] uses two LSTMs, one as the encoder and one as the decoder, and models the conditional probability

p(y_1, ..., y_T | x_1, ..., x_T) = \prod_{t=1}^{T} p(y_t | v, y_1, ..., y_{t-1}),   (3)

where v is the final hidden state of the encoder LSTM; the first output word y_1 is predicted from v alone, and each factor p(y_t | v, y_1, ..., y_{t-1}) is a softmax distribution over the vocabulary computed from the decoder LSTM state.
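To make Eqs. (1)-(3) concrete, the following is a minimal NumPy sketch of the vanilla-RNN recurrence and the factorized output probability. The toy dimensions, the shared encoder/decoder weights, the softmax output layer, and the output embeddings are illustrative assumptions; the thesis itself uses two-layer LSTMs rather than this plain RNN.

import numpy as np

H, X, V = 8, 8, 20                            # hidden size, input size, vocabulary size (toy)
rng = np.random.default_rng(0)
W_hx = rng.normal(scale=0.1, size=(H, X))     # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(H, H))     # hidden-to-hidden weights
W_yh = rng.normal(scale=0.1, size=(V, H))     # hidden-to-output weights

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def encode(xs):
    """Run Eq. (1) over the input sequence and return the summary vector v."""
    h = np.zeros(H)
    for x in xs:
        h = sigm(W_hx @ x + W_hh @ h)         # Eq. (1)
    return h

def sequence_log_prob(xs, ys_ids, embed):
    """log p(y_1..y_T | x_1..x_T) via the factorization of Eq. (3)."""
    v = encode(xs)
    h, prev = v, np.zeros(X)                  # decoder starts from v
    logp = 0.0
    for y_id in ys_ids:
        h = sigm(W_hx @ prev + W_hh @ h)      # decoder recurrence, Eq. (1)
        probs = softmax(W_yh @ h)             # Eq. (2) followed by a softmax
        logp += np.log(probs[y_id])           # one factor of Eq. (3)
        prev = embed[y_id]                    # feed back the output word
    return logp

xs = [np.eye(X)[t % X] for t in range(5)]     # toy one-hot input "words"
embed = rng.normal(scale=0.1, size=(V, X))    # toy output word embeddings
print(sequence_log_prob(xs, [3, 1, 4], embed))  # log-probability of output (3, 1, 4)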
The seq2seq framework has been applied successfully to neural machine translation [5, 18], abstractive summarization [14], and conversational response generation [15].

2.2 Style control in dialogue response generation

Style control in dialogue has been studied extensively by Walker and colleagues [19, 11, 12, 13], who built an annotated corpus of film dialogue for learning character style [19], learned character models from film [11], and developed trainable generators that control the perceived personality of system utterances [12, 13]. These approaches depend on corpora collected with handmade rules or human annotation, which is costly to scale. Related work on character-style utterances also exists for Japanese dialogue systems [20].
Within the seq2seq framework, Li et al. [10] proposed a persona-based neural conversation model that learns speaker embeddings from Twitter conversations so that generated responses reflect a consistent persona; it requires conversational data for each target speaker.

2.3 Transfer learning

Transfer learning reuses knowledge learned in one task or domain to improve learning in another [1, 16]. Arnold et al. [2] compared methods for transductive transfer learning, and Blitzer et al. [3] applied domain adaptation to sentiment classification; domain adaptation has also been studied for statistical machine translation [9, 6]. In this thesis we apply the same idea to style: a response generation model learned on a large style-unrestricted corpus is adapted to a small style-restricted corpus. For word vectors we use GloVe [17].
3 Proposed Method

3.1 Transfer learning for response generation

Our models are based on Sutskever et al.'s seq2seq model [18] described in Section 2.1.

3.1.1 Transfer

The Transfer model applies transfer learning directly: we first pre-train a seq2seq model on the large-scale pre-training corpus, and then continue training the same model on the small-scale style corpus. As illustrated in Figure 2, the vocabulary used in training is drawn from the words in the pre-training corpus, so it covers the words in the style corpus only partially, and words that appear only in the style corpus cannot be generated.

Figure 2: Vocabulary construction for the Transfer model: the vocabulary used in training is taken from the words in the pre-training corpus and overlaps only partly with the words in the style corpus.
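To make the two-stage procedure concrete, the following is a minimal PyTorch sketch, assuming a toy encoder-decoder and random toy batches; the model size, data, and hyperparameters of the actual experiments (Section 5.1.3) are not reproduced. The essential point is that Stage 2 continues training the very same parameters on the style corpus.

import torch
import torch.nn as nn

class ToySeq2Seq(nn.Module):
    """Stand-in encoder-decoder (the thesis uses 2-layer 1024-unit LSTMs)."""
    def __init__(self, vocab=100, emb=16, hid=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.enc = nn.LSTM(emb, hid, batch_first=True)
        self.dec = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, src, tgt):
        _, state = self.enc(self.emb(src))              # v: final encoder state
        h, _ = self.dec(self.emb(tgt[:, :-1]), state)   # decode conditioned on v
        logits = self.out(h)
        return nn.functional.cross_entropy(             # NLL of Eq. (3)
            logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))

def train(model, batches, epochs=1, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)   # Adam, as in Sec. 5.1.3
    for _ in range(epochs):
        for src, tgt in batches:
            opt.zero_grad()
            model(src, tgt).backward()
            opt.step()

model = ToySeq2Seq()
toy = lambda: (torch.randint(0, 100, (4, 7)), torch.randint(0, 100, (4, 7)))
train(model, [toy()])   # Stage 1: large pre-training corpus, no style restriction
train(model, [toy()])   # Stage 2: same parameters, small style corpus

Because both stages share the vocabulary built from the pre-training corpus, fine-tuning shifts the model toward the target style without re-learning general response behavior.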
3.2 Vocabulary construction for style words

A seq2seq model maps every word outside its vocabulary to an unknown-word token (unk). Since the Transfer model builds its vocabulary from the pre-training corpus alone, the characteristic words that appear only in the style corpus can never be generated. We therefore propose two variants that reserve part of the vocabulary for style-corpus words: Transfer+freq and Transfer+sim.

3.2.1 Transfer+freq

Transfer+freq selects words by frequency: the vocabulary consists of the N_p - N_s most frequent words in the pre-training corpus plus the N_s most frequent words in the style corpus. For example, with N_p = 25,000 and N_s = 1,000, the vocabulary combines 24,000 pre-training-corpus words with 1,000 style-corpus words, 25,000 words in total.

3.2.2 Transfer+sim

Transfer+sim instead decides which pre-training-corpus words to replace using the similarity of GloVe [17] word vectors: style-corpus words are added to the vocabulary in place of the pre-training-corpus words whose vectors are most similar to them.
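The following is a small Python sketch of the two vocabulary-construction variants under the reading above: Transfer+freq merges frequency lists from the two corpora, and Transfer+sim picks, for each style word, the most GloVe-similar pre-training word to replace. The function names and the exact selection rules are assumptions for illustration, not the thesis's code.

from collections import Counter
import numpy as np

def vocab_freq(pre_tokens, style_tokens, n_p=25_000, n_s=1_000):
    """Transfer+freq: (n_p - n_s) frequent pre-training words + n_s style words."""
    pre = [w for w, _ in Counter(pre_tokens).most_common(n_p - n_s)]
    pre_set = set(pre)
    style = [w for w, _ in Counter(style_tokens).most_common()
             if w not in pre_set][:n_s]
    return pre + style

def most_similar_pretrain_word(style_word, glove, pre_vocab):
    """Transfer+sim helper: the pre-training word with the highest cosine
    similarity to the style word, to be replaced in the vocabulary."""
    v = glove[style_word]
    v = v / np.linalg.norm(v)
    scored = ((w, glove[w] @ v / np.linalg.norm(glove[w]))
              for w in pre_vocab if w in glove)
    return max(scored, key=lambda ws: ws[1])[0]

Either way the vocabulary keeps its total size (e.g., 25,000 words), so the added style words come at the cost of rarely used pre-training-corpus words.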
4 Data

We use two corpora: a large-scale pre-training corpus collected from Twitter and a small-scale style corpus built from a TV program. Each corpus is split into 95% training data and 5% validation data. Table 2 shows their statistics.

Table 2: Statistics of the pre-training and style corpora (row and column labels not recovered; the surviving values are 3,688,162; 591,880; 12,564; 12,102; 1,476; 2,137).

4.1 Pre-training corpus

The pre-training corpus consists of utterance-response pairs collected from Twitter between January and December 2015. After filtering out pairs containing URLs and other noise, about 3.70 million pairs remain.

4.2 Style corpus

The style corpus was built from transcripts of a TV program broadcast between September 2015 and May 2016.
Table 3: Example utterances (a) and (b) from the two corpora (Japanese text not recovered; nearby figures 0.3% and 0.04%).
5 Experiments

5.1 Experimental setup

5.1.1 Proposed models

We compare the three proposed models: Transfer (Figure 3(c)), Transfer+freq (Figure 3(d)), and Transfer+sim (Figure 3(e)). All models use a vocabulary of 25,000 words. For Transfer+freq, the number of style-corpus words N_s is set to 1,000 or 500. For Transfer+sim, we use 128-dimensional GloVe vectors trained on the Twitter corpus, with a similarity threshold of 0.6.

5.1.2 Baseline models

As baselines we use two seq2seq models without transfer learning: Base (Figure 3(a)) and Mixed (Figure 3(b)). Base is trained only on the pre-training corpus, with a vocabulary of 25,000 words. Mixed is trained on the concatenation of the pre-training corpus and the style corpus, with a vocabulary of 24,000 pre-training-corpus words plus 1,000 style-corpus words (or 24,500 plus 500). Figure 3 summarizes the vocabulary construction of all five models.
Figure 3: Vocabulary construction for the five models: (a) Base, (b) Mixed, (c) Transfer, (d) Transfer+freq, and (e) Transfer+sim. Each panel shows the words in the pre-training corpus (for Mixed, the concatenated pre-training and style corpora), the vocabulary used in training, and the words in the style corpus.
5.1.3 Training settings

All models are trained with a minibatch size of 64. Each seq2seq model uses two LSTM layers of 1,024 units each (2,048 units in total) with a dropout rate of 0.2, and is optimized with Adam [8].
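For reference, the training settings recoverable from this section are collected below as a single configuration; the role of the figure 2,048 is an interpretation (the total units of the two 1,024-unit layers), and values the text does not state are left as None rather than guessed.

# Recoverable hyperparameters from Sec. 5.1.3; None marks unstated values.
config = {
    "batch_size": 64,
    "rnn_type": "LSTM",
    "num_layers": 2,
    "units_per_layer": 1024,   # 2 x 1024 = 2048 total (interpretation)
    "dropout_rate": 0.2,
    "optimizer": "Adam",       # Kingma and Ba [8]
    "learning_rate": None,     # not stated in the recovered text
    "vocab_size": 25_000,      # Sec. 5.1.1
}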
5.2 Results

5.2.1 Human evaluation

Human judges rated the generated responses on two criteria: 1) whether the response is an appropriate reply to the input text, and 2) whether the response is in the style of the target character. For each of two evaluation settings we used 50 input texts sampled from Twitter; Tables 4 and 5 show how many of the 50 responses satisfied each criterion.

Table 4: Human evaluation results, first setting (out of 50 inputs).

                1) Appropriate   2) In style
Base                39 (78%)       18 (36%)
Mixed               39 (78%)       23 (46%)
Transfer            41 (82%)       39 (78%)
Transfer+freq       38 (76%)       39 (78%)
Transfer+sim        39 (78%)       38 (76%)

Table 5: Human evaluation results, second setting (out of 50 inputs).

                1) Appropriate   2) In style
Base                43 (86%)        0 (0%)
Mixed               40 (80%)       16 (32%)
Transfer            29 (58%)       45 (90%)
Transfer+freq       31 (62%)       47 (94%)
Transfer+sim        32 (64%)       44 (88%)

On criterion 1), all models produce appropriate responses for at least about 60% of the inputs, with Base the strongest baseline. On criterion 2), Base almost never produces the target style (0-36%) and Mixed only partially (32-46%), while the three transfer models reach 76-94%; in the second setting, Transfer and Transfer+freq achieve around 90% style consistency at the cost of some appropriateness.

5.2.2 Generated examples

Tables 6 and 7 show example dialogues between a user and the Transfer+freq model.

Table 6: Example dialogue with the Transfer+freq model, alternating User utterances 1), 3), 5) and System responses 2), 4), 6) (Japanese text not recovered).

Table 7: Example dialogue with the Transfer+freq model, alternating User utterances 1), 3), 5) and System responses 2), 4), 6) (Japanese text not recovered).
6 Conclusion

We proposed dialogue response generation models that combine the seq2seq model with transfer learning to control the style of generated responses. After pre-training on a large-scale Twitter corpus without style restrictions, a model is fine-tuned on a small-scale style corpus built from a TV program. Human evaluation showed that, unlike baselines trained without transfer learning, the transfer models generate responses that are both appropriate to the input texts and consistent in the target style.
Acknowledgments

(Japanese text not recovered; the acknowledgments mention Preferred Networks and the TV program.)
References

[1] NIPS 2005 Workshop on Inductive Transfer: 10 Years Later, 2005. http://iitrl.acadiau.ca/itws05/.

[2] Andrew Arnold, Ramesh Nallapati, and William W. Cohen. A comparative study of methods for transductive transfer learning. In Data Mining Workshops, 2007 (ICDM Workshops 2007), Seventh IEEE International Conference on, pp. 77-82. IEEE, 2007.

[3] John Blitzer, Mark Dredze, and Fernando Pereira. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In ACL, Vol. 7, pp. 440-447, 2007.

[4] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2014. http://www.aclweb.org/anthology/D14-1179.

[5] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In The International Conference on Learning Representations (ICLR), 2015.

[6] Almut Silja Hildebrand, Matthias Eck, Stephan Vogel, and Alex Waibel. Adaptation of the translation model for statistical machine translation based on information retrieval. In Proceedings of EAMT, Vol. 2005, pp. 133-142, 2005.

[7] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, pp. 1735-1780, 1997.

[8] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In The International Conference on Learning Representations (ICLR), 2015.

[9] Philipp Koehn and Josh Schroeder. Experiments in domain adaptation for statistical machine translation. In Proceedings of the Second Workshop on Statistical Machine Translation, pp. 224-227. Association for Computational Linguistics, 2007.

[10] Jiwei Li, Michel Galley, Chris Brockett, Georgios Spithourakis, Jianfeng Gao, and Bill Dolan. A persona-based neural conversation model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 994-1003, 2016.

[11] Grace I. Lin and Marilyn A. Walker. All the world's a stage: Learning character models from film. In AIIDE, 2011.

[12] François Mairesse and Marilyn A. Walker. Towards personality-based user adaptation: Psychologically informed stylistic language generation. User Modeling and User-Adapted Interaction, pp. 227-278, 2010.

[13] François Mairesse and Marilyn A. Walker. Controlling user perceptions of linguistic style: Trainable generation of personality traits. Computational Linguistics, pp. 455-488, 2011.

[14] Ramesh Nallapati, Bowen Zhou, Cicero dos Santos, Caglar Gulcehre, and Bing Xiang. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pp. 280-290. Association for Computational Linguistics, 2016.

[15] Oriol Vinyals and Quoc Le. A neural conversational model. In International Conference on Machine Learning (ICML) Deep Learning Workshop, 2015.

[16] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 10, pp. 1345-1359, 2010.

[17] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pp. 1532-1543, 2014. http://www.aclweb.org/anthology/D14-1162.

[18] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pp. 3104-3112, 2014.

[19] Marilyn A. Walker, Grace I. Lin, and Jennifer Sawyer. An annotated corpus of film dialogue for learning and characterizing character style. In LREC, pp. 1373-1378, 2012.

[20] (Japanese-language reference; authors and title not recovered.) Vol. 31, No. 1, DSF-E 1, 2016.