580 26 5 SP-G 2011 AI

An Automatic Question Generation Method for a Local Councilor Search System

Yasutomo KIMURA, Otaru University of Commerce, kimura@res.otaru-uc.ac.jp, http://minna.ih.otaru-uc.ac.jp
Hideyuki SHIBUKI, Yokohama National University, shib@forest.eis.ynu.ac.jp
Keiichi TAKAMARU, Utsunomiya Kyowa University, takamaru@kyowa-u.ac.jp
Hokuto OTOTAKE, Fukuoka University, ototake@fukuoka-u.ac.jp
Tetsuro KOBAYASHI, National Institute of Informatics, k-tetsu@nii.ac.jp
Tatsunori MORI, Yokohama National University, mori@forest.eis.ynu.ac.jp

keywords: local politics, question generation, information extraction

Summary
This paper presents an automatic question generation method for a local councilor search system. Our purpose is to provide residents with information about local council activities in an easy-to-understand manner. Using local council minutes as the source, the system builds a decision tree whose leaves correspond to local councilors, so that the differences in the councilors' activities become clear. The system also generates a question at each branching condition of the decision tree so that the user can select the next branch. We confirmed experimentally that these questions are appropriate for selecting branches in the decision tree.

1. TV 1 22 A4 200
581 1 n 2 3 4 5 6 2. 2 [ 08, 09, 09a, Takamaru 09, 09, 09b, 10, 10] ( ) 1 2 2 1 SVM 2 2 2 SVM 1 2 1 2 2 1 1 [ 10]
[ 09] 20 3 2 1 4 [ 08] 96 1 19 1 19 7,084 1 7,084 2 2 2 59 3 [ 09] XML <Paragraph> 2 1 100% 3 20 180 Web 63 4 1

Top-level code | Code | Count | Share of 7,084
1000 | 1010 | 4,236 | 59.8%
1000 | 1011 | 565 | 8.0%
1000 | 1012 | 821 | 11.6%
1000 | 1020 | 859 | 12.1%
1000 | 1021 | 543 | 7.7%
1000 | 1030 | 1,112 | 15.7%
1000 | 1050 | 704 | 9.9%
1000 | 1060 | 479 | 6.8%
1000 | 1061 | 585 | 8.3%
1000 | 1062 | 471 | 6.6%
1000 | 1100 | 1,500 | 21.2%
1000 | 1101 | 1,739 | 24.5%
1000 | 1120 | 1,306 | 18.4%
1000 | 1121 | 1,268 | 17.9%
1000 | 1160 | 880 | 12.4%
1000 | 1162 | 498 | 7.0%
2000 | 2013 | 427 | 6.0%
2000 | 2065 | 396 | 5.6%
3000 | 3030 | 1,112 | 5.6%
3000 | 3040 | 988 | 13.9%
3000 | 3060 | 548 | 7.7%
4000 | 4020 | 646 | 9.1%
4000 | 4110 | 415 | 5.9%
5000 | 5030 | 517 | 7.3%

<Keyword> <Keyword> Member Category <Paragraph> Member 19 2 1 2 2 3.
2
<Paragraph Member= 37 >
  <Keyword Member= Category= 4110 > </Keyword>
  <Keyword Member= Category= 4110 > </Keyword>
  <Keyword Member= Category= 1050 > </Keyword>
  <Keyword Member= Category= 1050 > </Keyword>
</Paragraph>
<Paragraph Member= > </Paragraph>
<Paragraph Member= > </Paragraph>
<Paragraph Member= >
  <Keyword Member= Category= 1010;1101 > </Keyword>
  <Keyword Member= Category= 1101 > </Keyword>
  <Keyword Member= Category= 1030 > </Keyword>
  <Keyword Member= Category= 1030 > </Keyword>
</Paragraph>
<Paragraph Member= > </Paragraph>

2 3 2 2 ID3 C4.5 C4.5 Weka J48 4 [ 03] 3 1 Weka J48 4 http://www.cs.waikato.ac.nz/ml/weka/
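The annotated minutes shown above nest <Keyword> elements, each with Member and Category attributes, inside <Paragraph> elements, and a Category attribute may hold several codes separated by ";". As a minimal sketch of how such markup could be tallied per councilor, the following Python reads a small sample. The <Minutes> root element, the member names "A" and "B", and the keyword texts are placeholders assumed for illustration; they are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample in the shape of the annotated minutes.
# The <Minutes> wrapper and the member names are assumptions.
SAMPLE = """
<Minutes>
  <Paragraph Member="A">
    <Keyword Member="A" Category="4110">k1</Keyword>
    <Keyword Member="A" Category="1050">k2</Keyword>
  </Paragraph>
  <Paragraph Member="B">
    <Keyword Member="B" Category="1010;1101">k3</Keyword>
    <Keyword Member="B" Category="1030">k4</Keyword>
  </Paragraph>
</Minutes>
"""


def category_counts(xml_text):
    """Count keyword occurrences per member and category code."""
    counts = defaultdict(Counter)
    root = ET.fromstring(xml_text)
    for keyword in root.iter("Keyword"):
        member = keyword.get("Member", "")
        # One keyword may carry several codes separated by ';'.
        for code in keyword.get("Category", "").split(";"):
            if code:
                counts[member][code] += 1
    return counts


counts = category_counts(SAMPLE)
for member, per_code in counts.items():
    print(member, dict(per_code))
```

Counts of this kind are one plausible input for the decision-tree step; how the paper actually derives its features from the markup is not recoverable from this text.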
3
 | A | B | C | Total
1 | 5 | 10 | 10 | 25
2 | 10 | 10 | 20 | 40
3 | 5 | 10 | 20 | 35
Total | 20 | 30 | 50 | 100
Ratio | 0.2 | 0.3 | 0.5 | 1.0

4 3 210 8.5 13 5 9 367 9.4 14 6 10 3 2 3 1 1 1 1 n P1 P2 P1 P2 M1 P2 P1 M2 n n 1 3 3 1 3 3 19 19 4 19 1 19 2-4 2 20 49 3 1,140 18,424

$_{20}C_{3} = 1{,}140$, $_{49}C_{3} = 18{,}424$

3 25 3 210 367 4 9 3
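The two counts quoted here, 1,140 and 18,424, are the numbers of ways to choose 3 items out of 20 and out of 49. A short Python check, assuming nothing beyond the standard library (the function name `n_triples` is introduced only for this illustration):

```python
import math


def n_triples(n_items):
    """Number of unordered selections of 3 items out of n_items."""
    return math.comb(n_items, 3)


print(n_triples(20))  # 20 * 19 * 18 / 6
print(n_triples(49))  # 49 * 48 * 47 / 6
```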
585 2 (1) (2) 1 19 800 2 3 3 2 3 1 1 4 2 3 A ID=1 A C A C A B B ID=2 B ID=5 ID=5 A B
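The decision tree used here has councilors at its leaves, and the text names ID3, C4.5, and Weka's J48 as the relevant algorithms. To illustrate how such a tree picks a branching condition, the sketch below computes ID3-style information gain on a toy table. It is not the J48 implementation (C4.5 uses gain ratio and pruning), and the councilor labels, the category codes used as features, and the True/False values are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one feature."""
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels)
                  if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """Feature with the highest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Toy data (assumed): did each councilor speak on a given category?
ROWS = [
    {"1010": True, "1050": True},   # councilor A
    {"1010": True, "1050": False},  # councilor B
    {"1010": True, "1050": False},  # councilor C
]
LABELS = ["A", "B", "C"]

print(best_feature(ROWS, LABELS, ["1010", "1050"]))
```

In this toy table every councilor spoke on category 1010, so that feature separates nobody, whereas category 1050 isolates councilor A and is therefore chosen as the first branch.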
5 A B A B 97 63 25 24 13 13 2 4. 4 1 3 2 [ 09] 6 A B A B A B A B A B 2 [ 99] [ 09] 2 - - - A B A B - A B A B A B A B 4 2 A B 19 - A B 19 A B 14,336 5 A B 4 1 5 A B - - A B IPADIC 5 - - IPADIC - IPADIC - - - A ( / / ) B A B [ 00] A BA B A B A B Google N-gram [Google 07] Google N-gram 7 20 20 14,336 805 6 5 http://chasen.aist-nara.ac.jp/chasen/doc/ipadic-2.6.3-j.pdf
587 A B A B A 4 3 4 1 A B 96 19 1,667 Google N-gram A B A B CaboCha [Kudo 03] CaboCha A 7 7 A B 20 3 5 3 3 2 4 A 8 6 6 0 4 4 4 3 A A B 8 20 7 A B 7 A B A B 20 90 90 6 2 79%=(71/90) 1 A B A B ( ) A B A B 7 #
6 Google N-gram A B Google N-gram A B 25 43 13 31 13 70 12 51 12 20 11 95 11 71 7 1 2 3 4 5 6 1 2 1 2 3 4 5 3 ( ) 1 2 3 4 5 4 1 2 3 4 5 IDF 4 5 4 4
(1) Baseline() - A B A B Google N-gram A B Baseline
(2) A B A B
(3) 3
(4) IDF (Inverse Document Frequency) ICF (Inverse Category Frequency) ICF IDF Document Category Document CF t ICF t
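8
No. | # | Score
1 | | 3.60
2 | | 3.94
3 | # | 3.41
4 | | 3.80
5 | | 3.57
6 | | 4.00
7 | | 4.17
8 | | 4.30
9 | # | 3.62
10 | | 4.17
11 | # | 3.38
12 | # | 4.41
13 | | 2.24
14 | # | 3.08
15 | # | 4.01
16 | # | 3.57
17 | | 3.60
18 | | 4.30
19 | | 3.74
20 | | 3.72
Total | | 74.63
# A B

$$\mathrm{ICF}(w_i) = 1 + \log_2 \frac{N}{\mathrm{CF}(w_i)}$$

w_i N 19 7,084 19 96 96 96 CF( ) ICF ICF

$$\mathrm{ICF} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{ICF}(w_i)$$

ICF ICF( ) ICF + L (L) ICF

$$\mathrm{ICF} + \mathrm{L} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{ICF}(w_i) \log L(w_i)$$

L(w_i) w_i ICF ICF( )+L

5. 5 1 A B 4 3 20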
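The ICF score defined in this section is ICF(w_i) = 1 + log2(N / CF(w_i)), and a candidate made of n words is scored by the mean of its word-level ICF values. A minimal Python sketch of these two quantities follows. The word names and CF values are invented for the example, and N = 96 is used only as an illustrative value; the sketch also leaves out the ICF + L variant because the definition of L(w_i) did not survive in this text.

```python
import math


def icf(cf, n_categories):
    """ICF(w) = 1 + log2(N / CF(w)), with CF(w) >= 1."""
    return 1.0 + math.log2(n_categories / cf)


def mean_icf(words, cf_table, n_categories):
    """Average ICF over the words of one candidate expression."""
    return sum(icf(cf_table[w], n_categories) for w in words) / len(words)


# Hypothetical CF values: in how many categories each word occurs.
CF_TABLE = {"common_word": 96, "rare_word": 24}
N = 96  # assumed total number of categories for this example

print(icf(CF_TABLE["common_word"], N))
print(icf(CF_TABLE["rare_word"], N))
print(mean_icf(["common_word", "rare_word"], CF_TABLE, N))
```

A word that occurs in every category gets the minimum score of 1, and the score grows as the word becomes specific to fewer categories.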
19 37 2 4 3 2 = 4 3 = 5 2 20 90 4-5 9 0.85(=17/20) 0.9887 8 1 2 8 4 3 74.63 ICF 3 1 2 3 MeCab 1 IPA : 2 1 : 3 : : 9 + 10 10 : 0.95(=19/20) 5 3 19 4 3
9
Method | Count | Rate | Score / maximum | Ratio
Baseline() | 7/20 | 0.35 | 61.84/74.63 | 0.8286
( ) | 15/20 | 0.75 | 66.62/74.63 | 0.8927
( ) | 5/20 | 0.25 | 46.74/74.63 | 0.6227
( ) | 16/20 | 0.80 | 70.24/74.63 | 0.9412
 | 17/20 | 0.85 | 73.79/74.63 | 0.9887
ICF | 3/20 | 0.15 | 44.54/74.63 | 0.5968
ICF( ) | 4/20 | 0.20 | 51.91/74.63 | 0.6956
ICF + L | 6/20 | 0.30 | 40.60/74.63 | 0.5440
ICF( )+L | 5/20 | 0.25 | 45.10/74.63 | 0.6043

10
Method | Count | Rate | Score / maximum | Ratio
 | 17/20 | 0.85 | 73.79/74.63 | 0.9887
+ | 15/20 | 0.75 | 66.62/74.63 | 0.8927
+ | 17/20 | 0.85 | 73.79/74.63 | 0.9887
+ + | 16/20 | 0.80 | 70.24/74.63 | 0.9412
+ : | 19/20 | 0.95 | 74.51/74.63 | 0.9983
+ :+ | 19/20 | 0.95 | 71.60/74.63 | 0.9593

11
Method | Count | Rate | Score / maximum | Ratio
Baseline() | 8/37 | 0.22 | 98.86/128.36 | 0.7702
( ) | 21/37 | 0.57 | 109.40/128.36 | 0.8523
( ) | 20/37 | 0.54 | 103.26/128.36 | 0.8045
( + ) | 22/37 | 0.60 | 110.83/128.36 | 0.8634
 | 24/37 | 0.65 | 115.74/128.36 | 0.9016
+ | 24/37 | 0.65 | 116.00/128.36 | 0.9037
+ | 25/37 | 0.68 | 117.17/128.36 | 0.9128
+ + | 25/37 | 0.68 | 117.43/128.36 | 0.9148
+ : | 23/37 | 0.62 | 112.14/128.36 | 0.8736
+ :+ | 22/37 | 0.60 | 110.11/128.36 | 0.8578

37 133 11 + : + : + + + + 0.6756(=25/37) + + 0.9148 + + + + 6.
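Each row of the result tables pairs a count out of the number of questions (20 or 37) with a score divided by the maximum attainable total (74.63 or 128.36). The sketch below recomputes both ratios for two rows whose figures appear above. The function name `summarize` and the reading of the two columns as a rate and a score ratio are assumptions made for this illustration.

```python
def summarize(n_selected, n_questions, score, max_score):
    """Return (rate, score ratio) rounded as in the result tables."""
    return (round(n_selected / n_questions, 2),
            round(score / max_score, 4))


baseline = summarize(7, 20, 61.84, 74.63)
best_of_table_9 = summarize(17, 20, 73.79, 74.63)
print(baseline)
print(best_of_table_9)
```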
http://www.hokkaido-politics.net 2 twitter( ) 22300086

[ 03],,, Boosting, 4 (2003)
[Google 07] Google Web N 1 by Google, GSK GSK2007-C (2007)
[ 08],,,,,, 2008-NL-187, pp. 23-28 (2008)
[ 00],, N1 N2,, Vol. 7, No. 4, pp. 79-98 (2000)
[ 09a],, 2009 (2009)
[ 09b],,,,25 1, pp. 100-118 (2009)
[ 10],,,, 16, pp. 563-566 (2010)
[Kudo 03] Kudo, T. and Matsumoto, Y.: Fast Methods for Kernel-Based Text Analysis, ACL 2003 (2003)
[ 99], A B,, Vol. 129, No. 16, pp. 109-116 (1999)
[ 09],,,,, 15, pp. 298-301 (2009)
[ 10],,,,, NLC2010-1, Vol. 110, No. 142, pp. 7-12 (2010)
[ 09],,,,, Vol. 109, No. 234, pp. 25-30 (2009)
[Takamaru 09] Takamaru, K., Shibuki, H., Kimura, Y., Hasegawa, D., Ototake, H., and Araki, K.: Extraction of Political Activity of Assemblyman from Minutes of Municipal Assemblies Using the Political Category, Proc. 11th Conference of Pacific Association for Computational Linguistics (PACLING 2009), p. B11 (2009)
[ 09], - -,, Vol. 25, No. 1, pp. 61-73 (2009)

2010 8 1