The on-line full-text database of the Minutes of the Diet: Its potentials and limitations

Kenjiro Matsuda

Abstract

The on-line full-text database of the Minutes of the Diet offers linguists a unique resource for corpus studies of modern Japanese: all debates and session information are searchable by keyword, speaker name, date, House, and so on, through an easy-to-use interface. The database is accessible from ordinary web browsers, and search results are easily downloadable to the user's PC. This article explores the resource's possibilities for various kinds of linguistic research (lexicon, syntax, dialectology, and discourse analysis), demonstrates actual searches and their results, and carefully examines the limitations that arise from several sources (e.g., the editorial practices of the Diet Office transcribers).

1. 1890 1 1992 1999 B 12 15 : 12410129

Theoretical and Applied Linguistics at Kobe Shoin 7, 1–28, 2004. © Kobe Shoin Institute for Linguistic Sciences.
2 2001 ( (1994, 77) 50 (1998, 296) (1997, 41-42) (1999) (2001) ) 1 ( http://kokkai.ndl.go.jp/ 60 19 20 100 2. 15 10 57 5. 3 2 2 1998 144 OCR 145 3. 2 11 1 1947 5 1 4. 2
FAQ http://kokkai.ndl.go.jp/kensaku/www_faq_top.html 2 3 CD-ROM ( 50, 1998, 297) 3. 3. 1 3 http://kokkai.ndl.go.jp/ 1 4 1: 2 2 2 4 2 3 JavaScript JavaScript Windows98SE/InternetExplorer 5 WindowsXP/InternetExplorer 6 MacOSX/InternetExplorer 6 Vine Linux (Linux version2.4.18-0vl3)/mozilla/5.0 4 Linux Mozilla 4 FAQ
4 2: 5 5 22 5 20 22 5 20 1 2 209 5 1 01 2 1
5 3 3: 4 4 AND OR a A A a PTA AP
6 4 159 159 1 16 01 23 2 26 47 1,000 1,000 5 PC 3. 2 1 22 5 20 16 2 4 1 AND AND 6 7
[Figure 4]
8 5: 26 8 2 22 8 5 1 8 8 9 4 000 6 22 8 5 10 38 9 6 1
[Figure 6]
[Figure 7]
[Figure 8]
11 9: 8
12 7 [3 ] 4 2 4 [ ][ ][ ] [ ] [ ] 10 10: 22 8 5 1 8 22 7 56 2
13 OCR TIFF 200Kb 11 8 11: 8 [ ] PC download.txt 12 8 1947 10 15 33 2004 2 22 5. 1
14 1- - -8 22 08 05 12: download.txt AND 4 2 9 AND 10 1 9 Matsuda (1993, 31) 10 1. 2. 3. 4. 5. 5 AND
15 & 708 & 60 & 24 & 13 1: OR OR 4 (, 1995) 29 1 OR OR 4. (1997) 4. 1
16 150 4 13: 150 4 4. 2 57 2 57 1 11 3 2 (, 2003, 244 5) 1994 1 30 128 92 (, 1997, 3 4) 12 4. 3 51 (, 1997, 4) 13 11 (, 2003, 244) 12 (1995) (2003, 247) 13 (, 1994)
17 4. 4 1945 1951 1 27 14: 1951 1 27 14 5. OCR 3 14 1956 11 27 25 7 (, 1994)
18 5. 1 OCR 2. 8 145 OCR OCR 99% 100 1 15 8 8 2 3 15 (, 2001) 25,000 1 3,400
19 5. 2 2 5. 3 JIS (, 1997, 29 30) 5. 3 16 2 2 17 (1989, 44) (1989) (1989, 44) 1972 4 1. 16 (1989) (1990) (1994) (1997) 17 (1994, 74) 1 10 2 6 3
20 2. 3. 4. 4 1 (1989, 43) 15 16 15 16 1972 10 9 69 3 2 16 verbatim (1989) (, 2000) 18 18 2001
21 15: 1972 10 9 69 3
22 16: 1972 10 9 69 3
23 6. 5. Slembrouck (1992) 60 19 (2003) (2003, 53) (2003) Harada (1971) 2 2003 5 16 2000 4 19 200 16 1947 7 3 1949 4 8 1878 1947 19 http://www.aozora.gr.jp/
          149 (69.6%)     65 (30.4%)    214 (100.0%)
           57 (34.8%)    107 (65.2%)    164 (100.0%)
Total     206            172            378
χ² = 45.53 (p < 0.001)
Table 2: (2003)

(1986) 3 (2000) 60 (2003) (, 1991) (2003) 1990 (1990b, 1990a) 2004 2 16 20 12 21 20 http://www.asahi-net.or.jp/ gb4k-ktr/localgov.htm Explorer http://www2s.biglobe.ne.jp/ L-Fairly/chihouex.html 21
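The χ² statistic reported with the 2×2 table above (149 and 65 in the first row, 57 and 107 in the second) can be rechecked from the cell counts alone. The following is a minimal sketch, assuming the reported value is an ordinary Pearson chi-square without continuity correction; the function name `pearson_chi_square` is illustrative and does not come from the article. Because the row and column labels are not recoverable here, the rows are treated simply as two groups and the columns as two variants.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Cell counts from the 2x2 table (row and column labels are not recoverable).
observed = [[149, 65],
            [57, 107]]

chi2 = pearson_chi_square(observed)

# With one degree of freedom, the upper-tail probability of the chi-square
# distribution equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 2))     # 45.53
print(p_value < 0.001)    # True
```

The result agrees with the reported χ² = 45.53 (p < 0.001), which suggests that no continuity correction was applied.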
1990 Hansard (Slembrouck, 1992; Shaw, 2000; Harris, 2001; Pérez de Ayala, 2001; Christie, 2004) Slembrouck (1992) 2000 22 23 Slembrouck (1992) 24 60 25 (1997) 22 TV http://www.shugiintv.go.jp/top.cfm http://www.webtv.sangiin.go.jp/webtv/index.php 12 2000 23 (1997) http://www.ndl.go.jp/horei_jp/links/link.htm Thomas http://thomas.loc.gov/ 24 2003 10 27 25 Hansard Corpus Linguistic Data Consortium LDC 2000 1970 1980 IBM Bellcore parallel corpus
7. 60 OCR

(1989).., No. 152, 42–47.
(1997).., No. 43, 22–33.
Christie, Chris (2004). Politeness and the linguistic construction of gender in Parliament: An analysis of transgressions and apology behaviour. In Sheffield Hallam Working Papers: Linguistic Politeness and Context. http://www.shu.ac.uk/wpw/politeness/christie.htm.
Harada, Shin'ichi (1971). Ga-No conversion and idiolectal variations in Japanese. Annual Bulletin RILP, 5, 99–113.
Harris, Sandra (2001). Being politically impolite: Extending politeness theory to adversarial political discourse. Discourse & Society, 12 (4), 451–472.
(2003).. 2003, pp. 95–102.
(2000).., No. 6, 11–32.
(2003)...
(1990).., No. 161, 40–43.
(2001). 21. A2 10044018 12. http://www.nii.ac.jp/publications/kaken/html\%93\%fa\%96\%7b\ %8F\%EE\%9%5\%F12000/2000Kawai-J.html.
(2001). National Diet Library Newsletter No. 119. http://www.ndl.go.jp/en/publication/ndl_newsletter/119/191.html.
(1997).., No. 43, 1–13.
Matsuda, Kenjiro (1993). Dissecting Analogical Leveling Quantitatively: The Case of the Innovative Potential Suffix in Tokyo Japanese. Language Variation and Change, 5, 1–34.
(2000).., 51 (1), 61–76.
(1986).., 17, 217–251.
(2003).. 2003 XII.
(1991).. 53, pp. 21–33.
(1997).., No. 41, 39–46.
(2003). 2..
Pérez de Ayala, Soledad (2001). FTAs and Erskine May: Conflicting needs? Politeness in Question Time. Journal of Pragmatics, 33, 143–169.
50 ( ) (1998). 50.,.
Shaw, Sylvia (2000). Language, gender and floor apportionment in political debates. Discourse & Society, 11 (3), 401–418.
Slembrouck, Stef (1992). The parliamentary Hansard verbatim report: The written construction of spoken discourse. Language and Literature, 1, 101–119.
(1999). 11. http://www.soumu.go.jp/joho\_tsusin/ policyreport\-/japanese/papers/99wp%/99wp-0-index.html.
(1997).., No. 43, 14–21.
(1995).., No. 189, 79–81.
(1994).., No. 30, 70–78.
(1995)..,.
(1990a)...
(1990b)...
(1997).., No. 43, 34–47.
(1994)...

Author's E-mail Address: kenjiro@shoin.ac.jp