
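Priority Area Research "Japanese Corpus", FY2010 Research Report (JC-D-10-04)

Digitization Format of the Balanced Corpus of Contemporary Written Japanese (現代日本語書き言葉均衡コーパス), ver. 2.2

山口昌也, 高田智和, 北村雅則, 間淵洋子, 大島一, 小林正行, 西部みちる

February 2011

MEXT Grant-in-Aid for Scientific Research on Priority Areas, "Construction of a Large-Scale Representative Corpus of Written Japanese: Laying the Foundation for Japanese Language Research in the 21st Century", Data Group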

(Balanced Corpus of Contemporary Written Japanese BCCWJ ) BCCWJ 2006 2005 BCCWJ 2006 BCCWJ 2005

Contents

1
  1.1, 1.2, 1.3, 1.4
2
  2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9
  2.10 BCCWJ JIS X0213:2004
3 Tag specification
  3.1
  3.2
  3.3 Tag list (variable length): abstract, article, authorsdata, blockend, br, caption, citation, cluster, contents, correction, cursive, delete, enclosedcharacter, figure, figureblock, fraction, image, info, list, listitem, missingcharacter, notebody, notebodyinline, notemarker, orphanedtitle, paragraph, profile, quotation, quote, rejectedblock, rejectedspan, ruby, sample, sampling, sentence, source, speaker, speech, subscript, superscript, table, title, titleblock, verse, verseline
  3.4 ( ): sample, sampling

1 1 1.1 BCCWJ BCCWJ BCCWJ XML XML BCCWJ 1976 () Web 1.3 XML 1.4 1.2 1.2.1 BCCWJ Web (Yahoo! *1 ) ( *1 http://chiebukuro.yahoo.co.jp/

2 1 2006) () 1 2 BCCWJ ( 2006) (1) (2) 3 4 5 6 7 1 5 Web ( ) 8 9 1.2.2

1.3 3 ([ 8]) ([ 4]) ( ) ([ 6]) ([ 3]) XML (extensible Markup Language) XML ([ 1,2]) TEI (Text Encoding Initiative) ( 2005) ( 2006) XML XML ([ 5,7]) ([ 9]) 1.3 1.3.1 XML UTF-16 JISX0213:2004 BCCWJ XML XML BCCWJ 1000 2 UTF-16 JISX0213:2004 JISX0213:2004 11000 JISX0213:2004 JISX0208 6800 3 4 4000 JISX0208 JISX0213:2004 (a) (b) (c) (a)(b) (c) BCCWJ *2 *3 *2 *3 PC OS Windows (Windows Vista) JISX0213 JISX0213

1.3.2 46 XML 1.1 ( ) 1.1 XML 3

1.3.2.1 sample sampling ( 1.1 ) sample *4 sampling ( 2007) sample sampleid sampleid sample type ( ) 1 type variablelength fixedlength

1.3.2.2 correction originaltext

<correction type="erratum" originaltext=" "> </correction>
<correction type="omission"> </correction>
<correction type="excess" originaltext=" " />

ruby missingcharacter ruby JISX0213:2004 missingcharacter missingcharacter attribute Unicode unicode daikanwa description

<ruby rubytext=" "> </ruby><ruby rubytext=" "> </ruby>
<missingcharacter attribute="hanideograph" unicode="u+5aeb" daikanwa="m06673" description=""> </missingcharacter>

*4 sample

1.3 5 1.1 () sample sampling article blockend cluster title () titleblock title title list paragraph sentence figureblock figure () caption table citation article source () speech () speaker quote article note () notebodyinline abstract article cluster authorsdata contents () profile rejectedblock verse ruby correction missingcharacter JIS X 0213:2004 (JIS ) enclosedcharacter image JIS X 0213:2004 cursive superscript subscript fraction delete br rejectedspan

<?xml version="1.0" encoding="utf-16"?>
<?xml-stylesheet href="sc_check.xsl" type="text/xsl"?>
<sample sampleid="ow1x_00000" version="20070208" type="variablelength">
  <article articleid="ow1x_00000_v001" iswholearticle="false">
    <titleblock><title><sentence type="quasi"> </sentence></title></titleblock>
    <paragraph>
      <sentence> </sentence><sentence>...
    </paragraph>
    <cluster>
      <titleblock><title><sentence type="quasi"> </sentence></title></titleblock>
      <paragraph>
        <sentence> </sentence>...
      </paragraph>
      <cluster>
        <titleblock><title><sentence type="quasi"> </sentence></title></titleblock>
        <paragraph>
          <sentence> </sentence>

1.1 ( 54 )

1.3 7 1.3.2.3 1.1 (a) (b) (c) (d) (e) article cluster paragraph sentence 1.1 1.1 article titleblock cluster article titleblock cluster titleblock cluster titleblock article article BCCWJ article 1.1 1 2 iswholearticle false cluster cluster (titleblock ) cluster cluster 2.1 cluster 2.1 cluster titleblock cluster titleblock titleblock cluster title paragraph, sentence paragraph sentence 1.3.3 TEI CES (Corpus Encoding Standard) BCCWJ TEI BCCWJ CES TEI BCCWJ CES
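The article / cluster / paragraph / sentence nesting discussed in 1.3.2.3 can be explored with a few lines of Python. This is only a minimal sketch: the `SAMPLE` string is a made-up, well-formed miniature of the structure in Figure 1.1 (the sentence texts `T0`, `S1`, ... and the helper name `outline` are assumptions, not part of the specification), and real BCCWJ files are UTF-16 and far larger.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed miniature of the structure shown in Figure 1.1.
SAMPLE = """<sample sampleid="ow1x_00000" version="20070208" type="variablelength">
  <article articleid="ow1x_00000_v001" iswholearticle="false">
    <titleblock><title><sentence type="quasi">T0</sentence></title></titleblock>
    <paragraph><sentence>S1</sentence><sentence>S2</sentence></paragraph>
    <cluster>
      <titleblock><title><sentence type="quasi">T1</sentence></title></titleblock>
      <paragraph><sentence>S3</sentence></paragraph>
      <cluster>
        <titleblock><title><sentence type="quasi">T2</sentence></title></titleblock>
        <paragraph><sentence>S4</sentence></paragraph>
      </cluster>
    </cluster>
  </article>
</sample>"""


def outline(elem, depth=0):
    """Return (depth, tag, title text) for every article/cluster, in document order."""
    rows = []
    if elem.tag in ("article", "cluster"):
        title = elem.find("titleblock/title/sentence")
        rows.append((depth, elem.tag, title.text if title is not None else None))
        depth += 1
    for child in elem:
        rows.extend(outline(child, depth))
    return rows


root = ET.fromstring(SAMPLE)
rows = outline(root)
for depth, tag, title in rows:
    print("  " * depth + f"{tag}: {title}")

# Ordinary sentences only; title sentences carry type="quasi" in this miniature.
sentences = [s.text for s in root.iter("sentence") if s.get("type") != "quasi"]
```

The recursion treats article and cluster alike, so a cluster nested inside another cluster simply appears one level deeper in the outline.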

8 1 BCCWJ 1.4 BCCWJ (1500 ) ( 8000 ) (540 ) Web *5 Text Encoding Initiative, The XML Version of the TEI Guidelines, http://www.tei-c.org/p4x/index.html Corpus Encoding Standard, http://www.cs.vassar.edu/ces/ (2006) 18 pp.9 16 (2007) 18 (2005) ( 15) (2006) ( 124) *5 http://www2.ninjal.ac.jp/densi/public/wiki/

9 2 BCCWJ 2.1 2.2 Unicode UTF16LE Byte Order Mark LF 2.3 JIS X0213:2004 *1 (2004) JIS X0213 JIS 10,956 BCCWJ JIS X0213 (a) (b) * 2 *1 JIS 4 11,233 JIS *2
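Section 2.2 names the file encoding (Unicode, UTF16LE), the Byte Order Mark and LF line endings. A minimal sketch of writing and reading bytes under those settings follows; it assumes the BOM is present at the start of the file, and the function names `encode_sample` and `decode_sample` are made up for illustration.

```python
import codecs


def encode_sample(text: str, with_bom: bool = True) -> bytes:
    """Encode text as UTF-16LE with LF line endings, optionally prefixed by a BOM."""
    body = text.replace("\r\n", "\n").replace("\r", "\n").encode("utf-16-le")
    return (codecs.BOM_UTF16_LE + body) if with_bom else body


def decode_sample(data: bytes) -> str:
    """Decode UTF-16LE bytes, dropping a leading BOM if one is present."""
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    return data.decode("utf-16-le")


raw = encode_sample("<sample>\r\n\u3042</sample>\r\n")
text = decode_sample(raw)
```

Decoding with the explicit `utf-16-le` codec, rather than the generic `utf-16`, keeps the byte order fixed even when a file turns out to have no BOM.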

10 2 2.3.1 BCCWJ JIS X0213 (1) (2) (3) (4) 2.3.1.1 BCCWJ 1-09-09 2.10.1.3 BCCWJ JIS X0213 47 2.10.1.3 2.3.1.2 JIS 2.6.4 JIS X0213 33 2.10.1.1 2.6.5 JIS X0213 6 2.10.1.1 11 1 10 JIS X0213 11 12 2.10.1.1 2.6.3 JIS X0213 136 2.10.1.1

2.4 11 2.6.2 JIS X0213 1 3 2.10.1.1 2.3.1.3 JIS X0213 2.10.1.2 2.4.2 2.3.1.4 JIS X0213 Unicode 25 2.10.1.4 Unicode *3 2.3.2 *4 2004 JIS 10 1-47-52 Unicode U+20B9F *5 2.4 2.4.1 JIS X0213 JIS X0213:2000 6.6.3.1 (2000) 2.4.2 2.4.2.1 JIS X0213 JIS X0213 *3 XML <substitution x0213="1-04-87" unicode="304b,309a"> </substitution> *4 Microsoft Windows XP, Meadow2.0 *5 XML <substitution x0213="1-47-52" unicode="20b9f"> </substitution>

12 2 2.1 2002 6 13 2.1 2.4.2.2 JIS X0213 JIS X0213 JIS X0213 9 BCCWJ 4 4 *6 2.1 JIS 1-01-28 1-01-61 1-01-29 1-01-30 1-02-17 1-03-92 1-09-09 1-08-01 1-08-12,,,,,,,, *6 JIS X0213:2000 4 1 24

2.4 13 2.4.2.3 2.2 2.2 UCS UCS 1-09-02 00A0 1-01-01 3000 1-03-92 2013 1-01-29 1-03-91 30A0 1-01-65 2015 (HORIZONTAL BAR) FF1D (FULLWIDTH EQUALS SIGN) 1-09-08 00AB 1-01-52 300A 1-09-18 00BB 1-01-53 300B 1-01-17 203E 1-09-11 1-02-18 007E FFE3 1-09-14 00B7 1-01-06 30FB 1-03-32 2022 1-03-31 25E6 1-01-91 25CB 1-02-94 25EF 1-03-26 29BF 1-03-27 25C9 (FULLWIDTH MACRON) 1-13-64 301D 1-01-40 201C 1-13-65 301F 1-01-41 201D 2.3 3 ISO/IEC646(ASCII) 1 JIS X0213 1 BCCWJ 2.3 1-02-16 1-01-41 1-01-40 1-01-15 1-02-15 1-01-39 1-01-38 1-01-38 1-02-17 1-01-30 2.1 1-01-61 2.1

14 2 2.4.2.4 2.4 2.5 BCCWJ 2.5.1 JIS missingcharacter 2.2 2.2 <missingcharacter attribute="hanideograph" unicode="u+7752" daikanwa="m23412" description=""> </missingcharacter> 2.5.2 image 2.3 2

2.5 15 2.4 1-01-28 1-01-30 1-01-61 () () 1-01-29 1-01-40 1-01-41 1-01-77 1-01-38 1-01-39 / 1-01-50, 1-01-51 ( / ) 1-01-67, 1-01-68 ( ) Web e-mail ( ) / 1-01-52, 1-01-53 / 1-02-67, ( ) 1-02-68 1-02-14 1-01-65 ( ) 1-02-84 1-01-84 1-01-63 / 1-03-56, 1-03-88 / PHI 1-06-21, 1-06-53 1-02-39

16 2 2.3 <image no="1" description=" " /> 2.6 2.6.1 ruby rubytext 2.4 58 2.4 <ruby rubytext=""> </ruby> 2.6.6 2.6.7 2.6.2 superscript subscript 2.5 7

2.6 17 2.5 <subscript> </subscript><superscript> </superscript> 2.6.3 enclosedcharacter 2.6 55 2.6 <enclosedcharacter> </enclosedcharacter> 2.6.4 2.7 5

18 2 2.7 2.6.5 2.8 57 2.8 fraction <fraction> </fraction> 2.6.6 BCCWJ notemarker 2.9 11 2.9 ()

2.7 19 <notemarker> </notemarker> () 2.6.7 notebodyinline 2.10 17 2.10 <notebodyinline> </notebodyinline> 2.7 2.7.1

20 2 2.11 63 2.11 2.7.2 (1) (2) (3) 2.12 2.12

2.8 21 (1) (2) 2.13 56 2.13 () 2.7.3 2.14 13 2.14 2.8 correction originaltext () () 2.15

22 2 2.15 51 <correction type="erratum" originaltext=" "> </correction> 2.9 2.3 BCCWJ Unicode JIS X0213:2004 JIS X0213:2000 2.9.1 JIS X0213:2000 2004 JIS Unicode 363 2 JIS X0213:2000 Unicode 1-01-29 U+2014 EN DASHU+2015 HORIZONTAL BAR 2-94-05 U+29FCEU+29FD7 2.9.2 5 2 JIS X0213:2000 7.3 JIS X0201 5 2 BCCWJ JIS X0213:2000 4 Unicode JIS X0213 A U+0041 Unicode U+0041 LATIN CAPITAL LETTER A U+FF21 FULLWIDTH LATIN CAPITAL LETTER A AThe Unicode Consortium(2007) BCCWJ FULLWIDTH LATIN CAPITAL LETTER A U+FF21 4 Unicode 5 2 Unicode Fullwidth ASCII variants 2.10.2.2 Unicode
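The treatment in 2.9.2, where characters such as A are stored as U+FF21 FULLWIDTH LATIN CAPITAL LETTER A rather than U+0041, can be sketched as a translation table. The sketch below assumes that the second UCS column of the table in 2.10.2.2 is the code point BCCWJ uses, and it covers only the rows visible in that table (punctuation, digits, Latin letters and four irregular pairs); the names `TABLE` and `to_fullwidth` are hypothetical.

```python
import string

# Regular rows: the fullwidth form sits at a fixed distance from the ASCII one.
FULLWIDTH_OFFSET = 0xFEE0  # e.g. U+0041 -> U+FF21

# ASCII punctuation rows visible in the table of 2.10.2.2.
PUNCT = ",.:;?`^_/\\|()[]{}+=<>$%&*@"

# Rows of the same table that do not follow the fixed offset.
IRREGULAR = {0x00AF: 0xFFE3, 0x00A5: 0xFFE5, 0x2985: 0xFF5F, 0x2986: 0xFF60}

TABLE = {ord(c): ord(c) + FULLWIDTH_OFFSET
         for c in PUNCT + string.digits + string.ascii_letters}
TABLE.update(IRREGULAR)


def to_fullwidth(text: str) -> str:
    """Replace the characters listed in TABLE by their fullwidth counterparts."""
    return text.translate(TABLE)


print(to_fullwidth("BCCWJ ver2.2"))
```

Characters that 2.10.1.2 handles separately, such as the hyphen-minus and the quotation marks, are deliberately absent from `TABLE` and pass through unchanged.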

2.10 BCCWJ JIS X0213:2004 23 2.10 BCCWJ JIS X0213:2004 2.10.1 (277 ) 2.10.1.1 (186 ) (47 ) UCS 1-08-75 203C 1-08-76 2047 1-08-77 2048 1-08-78 2049 1-13-32 3349 1-13-33 3314 1-13-34 3322 1-13-35 334D 1-13-36 3318 1-13-37 3327 1-13-38 3303 1-13-39 3336 1-13-40 3351 1-13-41 3357 1-13-42 330D 1-13-43 3326 1-13-44 3323 1-13-45 332B 1-13-46 334A 1-13-47 333B 1-13-48 339C MM 1-13-49 339D CM 1-13-50 339E KM 1-13-51 338E MG 1-13-52 338F KG 1-13-53 33C4 CC 1-13-54 33A1 M2 1-13-63 337B 1-13-66 2116 NO 1-13-67 33CD KK 1-13-68 2121 TEL 1-13-74 3231 1-13-75 3232 1-13-76 3239 1-13-77 337E

24 2 (47 ) UCS 1-13-78 337D 1-13-79 337C 1-09-20 00BD 2 1 1-07-88 2153 3 1 1-07-89 2154 3 2 1-09-19 00BC 4 1 1-09-21 00BE 4 3 1-07-90 2155 5 1 1-12-31 217A 11 1-12-32 217B 12 1-13-31 216A 11 1-13-55 216B 12 (136 ) UCS 1-13-1 2460 1 1-13-2 2461 2 1-13-3 2462 3 1-13-4 2463 4 1-13-5 2464 5 1-13-6 2465 6 1-13-7 2466 7 1-13-8 2467 8 1-13-9 2468 9 1-13-10 2469 10 1-13-11 246A 11 1-13-12 246B 12 1-13-13 246C 13 1-13-14 246D 14 1-13-15 246E 15 1-13-16 246F 16 1-13-17 2470 17 1-13-18 2471 18 1-13-19 2472 19 1-13-20 2473 20 1-08-33 3251 21 1-08-34 3252 22 1-08-35 3253 23 1-08-36 3254 24

2.10 BCCWJ JIS X0213:2004 25 (136 ) UCS 1-08-37 3255 25 1-08-38 3256 26 1-08-39 3257 27 1-08-40 3258 28 1-08-41 3259 29 1-08-42 325A 30 1-08-43 325B 31 1-08-44 325C 32 1-08-45 325D 33 1-08-46 325E 34 1-08-47 325F 35 1-08-48 32B1 36 1-08-49 32B2 37 1-08-50 32B3 38 1-08-51 32B4 39 1-08-52 32B5 40 1-08-53 32B6 41 1-08-54 32B7 42 1-08-55 32B8 43 1-08-56 32B9 44 1-08-57 32BA 45 1-08-58 32BB 46 1-08-59 32BC 47 1-08-60 32BD 48 1-08-61 32BE 49 1-08-62 32BF 50 1-12-1 2776 1 1-12-2 2777 2 1-12-3 2778 3 1-12-4 2779 4 1-12-5 277A 5 1-12-6 277B 6 1-12-7 277C 7 1-12-8 277D 8 1-12-9 277E 9 1-12-10 277F 10 1-12-11 24EB 11 1-12-12 24EC 12 1-12-13 24ED 13 1-12-14 24EE 14

26 2 (136 ) UCS 1-12-15 24EF 15 1-12-16 24F0 16 1-12-17 24F1 17 1-12-18 24F2 18 1-12-19 24F3 19 1-12-20 24F4 20 1-06-58 24F5 1 1-06-59 24F6 2 1-06-60 24F7 3 1-06-61 24F8 4 1-06-62 24F9 5 1-06-63 24FA 6 1-06-64 24FB 7 1-06-65 24FC 8 1-06-66 24FD 9 1-06-67 24FE 10 1-12-33 24D0 A 1-12-34 24D1 B 1-12-35 24D2 C 1-12-36 24D3 D 1-12-37 24D4 E 1-12-38 24D5 F 1-12-39 24D6 G 1-12-40 24D7 H 1-12-41 24D8 I 1-12-42 24D9 J 1-12-43 24DA K 1-12-44 24DB L 1-12-45 24DC M 1-12-46 24DD N 1-12-47 24DE O 1-12-48 24DF P 1-12-49 24E0 Q 1-12-50 24E1 R 1-12-51 24E2 S 1-12-52 24E3 T 1-12-53 24E4 U 1-12-54 24E5 V 1-12-55 24E6 W 1-12-56 24E7 X

2.10 BCCWJ JIS X0213:2004 27 (136 ) UCS 1-12-57 24E8 Y 1-12-58 24E9 Z 1-12-59 32D0 1-12-60 32D1 1-12-61 32D2 1-12-62 32D3 1-12-63 32D4 1-12-64 32D5 1-12-65 32D6 1-12-66 32D7 1-12-67 32D8 1-12-68 32D9 1-12-69 32DA 1-12-70 32DB 1-12-71 32DC 1-12-72 32DD 1-12-73 32DE 1-12-74 32DF 1-12-75 32E0 1-12-76 32E1 1-12-77 32E2 1-12-78 32E3 1-12-79 32FA 1-12-80 32E9 1-12-81 32E5 1-12-82 32ED 1-12-83 32EC 1-13-69 32A4 1-13-70 32A5 1-13-71 32A6 1-13-72 32A7 1-13-73 32A8 (3 ) UCS 1-09-16 00B9 1 1-09-12 00B2 2 1-09-13 00B3 3

28 2 2.10.1.2 (17 ) UCS - -UCS - 1-09-02 00A0 1-01-01 3000 1-03-92 2013 1-01-29 2015 1-02-17 002D 1-01-30 2010 1-01-61 2212 1-03-91 30A0 1-01-65 FF1D 1-01-17 203E 1-09-11 FFE3 1-02-18 007E 1-09-14 00B7 1-01-06 30FB 1-03-32 2022 1-03-31 25E6 1-01-91 25CB 1-02-94 25EF 1-03-26 29BF 1-03-27 25C9 1-09-08 00AB 1-01-52 300A 1-09-18 00BB 1-01-53 300B 1-13-64 301D 1-01-40 201C 1-13-65 301F 1-01-41 201D 1-02-16 0022 1-01-40 201C 1-01-41 201D 1-02-15 0027 1-01-38 2018 1-01-39 2019 2.10.1.3 (48 ) (1 ) UCS 1-09-09 00AD (47 ) UCS 1-08-1 2500 1-08-2 2502 1-08-3 250C

2.10 BCCWJ JIS X0213:2004 29 (47 ) UCS 1-08-4 2510 1-08-5 2518 1-08-6 2514 1-08-7 251C 1-08-8 252C 1-08-9 2524 1-08-10 2534 1-08-11 253C 1-08-12 2501 1-08-13 2503 1-08-14 250F 1-08-15 2513 1-08-16 251B 1-08-17 2517 1-08-18 2523 1-08-19 2533 1-08-20 252B 1-08-21 253B 1-08-22 254B 1-08-23 2520 1-08-24 252F 1-08-25 2528 1-08-26 2537 1-08-27 253F 1-08-28 251D 1-08-29 2530 1-08-30 2525 1-08-31 2538 1-08-32 2542 1-07-34 23BE 1-07-35 23BF 1-07-36 23C0 1-07-37 23C1 1-07-38 23C2 1-07-39 23C3 1-07-40 23C4 1-07-41 23C5 1-07-42 23C6 1-07-43 23C7 1-07-44 23C8

(47 )
          UCS
1-07-45   23C9
1-07-46   23CA
1-07-47   23CB
1-07-48   23CC

2.10.1.4 Unicode (25 )
          UCS
1-4-87    <304B, 309A>
1-4-88    <304D, 309A>
1-4-89    <304F, 309A>
1-4-90    <3051, 309A>
1-4-91    <3053, 309A>
1-5-87    <30AB, 309A>
1-5-88    <30AD, 309A>
1-5-89    <30AF, 309A>
1-5-90    <30B1, 309A>
1-5-91    <30B3, 309A>
1-5-92    <30BB, 309A>
1-5-93    <30C4, 309A>
1-5-94    <30C8, 309A>
1-6-88    <31F7, 309A>
1-11-36   <00E6, 0300>   AE
1-11-40   <0254, 0300>   O
1-11-41   <0254, 0301>   O
1-11-42   <028C, 0300>   V
1-11-43   <028C, 0301>   V
1-11-44   <0259, 0300>   SCHWA
1-11-45   <0259, 0301>   SCHWA
1-11-46   <025A, 0300>   SCHWA
1-11-47   <025A, 0301>   SCHWA
1-11-69   <02E9, 02E5>
1-11-70   <02E5, 02E9>

2.10.1.5 (1 )
          UCS               UCS
1-47-52   20B9F   1-28-24   U+53F1
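The entries of 2.10.1.4 are sequences of two UCS code points rather than single characters. A minimal sketch of turning such a sequence into a Python string is given below; it assumes the comma-separated lowercase hex notation seen in the `unicode` attribute of the `substitution` examples (`unicode="304b,309a"`), and the function name `decode_ucs` is made up.

```python
import unicodedata


def decode_ucs(seq: str) -> str:
    """Turn a comma-separated list of hex code points, e.g. "304b,309a", into a string."""
    return "".join(chr(int(cp.strip(), 16)) for cp in seq.split(","))


ka_handakuten = decode_ucs("304b,309a")   # row 1-4-87 of the table
names = [unicodedata.name(c) for c in ka_handakuten]
for name in names:
    print(name)

# A single code point outside the BMP works the same way (row 1-47-52).
supplementary = decode_ucs("20b9f")
```

Because Unicode has no precomposed character for these pairs, NFC normalization leaves the two code points as they are, so the string length stays 2 for a single JIS X0213 character.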

2.10 BCCWJ JIS X0213:2004 31 2.10.2 (96 ) 2.10.2.1 JIS X0213:2000 (2 ) JIS UCS UCS 1-01-29 2014 2015 2-94-05 29FD7 29FCE + 2.10.2.2 Unicode (94 ) JIS UCS UCS 1-01-04 002C FF0C 1-01-05 002E FF0E 1-01-07 003A FF1A 1-01-08 003B FF1B 1-01-09 003F FF1F 1-01-14 0060 FF40 1-01-16 005E FF3E 1-09-11 00AF FFE3 * 7 1-01-18 005F FF3F 1-01-31 002F FF0F 1-01-32 005C FF3C 1-01-35 007C FF5C 1-01-42 0028 FF08 1-01-43 0029 FF09 1-01-46 005B FF3B 1-01-47 005D FF3D 1-01-48 007B FF5B 1-01-49 007D FF5D 1-01-60 002B FF0B 1-01-65 003D FF1D 1-01-67 003C FF1C 1-01-68 003E FF1E 1-01-79 00A5 FFE5 1-01-80 0024 FF04 1-01-83 0025 FF05 1-01-85 0026 FF06 1-01-86 002A FF0A 1-01-87 0040 FF20 1-02-54 2985 FF5F 1-02-55 2986 FF60 *7

32 2 JIS UCS UCS 1-03-16 0030 FF10 0 1-03-17 0031 FF11 1 1-03-18 0032 FF12 2 1-03-19 0033 FF13 3 1-03-20 0034 FF14 4 1-03-21 0035 FF15 5 1-03-22 0036 FF16 6 1-03-23 0037 FF17 7 1-03-24 0038 FF18 8 1-03-25 0039 FF19 9 1-03-33 0041 FF21 A 1-03-34 0042 FF22 B 1-03-35 0043 FF23 C 1-03-36 0044 FF24 D 1-03-37 0045 FF25 E 1-03-38 0046 FF26 F 1-03-39 0047 FF27 G 1-03-40 0048 FF28 H 1-03-41 0049 FF29 I 1-03-42 004A FF2A J 1-03-43 004B FF2B K 1-03-44 004C FF2C L 1-03-45 004D FF2D M 1-03-46 004E FF2E N 1-03-47 004F FF2F O 1-03-48 0050 FF30 P 1-03-49 0051 FF31 Q 1-03-50 0052 FF32 R 1-03-51 0053 FF33 S 1-03-52 0054 FF34 T 1-03-53 0055 FF35 U 1-03-54 0056 FF36 V 1-03-55 0057 FF37 W 1-03-56 0058 FF38 X 1-03-57 0059 FF39 Y 1-03-58 005A FF3A Z 1-03-65 0061 FF41 A 1-03-66 0062 FF42 B 1-03-67 0063 FF43 C 1-03-68 0064 FF44 D 1-03-69 0065 FF45 E

2.10 BCCWJ JIS X0213:2004 33 JIS UCS UCS 1-03-70 0066 FF46 F 1-03-71 0067 FF47 G 1-03-72 0068 FF48 H 1-03-73 0069 FF49 I 1-03-74 006A FF4A J 1-03-75 006B FF4B K 1-03-76 006C FF4C L 1-03-77 006D FF4D M 1-03-78 006E FF4E N 1-03-79 006F FF4F O 1-03-80 0070 FF50 P 1-03-81 0071 FF51 Q 1-03-82 0072 FF52 R 1-03-83 0073 FF53 S 1-03-84 0074 FF54 T 1-03-85 0075 FF55 U 1-03-86 0076 FF56 V 1-03-87 0077 FF57 W 1-03-88 0078 FF58 X 1-03-89 0079 FF59 Y 1-03-90 007A FF5A Z (2004) 7 8 2 1 JIS X 0213:2004. (2000) 7 8 2 JIS X 0213:2000. The Unicode Consortium(2007)The Unicode Standard, Version 5.0, Addison-Wesley

35 3 3.1 BCCWJ ( 45 2 ) *1 Web *2 3.1.1 1 Web Web 3.1.2 BCCWJ (article ) 1 1 1 1000 2 *1 ver 2.2 *2 http://www2.ninjal.ac.jp/densi/public/wiki/

2 sample sampling cluster

3.2 DTD

(1) ENTITY()

<!ENTITY % blockelement "article | cluster | paragraph | authorsdata | title | titleblock | figureblock | abstract | quotation | blockend | contents | list | profile | rejectedblock | notebody | orphanedtitle | verse | info">
<!ENTITY % characterelement "missingcharacter | correction | image | enclosedcharacter | replace | jis2004 | jis2000 | substitution">
<!ENTITY % stringelement "rejectedspan | ruby | fraction | sampling | quote | subscript | superscript | notemarker | notebodyinline">
<!ENTITY % inlineelement "%characterelement; | %stringelement;">
<!ENTITY % inlinetext "#PCDATA | %inlineelement;">
<!ENTITY % character "#PCDATA | %characterelement;">

(2)

3.3 Tag list (variable length)

abstract

article cluster

Child elements: blockend, br, cluster, list, notebody, paragraph, quotation, rejectedblock, sentence

DTD

<!ELEMENT abstract (blockend | br | cluster | list | notebody | paragraph | quotation | rejectedblock | sentence)*>

abstract article cluster abstract ( ) (article cluster ) abstract ( abstract )

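(Source material: 独占禁止白書, 平成 13 年版 (2001 edition))

Note also that, provided the above conditions are met, an abstract element may appear not only in an article element but also in a cluster element (Example 2).

When there is a document element that serves as the title or representative description of the abstract, such as 「概要」 or "Abstract", it is written as a title element inside the abstract element, and the range of document elements covered by that title element is marked with a cluster element (Example 3). In this case the abstract element may consist of several cluster elements (Example 4).

Formalization examples

Example 1: Newspaper lead (毎日新聞, morning edition, 2 March 2003)

Source material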

40 3 <titleblock> <title> <sentence type="quasi"> </sentence> </title> </titleblock> <abstract> <paragraph> <sentence></sentence> </paragraph> </abstract> <cluster> <titleblock> <title> <sentence type="quasi"> </sentence> <sentence type="quasi"> </sentence> </title> </titleblock> br (2003 12 ) <cluster> <titleblock> <sentence> </sentence><sentence> </sentence> <title> <sentence type="quasi"> </sentence> </title> </titleblock> <abstract> <sentence> </sentence> </abstract>

Example 3: Abstract of a paper (図説 森林 林業白書, 平成 14 年版 (2002 edition))

Source material

Formalization

<abstract>
  <cluster>
    <titleblock> <title> <sentence type="quasi"> 要約 </sentence> </title> </titleblock>
    <paragraph>
      <sentence> 近年 森林に関しては 地球温暖化防止に寄与する二酸化炭素の吸収 貯蔵や 多種多様な </sentence>
    </paragraph>
  </cluster>
</abstract>

Example 4: An abstract element consisting of several elements (summary, keywords) (日本語科学, No. 16, October 2004)

Source material

42 3 <abstract> <cluster> <titleblock> <title> <sentence type="quasi"> </sentence> </title> </titleblock> <sentence type="quasi"> </sentence> </cluster> <cluster> <titleblock> <title> <sentence type="quasi"></sentence> </title> </titleblock> <paragraph> <sentence>...</sentence> </paragraph> </cluster> </abstract>

article

( )

Child elements: br, sentence, %blockelement;

Attributes:
articleid () article () ID Article *3
iswholearticle () true... false...

DTD

<!ELEMENT article (br | sentence | %blockelement;)+>
<!ATTLIST article articleid CDATA #REQUIRED>
<!ATTLIST article iswholearticle (true | false) #REQUIRED>

( ) article articleid iswholearticle articleid article ( ) ID Article *3 iswholearticle iswholearticle true false

*3 http://www2.ninjal.ac.jp/densi/public/wiki/ [ver.2.2] []

44 3 article article article article article ( article ) article (2003 11 ) () () article ( ) ( ) article ( ) ( ) article ( ) article ( ) article article ( ) article article iswholearticle false (1) article (2) 10000 cluster blockend 10000 article

3.3 (, article) 45 (3) 10000 10000 article article article <article articleid="ow5x_00201_v001"> <titleblock> <title> <sentence type="quasi"> </sentence> </title> </titleblock> <cluster> <titleblock> <title> <sentence type="quasi"></sentence> </title> </titleblock> <paragraph> <sentence> </sentence> </paragraph> </cluster> </article> br

authorsdata

authorsdata (1) ( ) (2)

Child elements: br, info, notebody, paragraph, rejectedblock, sentence

DTD

<!ELEMENT authorsdata (br | info | notebody | paragraph | rejectedblock | sentence)*>

authorsdata ( ) authorsdata authorsdata (1) (2) ( )

3.3 (, authorsdata) 47 authorsdata authorsdata authorsdata authorsdata () title profile cluster caption () authorsdata <authorsdata> <sentence type="quasi"></sentence><br type="automatic_original"/> <sentence type="quasi"></sentence><br type="automatic_original"/> </authorsdata> <authorsdata> <sentence type="quasi"></sentence><br type="automatic_original"/> <sentence type="quasi"> </sentence><br type="automatic_original"/> <sentence type="quasi"> </sentence><br type="automatic_original"/> <sentence type="quasi"></sentence><br type="automatic_original"/> <sentence type="quasi"> </sentence><br type="automatic_original"/> <sentence type="quasi"> </sentence><br type="automatic_original"/> <sentence type="quasi"></sentence><br type="automatic_original"/> <sentence type="quasi"> </sentence><br type="automatic_original"/> <sentence type="quasi"></sentence><br type="automatic_original"/> </authorsdata>

48 3 <authorsdata> <sentence type="quasi"><quote></quote> </sentence><br type="automatic_original"/> </authorsdata> <authorsdata> <sentence type="quasi"> </sentence><br type="automatic_original"/> </authorsdata>

3.3 (, authorsdata) 49 <authorsdata> <sentence type="quasi"> </sentence><br type="automatic_original"/> </authorsdata>

50 3 blockend titleblock cluster DTD <!ELEMENT blockend EMPTY> blockend titleblock cluster ( ) blockend paragraph cluster quotation list figureblock blockend enclosedcharacter

3.3 (, blockend) 51 2003 8 2003 11 BE-PAL 2003 11 blockend paragraph blockend blockend ESSE 2003 11 titleblock cluster blockend

52 3 ESSE 2003 11 2003 8 <paragraph> </paragraph> <blockend /> <paragraph> </paragraph> sentence 2003 11 <paragraph> </paragraph> <blockend /> <paragraph> </paragraph> sentence, quote

3.3 (, blockend) 53 BE-PAL 2003 11 <paragraph> </paragraph> <blockend /> <paragraph> </paragraph> sentence, quote

br

Attributes:
type () automatic_original...

DTD

<!ELEMENT br EMPTY>
<!ATTLIST br type (automatic_original) #REQUIRED>

3.3 (, br) 55 2003 11 <titleblock> <title> <br type="automatic_original" /> <br type="automatic_original" /> <br type="automatic_original" /> </title> </titleblock> sentence

caption

figureblock

Child elements: br, cluster, info, list, notebody, paragraph, quotation, rejectedblock, sentence

DTD

<!ELEMENT caption (br | cluster | info | list | notebody | paragraph | quotation | rejectedblock | sentence)*>

caption figureblock figure table caption figureblock figure table ( ) caption caption caption (1) () (2) () (2) titleblock titleblock cluster

3.3 (, caption) 57 GAMECUBE 2003 12 2003 11 GAMECUBE 2003 12 <figureblock> <figure/> <caption> <sentence> </sentence> </caption> </figureblock>

58 3 2003 11 <figureblock> <figure/> <caption> <cluster> <titleblock> <title> </title> </titleblock> </cluster> </caption> </figureblock> caption

citation

article quotation

Child elements: authorsdata, blockend, br, cluster, figureblock, info, list, notebody, paragraph, quotation, rejectedblock, sentence, source, titleblock, verse

DTD

<!ELEMENT citation (authorsdata | blockend | br | cluster | figureblock | info | list | notebody | paragraph | quotation | rejectedblock | sentence | source | titleblock | verse)*>

citation article article citation quotation quotation (1) (2) (1) citation

60 3 ( ) article citation 2003 10 citation 2003 11 ( ) article article citation article article (2) citation source source citation article citation article article citation

3.3 (, citation) 61 2003 11 2003 11 2003 10 citation a b

62 3 2003 citation citation 2003 11 <paragraph> ( ) <sentence></sentence> </paragraph> <quotation> <citation> <paragraph> <sentence> </sentence><sentence> </sentence> </paragraph> <source> <sentence type="quasi"> </sentence><br type="automatic_original"/> </source> </citation> </quotation>

3.3 (, citation) 63 2003 11 <paragraph> ( ) <sentence> </sentence> </paragraph> <quotation> <citation> <paragraph> <sentence> </sentence><sentence> </sentence><sentence> </sentence> </paragraph> </citation> </quotation> <paragraph> <sentence> 2003 <paragraph> ( ) <sentence></sentence> </paragraph> <quotation> <citation> <cluster> <titleblock> <title> <sentence type="quasi"><ruby rubytext=" "></ruby><ruby rubytext=""> </ruby> </sentence> </title> </titleblock> <paragraph> <sentence> <sentence> <sentence> </sentence> </paragraph> </cluster> </citation> </quotation> <paragraph> <sentence></sentence> br

cluster

titleblock titleblock cluster

Child elements: abstract, article, authorsdata, blockend, br, contents, cluster, figureblock, info, list, notebody, orphanedtitle, paragraph, profile, quotation, rejectedblock, sentence, table, titleblock, verse

Attributes:
type () : cluster

DTD

<!ELEMENT cluster (abstract | article | authorsdata | blockend | br | contents | cluster | figureblock | info | list | notebody | orphanedtitle | paragraph | profile | quotation | rejectedblock | sentence | table | titleblock | verse)*>

cluster title cluster cluster 2003 11

3.3 (, cluster) 65 cluster cluster () 2003 11 cluster titleblock titleblock cluster blockend blockend cluster cluster ( ) titleblock cluster cluster titleblock cluster (cluster titleblock cluster cluster ) titleblock cluster titleblock titleblock titleblock cluster list cluster cluster titleblock list

66 3 option 2003 11 cluster list 15

3.3 (, cluster) 67 2003 11 <cluster> <titleblock><title> </title></titleblock> <cluster> <titleblock><title> </title></titleblock> <cluster> <titleblock><title> </title></titleblock> </cluster> </cluster> <cluster> <titleblock><title> </title></titleblock> <cluster> <titleblock><title> </title></titleblock> ( ) </cluster> </cluster> </cluster> paragraph option 2003 11 <cluster> <titleblock> <title> </title> </titleblock> ( ) </cluster> paragraph

contents

Child elements: cluster, list

DTD

<!ELEMENT contents (cluster | list)+>

article article cluster contents contents (1) (2) article cluster title 2002

3.3 (, contents) 69 2003 <contents> <cluster> <titleblock> <title> <sentence type="quasi"></sentence> </title> </titleblock> <list> <listitem> <sentence type="quasi"></sentence> </listitem> <listitem> <sentence type="quasi"></sentence> </listitem> <listitem> <sentence type="quasi"></sentence> </listitem> <listitem> <sentence type="quasi"></sentence> </listitem> </list> </cluster> </contents> br

70 3 <contents> <list> <listitem> <sentence type="quasi"> </sentence> <sentence type="quasi"></sentence> </listitem> <listitem> <sentence type="quasi"></sentence> <sentence type="quasi"></sentence> <sentence type="quasi"></sentence> </listitem> <listitem> <sentence type="quasi"></sentence> <sentence type="quasi"></sentence> <sentence type="quasi"></sentence> </listitem> <listitem> <sentence type="quasi"> </sentence> <sentence type="quasi"></sentence> </listitem> </list> </contents> br

correction

Child elements: type excess type omission, erratum %character;

Attributes:
type () omission... excess... erratum...
originaltext () type omission originaltext

DTD

<!ELEMENT correction (%character;)*>
<!ATTLIST correction type (omission | excess | erratum) #REQUIRED>
<!ATTLIST correction originaltext CDATA #REQUIRED>

*4 () originaltext ( ) () *4

72 3 *5 correction 3 type... omission... excess... erratum (omission) HOBBY JAPAN 2003 9 (excess) originaltext *5

3.3 (, correction) 73 BE-PAL 11 (erratum) original- Text Newton 2003 11 correction *6 2 53 *6 correction

74 3 58 2003 11 2003 11 61 9 HOBBY JAPAN 2003 9 <correction type="omission" originaltext=""> </correction> BE-PAL 2003 11 <correction type="excess" originaltext=" "/> Newton 2003 11 <correction type="erratum" originaltext=" "> </correction>
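Judging from the three examples above, an omission carries the supplied characters as element content with an empty originaltext, an excess is an empty element whose originaltext holds the surplus characters, and an erratum has both. Under that reading (an assumption inferred from the examples, since the explanatory text did not survive), a minimal Python sketch can rebuild either the corrected or the as-printed text. The function name `render` and the English sample sentence are hypothetical.

```python
import xml.etree.ElementTree as ET

def render(elem, corrected=True):
    """Flatten elem to plain text, resolving <correction> children.

    corrected=True keeps the content of each correction element;
    corrected=False restores the originaltext attribute instead.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "correction":
            if corrected:
                parts.append("".join(child.itertext()))
            else:
                parts.append(child.get("originaltext", ""))
        else:
            parts.append(render(child, corrected))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)

# Hypothetical sentence exercising all three correction types.
s = ET.fromstring(
    '<sentence>The <correction type="erratum" originaltext="qick">quick</correction>'
    ' fox<correction type="excess" originaltext=" fox"/>'
    ' jump<correction type="omission" originaltext="">s</correction>.</sentence>'
)
print(render(s, corrected=True))
print(render(s, corrected=False))
```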

3.3 (, cursive) 75 cursive %character;
DTD
<!ELEMENT cursive (%character;)*>
cursive <cursive> </cursive><cursive> </cursive>

delete %inlinetext; type () copyright_note_by_author...
DTD
<!ELEMENT delete (%inlinetext;)*>
<!ATTLIST delete type (copyright_note_by_author) #IMPLIED>
<delete> </delete>

3.3 (, enclosedcharacter) 77 enclosedcharacter %character;, sampling description ()
DTD
<!ELEMENT enclosedcharacter (%character; | sampling)*>
<!ATTLIST enclosedcharacter description CDATA #IMPLIED>
2003

78 3 enclosedcharacter description description enclosedcharacter 2006 11 11 2003 <enclosedcharacter description=" "> </enclosedcharacter> <enclosedcharacter description=" "> </enclosedcharacter> 2006 11 11 <enclosedcharacter> </enclosedcharacter>

3.3 (, figure) 79 figure figureblock blockend, br, cluster, figureblock, list, notebody, paragraph, quotation, rejectedblock, sentence
DTD
<!ELEMENT figure (blockend | br | cluster | figureblock | list | notebody | paragraph | quotation | rejectedblock | sentence)*>
figure figureblock caption cluster figure figureblock figure table figureblock caption table figure caption figure caption figureblock caption figure figureblock figure

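80 Chapter 3 Tag Specification
Example 1: an empty element (Nikkei TRENDY, October 2003 issue)
Example 2: content that is subject to input (森林 林業白書, Forest and Forestry White Paper, fiscal year Heisei 14)
Example 2 shows a case that is subject to input. Although it carries the title "Table II 1", its internal content does not qualify as a figure or table. Because it is in a form that should be expressed with ordinary cluster elements, the cluster structure is marked up inside the figure element. Note that a figure or table that is excluded from input and has no accompanying caption element does not form a figureblock element, and therefore does not become a figure element. Excluded content of this kind (figures, tables, and the like) that has no accompanying document elements at all is marked with a rejectedblock element whose type attribute value is figure. For the rejectedblock element, see the corresponding section.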

3.3 (, figure) 81 TRENDY 2003 10 <figureblock> <figure /> <caption> <enclosedcharacter description=" "> </enclosedcharacter> <enclosedcharacter description=" "> </enclosedcharacter> </caption> </figureblock> caption 14 <figureblock> <figure> <cluster> <notebody> </notebody> </cluster> <cluster> </cluster> <cluster> </cluster> <cluster> </cluster> </figure> <caption> II </caption> </figureblock>

figureblock caption, figure, table
DTD
<!ELEMENT figureblock (caption | figure | table)+>
figureblock article (1) (2) (1) (2) figureblock figure caption

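3.3 Tag List (variable length, figureblock) 83
Example 1: 蛍雪時代 (Keisetsu Jidai), November 2003 issue
Example 2: Nikkei TRENDY, October 2003 issue
In Example 1 (蛍雪時代), the elements outlined in green, including those that begin with 注 ("Note"), are all titles of figures or tables, or explanations and notes about a table. Likewise, in Example 2 (Nikkei TRENDY), the document elements shown in green that follow the circled 右 ("right") and 上 ("top") are descriptions and product names of the goods shown in the photographs. These elements are text that makes no sense without the figure or table. Unless the markup shows that they accompany a figure or table, their connection and relationship to the surrounding text cannot be captured, so they are marked up separately from other cluster and paragraph elements.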

84 3 2003 11 <figureblock> <figure /> <caption> </caption> </figureblock> <figureblock> <figure /> <caption> </caption> </figureblock> sentence caption list TRENDY 2003 10 <figureblock> <figure /> <caption> <enclosedcharacter description=" "> </enclosedcharacter> <enclosedcharacter description=" "> </enclosedcharacter> </caption> </figureblock> sentence
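The figureblock content model allows only caption, figure, and table children, and the figure section states that an excluded figure without a caption becomes a rejectedblock instead. A minimal Python sketch of a checker for these two points follows. The function name `check_figureblock`, the message wording, and the sample fragments are assumptions for illustration, and the caption rule is applied here as a heuristic rather than a DTD constraint.

```python
import xml.etree.ElementTree as ET

ALLOWED_CHILDREN = {"caption", "figure", "table"}  # from the figureblock DTD

def check_figureblock(fb):
    """Return a list of problems found in one <figureblock> element."""
    tags = [child.tag for child in fb]
    problems = []
    if not tags:
        problems.append("figureblock needs at least one child")
    problems += [f"unexpected child <{t}>" for t in tags if t not in ALLOWED_CHILDREN]
    # Assumption: a block without any caption should have been a rejectedblock.
    if "caption" not in tags:
        problems.append("no caption: consider rejectedblock type='figure'")
    return problems

# Hypothetical fragments: one well-formed block, one with a stray child.
ok = ET.fromstring("<figureblock><figure/><caption>text</caption></figureblock>")
bad = ET.fromstring("<figureblock><figure/><paragraph/></figureblock>")
print(check_figureblock(ok))
print(check_figureblock(bad))
```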

3.3 (, fraction) 85 fraction %inlinetext; DTD <!ELEMENT fraction (%inlinetext;)*> fraction fraction (2005 )

86 3 <fraction> </fraction>

3.3 (, image) 87 image JISX0213:2004 description () image no () DTD <!ELEMENT image EMPTY> <!ATTLIST image description CDATA #IMPLIED> <!ATTLIST image no CDATA #REQUIRED> JISX0213:2004 image image image no image no description

88 3 1990 <image description=" " no="1" /> CaprioR4... <image description=" " no="2" />

3.3 (, info) 89 info arg () value () DTD <!ELEMENT info EMPTY> <!ATTLIST info arg CDATA #REQUIRED> <!ATTLIST info value CDATA #REQUIRED> info arg value article/@iswholearticle info article iswholearticle info article 1 arg article value article - *7 *7 article

copyright/correction_by_author correction correction info copyright/note_by_author arg="article/@iswholearticle" <info arg="article/@iswholearticle" value="-"/> 2 arg="copyright/correction_by_author" <info arg="copyright/correction_by_author" value=" " /> 3 arg="copyright/note_by_author" 104 _ [] [] <info arg="copyright/note_by_author" value=" " />
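Since info is an empty element that carries everything in its arg and value attributes, one simple way to inspect a file is to list those pairs in document order. The sketch below assumes Python's standard ElementTree. The function name `collect_info`, the placement of the info elements, and the note text are hypothetical and have not been checked against the full DTD.

```python
import xml.etree.ElementTree as ET

def collect_info(root):
    """Return (arg, value) pairs for every <info> element, in document order."""
    return [(i.get("arg"), i.get("value")) for i in root.iter("info")]

# Hypothetical fragment; element placement is illustrative only.
article = ET.fromstring(
    '<article>'
    '<info arg="article/@iswholearticle" value="-"/>'
    '<cluster><notebody>'
    '<info arg="copyright/note_by_author" value="hypothetical note"/>'
    '</notebody></cluster>'
    '</article>'
)
for arg, value in collect_info(article):
    print(arg, "=>", value)
```

A list of pairs is used rather than a dictionary so that repeated arg values are not silently merged.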

3.3 (, list) 91 list listitem DTD <!ELEMENT list (listitem)+> list (1), (2), (3) list listitem list ( ) listitem dancyu 2003 8 ( dancyu 2003 8 )

92 3 list list ( 10 ) list list list table list table list table table list list

3.3 (, list) 93 dancyu 2003 8 <cluster> <titleblock> <title> <sentence type="quasi"> </sentence> </title> </titleblock> <list> <listitem><sentence type="quasi"> </sentence></listitem> <listitem><sentence type="quasi"> </sentence></listitem> <listitem><sentence type="quasi"> </sentence></listitem> <listitem><sentence type="quasi"> </sentence></listitem> <listitem><sentence type="quasi"> </sentence></listitem> <listitem><sentence type="quasi"> </sentence></listitem> <listitem><sentence type="quasi"></sentence></listitem> <listitem><sentence type="quasi"></sentence></listitem> <listitem><sentence type="quasi"> </sentence></listitem> <listitem><sentence type="quasi"> </sentence></listitem> </list> <sentence> </sentence> </cluster> dancyu 2003 8 <list> <listitem> <sentence> </sentence><sentence> </sentence><sentence></sentence> </listitem> <listitem> <sentence></sentence> </listitem> <listitem> <sentence> </sentence> </listitem> <listitem> <sentence> </sentence> </listitem> </list> list <sentence type="quasi"></sentence> <list> <listitem><sentence type="quasi"> </sentence></listitem> <listitem><sentence type="quasi"></sentence></listitem> <listitem><sentence type="quasi"></sentence></listitem> <listitem><sentence type="quasi"></sentence></listitem> </list> <sentence> </sentence>

listitem list blockend, br, cluster, figureblock, list, notebody, paragraph, quotation, rejectedblock, sentence, table
DTD
<!ELEMENT listitem (blockend | br | cluster | figureblock | list | notebody | paragraph | quotation | rejectedblock | sentence | table)*>
list listitem listitem list listitem list listitem list list listitem list

3.3 (, listitem) 95 1989 1989 <cluster> <titleblock> <title> </title> </titleblock> <list> <listitem> </listitem> <listitem> <list> <listitem> </listitem> <listitem> </listitem> <listitem> </listitem> <listitem> </listitem> <listitem> </listitem> </list> </listitem> </list> </cluster> sentence

96 3 missingcharacter () JISX0213:2004 (JIS ) Unicode %character; attribute () HanIdeograph... Hiragana... Katakana... RomanNumeral... Latin... Greek... OldHanzi... unicode () Unicode4.0 16 U+***** (4 5 Unicode U+ 6 7 ) Unicode U+FFFD REPLACEMENT CHARACTER daikanwa () M***** (5 M 6 ) M99999 attribute HanIdeograph ( ) ref () Unicode KC**** (4 KC 6 ) description ()

3.3 (, missingcharacter) 97
DTD
<!ELEMENT missingcharacter (%character;)*>
<!ATTLIST missingcharacter attribute (Greek | HanIdeograph | Hiragana | Katakana | Latin | OldHanzi | RomanNumeral) #REQUIRED>
<!ATTLIST missingcharacter unicode CDATA #REQUIRED>
<!ATTLIST missingcharacter daikanwa CDATA #IMPLIED>
<!ATTLIST missingcharacter ref CDATA #IMPLIED>
<!ATTLIST missingcharacter description CDATA #IMPLIED>
JISX0213:2004 JISX0213:2004 JISX0213:2004 missingcharacter JIS JISX0213:2004 image Unicode JIS 2003 <missingcharacter attribute="HanIdeograph" unicode="U+5AEB" daikanwa="M06673" description=""> </missingcharacter>

Unicode JIS 1986 <missingcharacter attribute="HanIdeograph" unicode="U+5E86" daikanwa="M99999" description=" "> </missingcharacter> Unicode JIS <missingcharacter attribute="HanIdeograph" unicode="U+FFFD" daikanwa="M*****" description=""> </missingcharacter> Unicode JIS <missingcharacter attribute="HanIdeograph" unicode="U+FFFD" daikanwa="M99999" ref="KC****" description=""> </missingcharacter>

3.3 (, missingcharacter) 99 2005 <missingcharacter attribute="Hiragana" unicode="U+FFFD" ref="KC****" description=""> </missingcharacter> 2004 <missingcharacter attribute="OldHanzi" unicode="U+FFFD" ref="KC****" description=""> </missingcharacter>
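The attribute descriptions for missingcharacter give fixed shapes for three values: unicode is U+ followed by 4 or 5 hexadecimal digits, daikanwa is M followed by 5 digits, and ref is KC followed by 4 digits. A small Python sketch of a format check is shown below. The function name `check_missingcharacter` is hypothetical, treating the ref digits as decimal is an assumption, and matching is deliberately case-insensitive as a lenient choice rather than something the specification states.

```python
import re

# Allowed values of the attribute attribute, taken from the DTD enumeration.
ATTRIBUTES = {"Greek", "HanIdeograph", "Hiragana", "Katakana",
              "Latin", "OldHanzi", "RomanNumeral"}
UNICODE_RE = re.compile(r"U\+[0-9A-F]{4,5}", re.IGNORECASE)
DAIKANWA_RE = re.compile(r"M\d{5}", re.IGNORECASE)
REF_RE = re.compile(r"KC\d{4}", re.IGNORECASE)  # assumption: decimal digits

def check_missingcharacter(attrs):
    """Return a list of format problems for one element's attribute dict."""
    problems = []
    allowed = {a.lower() for a in ATTRIBUTES}
    if attrs.get("attribute", "").lower() not in allowed:
        problems.append("attribute: not one of the enumerated values")
    if not UNICODE_RE.fullmatch(attrs.get("unicode", "")):
        problems.append("unicode: expected U+ and 4 or 5 hex digits")
    if "daikanwa" in attrs and not DAIKANWA_RE.fullmatch(attrs["daikanwa"]):
        problems.append("daikanwa: expected M and 5 digits")
    if "ref" in attrs and not REF_RE.fullmatch(attrs["ref"]):
        problems.append("ref: expected KC and 4 digits")
    return problems

print(check_missingcharacter(
    {"attribute": "HanIdeograph", "unicode": "U+5AEB", "daikanwa": "M06673"}))
print(check_missingcharacter({"attribute": "Kanji", "unicode": "5AEB"}))
```

The placeholder forms such as M***** and KC**** that appear in the examples are templates and would be reported as problems by this check.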

notebody notemarker notebodyinline blockend, br, figureblock, info, list, paragraph, profile, quotation, rejectedblock, sentence, verse
DTD
<!ELEMENT notebody (blockend | br | figureblock | info | list | paragraph | profile | quotation | rejectedblock | sentence | verse)*>
notebody *8 *9 notebodyinline notebodyinline notebody 19) 20) notebody 19) 20) notebody notebody paragraph cluster citation citation paragraph, cluster *8 *9

3.3 (, notebody) 101 2003 11 <paragraph> <notemarker text=" " />... </paragraph> <notebody>... </notebody> sentence notebody

102 3 notebodyinline * 10 notebody notemarker text () : info () : DTD <!ELEMENT notebodyinline EMPTY> <!ATTLIST notebodyinline text CDATA #REQUIRED> <!ATTLIST notebodyinline info CDATA #IMPLIED> * 11 notebodyinline notebody 5 *10 *11

3.3 (, notebodyinline) 103 text info info notemarker =, (x) (y) - 1 2 3 4 0 text ( 2003 ) <notebodyinline text="" /> <notebodyinline text="" /> <notebodyinline text="" />

104 3 ( 2003 11 ) <notebodyinline text="" />

3.3 (, notemarker) 105 notemarker *1 notebody text () : info () : DTD <!ELEMENT notemarker EMPTY> <!ATTLIST notemarker text CDATA #REQUIRED> <!ATTLIST notemarker info CDATA #IMPLIED> *1 notemarker notemarker notemarker

106 3 2003 11 text info =, (x) (y) - 1 2 3 4 0 text ( 2003 11 ) <notemarker text=" " /> sentence br

3.3 (, notemarker) 107 (2003 44 8 <notemarker text="" /> info enclosedcharacter ( <notemarker text=" " info="enclosedcharacter:description= :" />

orphanedtitle br, notebody, sentence
DTD
<!ELEMENT orphanedtitle (br | notebody | sentence)*>
() orphanedtitle cluster titleblock titleblock () orphanedtitle orphanedtitle cluster titleblock ( ) orphanedtitle orphanedtitle cluster article

3.3 (, orphanedtitle) 109 2003 9 2003 9 <quotation> <speech> <speaker><sentence type="quasi"></sentence></speaker> <paragraph> <sentence> </sentence> </paragraph> </speech> </quotation> <quotation> <speech> <speaker><sentence type="quasi"> </sentence></speaker> <paragraph> <sentence> </sentence> </paragraph> </speech> </quotation> <orphanedtitle> <sentence> </sentence> </orphanedtitle> br

paragraph br, sentence
DTD
<!ELEMENT paragraph (br | sentence)*>
paragraph () () paragraph paragraph (2003 11 )

3.3 (, paragraph) 111 paragraph paragraph paragraph (2005 3 21 ) ( 2003 12 ) ( ) paragraph ( 2003 12 )

112 3 paragraph (2003 11 ) <paragraph> <sentence> </sentence> <sentence> </sentence> <sentence> </sentence> </paragraph> <paragraph> <sentence> </sentence> </paragraph> <paragraph> <sentence> </sentence> </paragraph> <paragraph> <sentence> </sentence><sentence> </sentence><sentence> </sentence><sentence> <quote> </quote></sentence> </paragraph> paragraph () <cluster> <title> </title> <paragraph> <sentence> () </sentence> </paragraph> <paragraph> <sentence> () </sentence> </paragraph> <paragraph> <sentence> () </sentence> </paragraph> ( ) cluster paragraph

3.3 (, profile) 113 profile article br, figureblock, paragraph, quotation, rejectedblock, sentence, titleblock
DTD
<!ELEMENT profile (br | figureblock | paragraph | quotation | rejectedblock | sentence | titleblock)+>
article article profile profile ( THE BIG ISSUE 2006 10 15 )

114 3 profile (2003 11 ) profile profile article profile article 2003

3.3 (, profile) 115 profile 2003 8 titleblock titleblock profile cluster profile titleblock

116 3 profile THE BIG ISSUE 2006 10 15 <profile> <sentence type="quasi"></sentence> <paragraph> <sentence></sentence> <sentence></sentence> <sentence></sentence> </paragraph> <paragraph> <sentence type="quasi"> </sentence> </paragraph> <paragraph> <sentence type="quasi">:// </sentence> </paragraph> </profile> br (2003 11 ) <profile> <paragraph> <sentence> </sentence><sentence> </sentence> </paragraph> </profile> br

3.3 (, quotation) 117 quotation () quote citation, speech
DTD
<!ELEMENT quotation (citation | speech)+>
quotation article quotation citation speech citation speech quotation citation speech citation speech quotation quotation (1) (2) ( )

118 3 ( 2003 11 (2003 8 ) ( 2003 11 ) (2003 11 )

3.3 (, quotation) 119 2003 11 BACKSTAGE PASS 2003 8 ESSE 2003 11 quotation citation speech quotation citation speech speech quotation quotation (1) (2) (1) speech (2) citation speech citation

120 3 2003 11 <quotation> <speech> <paragraph> <sentence type="quasi"> <sentence type="quasi"> </sentence> </sentence> </paragraph> </speech> </quotation> <sentence type="quasi"></sentence> <quotation> <speech> <paragraph> <sentence type="quasi"> </sentence> </paragraph> </speech> </quotation> <sentence type="quasi"></sentence> br 2003 11 <sentence></sentence> <quotation> <citation> <paragraph> <sentence> </sentence> <sentence> </sentence> </paragraph> <source> <sentence type="quasi"> </sentence> </source> </citation> </quotation>

3.3 (, quotation) 121 BACKSTAGE PASS 2003 8 <quotation> <speech> <speaker><sentence type="quasi"></sentence></speaker> <paragraph> <sentence type="quasi"> <sentence> </sentence> </sentence> </paragraph> </speech> </quotation> <quotation> <speech> <speaker><sentence type="quasi"> </sentence></speaker> <paragraph> <sentence type="quasi"> <sentence> </sentence> </sentence> </paragraph> </speech> </quotation> <quotation> <speech> <speaker><sentence type="quasi"></sentence></speaker> <paragraph> <sentence type="quasi"> <sentence> </sentence><sentence> </sentence> <sentence type="quasi"></sentence> </sentence> </paragraph> </speech> </quotation> ESSE 2003 11 <sentence type="quasi"> </sentence> <quotation> <speech> <paragraph> <sentence type="quasi"> <sentence></sentence> <sentence></sentence><sentence type="quasi"> </sentence> </sentence> </paragraph> </speech> </quotation> <sentence> </sentence>

quote quotation %inlinetext;, sentence, verseline
DTD
<!ELEMENT quote (%inlinetext; | sentence | verseline)*>
quote (1) ( ) (2) () quote ( ) quotation

3.3 (, quote) 123 2003 ESSE 2003 11 <sentence> <quote></quote> </sentence> GOLF DIGEST 2003 11 <sentence><quote> </quote> </sentence>

2003 <paragraph> <sentence> <quote> <sentence type="quasi"> </sentence> </quote> </sentence> </paragraph> <quotation> <speech> <paragraph> <sentence type="quasi"> </sentence> </paragraph> </speech> </quotation> <paragraph> <sentence><quote><sentence> </sentence><sentence type="quasi"> </sentence> </quote></sentence> </paragraph>

3.3 (, rejectedblock) 125 rejectedblock () rejectedspan type () : copyright... figure... formula... foreign... old... unclear... etc...
DTD
<!ELEMENT rejectedblock EMPTY>
<!ATTLIST rejectedblock type (copyright | figure | formula | foreign | old | unclear | etc) #REQUIRED>
rejectedblock sample (1) (2) (3)

(4) (5) rejectedblock (1) (2) ( ) figureblock rejectedblock rejectedblock type copyright... figure... formula... foreign... old... unclear... etc...

3.3 (, rejectedblock) 127 <cluster> <titleblock> <title> <sentence type="quasi"></sentence> </title> </titleblock> <paragraph> <sentence> </sentence> </paragraph> <rejectedblock type="figure" /> </cluster>

rejectedspan block rejectedblock type () : formula... foreign... unclear... etc...
DTD
<!ELEMENT rejectedspan EMPTY>
<!ATTLIST rejectedspan type (formula | foreign | unclear | etc) #REQUIRED>
rejectedspan sample rejectedblock (1) (2) (3)

3.3 (, rejectedspan) 129 type formula foreign unclear figure image etc rejectedspan type formula : foreign : unclear : figure : etc : 51 rejectedspan 51 <rejectedspan type="formula" />

ruby %inlinetext; rubytext ()
DTD
<!ELEMENT ruby (%inlinetext;)*>
<!ATTLIST ruby rubytext CDATA #REQUIRED>
ruby rubytext ruby ruby notebodyinline

3.3 (, ruby) 131 [] () <ruby rubytext=" "> </ruby><ruby rubytext=" "> </ruby> [] () <ruby rubytext=""> </ruby> [] ( ) <ruby rubytext=""> </ruby> [ ] () <ruby rubytext=" "></ruby> []() <ruby rubytext=""> </ruby> [] ( ) <ruby rubytext=""></ruby>
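In the ruby examples above, the base text is the element content and the reading is carried by the required rubytext attribute. As a hedged illustration, the Python sketch below lists (base text, reading) pairs from a fragment. The function name `ruby_pairs` and the sample sentence are hypothetical and do not come from the corpus.

```python
import xml.etree.ElementTree as ET

def ruby_pairs(root):
    """Return (base text, rubytext) for every <ruby> element under root."""
    return [("".join(r.itertext()), r.get("rubytext")) for r in root.iter("ruby")]

# Hypothetical sentence with two ruby annotations.
s = ET.fromstring(
    '<sentence><ruby rubytext="かんじ">漢字</ruby>を'
    '<ruby rubytext="よ">読</ruby>む</sentence>'
)
print(ruby_pairs(s))
```

Note that `itertext()` on a ruby element does not include the text that follows its closing tag, so the surrounding sentence text stays out of the pairs.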

132 3 sample sample article article sampleid () ID type () variablelength... ( ) version () DTD <!ELEMENT sample (article)> <!ATTLIST sample sampleid CDATA #REQUIRED> <!ATTLIST sample type (variablelength) #REQUIRED> <!ATTLIST sample version CDATA #REQUIRED> sample article article article sample sampleid version type sampleid : () ID sampleid Sample ID version : type : variablelength

3.3 (, sample) 133 <sample sampleid="ow1x_00001" version="20070208" type="variablelength"> <article articleid="ow1x_00001_v001" iswholearticle="false"> </article> </sample>
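The sample DTD requires exactly one article child and three required attributes (sampleid, type, version), with type restricted to variablelength. A minimal Python sketch of that check follows. The function name `check_sample` and the message wording are assumptions for illustration, and the first test fragment mirrors the example above with the article body left empty.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("sampleid", "type", "version")  # all #REQUIRED in the DTD

def check_sample(sample):
    """Return a list of problems found on one <sample> root element."""
    problems = [f"missing attribute {name}" for name in REQUIRED
                if sample.get(name) is None]
    if sample.get("type") not in (None, "variablelength"):
        problems.append("type must be variablelength")
    # <!ELEMENT sample (article)> allows exactly one article and nothing else.
    if [child.tag for child in sample] != ["article"]:
        problems.append("sample must contain exactly one article")
    return problems

good = ET.fromstring(
    '<sample sampleid="ow1x_00001" version="20070208" type="variablelength">'
    '<article articleid="ow1x_00001_v001" iswholearticle="false"></article>'
    '</sample>'
)
broken = ET.fromstring('<sample sampleid="x" type="fixed"><article/><article/></sample>')
print(check_sample(good))
print(check_sample(broken))
```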

134 3 sampling type start... DTD <!ELEMENT sampling EMPTY> <!ATTLIST sampling type (start) #REQUIRED> sampling sample article 54

3.3 (, sampling) 135 54 <sampling type="start" /> sentence

sentence %inlinetext;, delete, sentence, verseline type () quasi... sentence verse... verse sentence
DTD
<!ELEMENT sentence (%inlinetext; | delete | sentence | verseline)*>
<!ATTLIST sentence type (quasi | verse) #IMPLIED>
sentence (1) sentence sentence (= sentence ) (2) type quasi

3.3 (, sentence) 137 a sentence sentence b sentence sentence <sentence></sentence><sentence type="quasi"> </sentence> sentence sentence 2.a verse sentence verse sentence type verse sentence sentence <sentence></sentence> <sentence> </sentence> <sentence> </sentence> <sentence> </sentence><sentence type="quasi"></sentence> <titleblock> <title> <sentence type="quasi"></sentence> </title> </titleblock>

138 3 sentence 2003 11 <sentence><quote> <sentence> </sentence> </quote> </sentence> sentence 2003 11 <sentence><sentence> </sentence> </sentence> <sentence><sentence> </sentence><sentence type="quasi"> </sentence></sentence> 2003 6 <verse> <sentence type="verse"><verseline /></sentence> </verse>
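Because sentence elements may nest inside other sentence elements and carry an optional type of quasi or verse, a quick tally by type and a list of outermost sentences can be useful when inspecting markup. The Python sketch below is only an illustration. The function names, the label "plain" for sentences without a type attribute, and the sample fragment are all assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def sentence_types(root):
    """Count <sentence> elements by type; untyped ones are labelled 'plain'."""
    return Counter(s.get("type", "plain") for s in root.iter("sentence"))

def outermost_sentences(elem):
    """Return <sentence> elements that are not nested inside another sentence."""
    found = []
    for child in elem:
        if child.tag == "sentence":
            found.append(child)  # do not descend: inner sentences are nested
        else:
            found.extend(outermost_sentences(child))
    return found

# Hypothetical paragraph: one outer sentence wrapping two inner ones.
p = ET.fromstring(
    '<paragraph><sentence><sentence>a</sentence>'
    '<sentence type="quasi">b</sentence></sentence></paragraph>'
)
print(sentence_types(p))
print(len(outermost_sentences(p)))
```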

3.3 (, source) 139 source citation () br, info, sentence
DTD
<!ELEMENT source (br | info | sentence)*>
source citation citation source source citation citation source

140 3 source 2003 11 source 2003 11 ( citation ) citation source 2003 11 <quotation> <citation> <paragraph> <sentence> </sentence><sentence> </sentence> </paragraph> <source> <sentence type="quasi"> </sentence> </source> </citation> </quotation> br

3.3 (, speaker) 141 speaker speech speech br, sentence
DTD
<!ELEMENT speaker (br | sentence)*>
speaker speech speech speaker

BACKSTAGE PASS 2003 8 2003 8 BACKSTAGE PASS 2003 8 <quotation><speech> <speaker> <sentence type="quasi"> </sentence><br type="automatic_original"/> </speaker> <paragraph> <sentence> </sentence> </paragraph> </speech></quotation> <quotation> <speech> <speaker> <sentence type="quasi"> </sentence><br type="automatic_original"/> </speaker> <paragraph> <sentence type="quasi"> </sentence> </paragraph> </speech></quotation> <quotation><speech> <speaker> <sentence type="quasi"> </sentence><br type="automatic_original"/> </speaker> <paragraph> <sentence> </sentence><sentence type="quasi"> </sentence> </paragraph> </speech></quotation>

3.3 (, speaker) 143 2003 8 <quotation> <speech> <speaker> <sentence type="quasi"> <image description="" no="5" /> </sentence><br type="automatic_original"/> </speaker> <paragraph> <sentence></sentence> </paragraph> </speech> </quotation> <quotation> <speech> <speaker> <sentence type="quasi"> <image description="" no="6" /> </sentence><br type="automatic_original"/> </speaker> <paragraph> <sentence> </sentence> </paragraph> </speech> </quotation>

speech () quote blockend, br, list, notebody, paragraph, quotation, rejectedblock, sentence, speaker, verse
DTD
<!ELEMENT speech (blockend | br | list | notebody | paragraph | quotation | rejectedblock | sentence | speaker | verse)*>
speech quotation speech quote quote (1)

3.3 (, speech) 145 (2) article speech (1) 2003 11 (2)-1 ESSE 2003 11 (2)-2 2003 11

146 3 (1) 2003 11 <quotation> <speech> <paragraph> <sentence type="quasi"><sentence type="quasi"> </sentence> </sentence> </paragraph> </speech> </quotation> <sentence type="quasi"></sentence> <quotation> <speech> <paragraph> <sentence type="quasi"> </sentence> </paragraph> </speech> </quotation> <sentence></sentence> speech (2)-1 ESSE 2003 11 <sentence type="quasi"></sentence> <quotation> <speech> <paragraph> <sentence type="quasi"><sentence> </sentence><sentence></sentence><sentence type="quasi"> </sentence> </sentence> </paragraph> </speech> </quotation> <sentence> </sentence>

3.3 (, speech) 147 (2)-2 2003 11 <quotation> <speech> <speaker></speaker> <paragraph> <sentence></sentence><sentence> </sentence><sentence> </sentence><sentence> </sentence><sentence> </sentence> </paragraph> </speech> </quotation> <quotation> <speech> <speaker> </speaker> <paragraph> <sentence> </sentence><sentence> </sentence><sentence></sentence><sentence> </sentence><sentence> </sentence><sentence> </sentence> </paragraph> </speech> </quotation>

148 3 subscript %inlinetext; DTD <!ELEMENT subscript (%inlinetext;)*> subscript <subscript> </subscript>

3.3 (, superscript) 149 superscript %inlinetext; DTD <!ELEMENT superscript (%inlinetext;)*> superscript notemarker <superscript> </superscript>

table br, sentence
DTD
<!ELEMENT table (br | sentence)+>
( ) ( ) ( MONEY JAPAN 2003 12 ) ( TRENDY 2003 10 )

3.3 (, table) 151 list ( ) ( MONEY JAPAN 2003 12 ) table figure rejectedblock table figureblock table caption figureblock figureblock caption <figureblock> <caption> <sentence type="quasi"></sentence><br type="automatic_original"/> </caption> <table> <sentence type="quasi"> </sentence><br type="automatic_original"/> <sentence type="quasi"></sentence><br type="automatic_original"/> <sentence type="quasi"> ( </sentence><br type="automatic_original"/> <sentence></sentence> <sentence type="quasi"> </sentence><br type="automatic_original"/> <sentence type="quasi"></sentence><br type="automatic_original"/> <sentence>- </sentence> <sentence type="quasi"> </sentence><br type="automatic_original"/> <sentence type="quasi"></sentence><br type="automatic_original"/> <sentence> </sentence> <sentence type="quasi"> </sentence><br type="automatic_original"/> <sentence type="quasi">-</sentence><br type="automatic_original"/> <sentence></sentence> <sentence type="quasi"></sentence><br type="automatic_original"/> </table> </figureblock>

title titleblock br, notebody, rejectedblock, sentence
DTD
<!ELEMENT title (br | notebody | rejectedblock | sentence)*>
title titleblock title

3.3 (, title) 153 <cluster> <titleblock> <title><sentence type="quasi"></sentence></title> </titleblock> <cluster> <titleblock> <title><sentence type="quasi"></sentence></title> </titleblock> <paragraph> <sentence>... </cluster> </cluster> title 1.1 cluster cluster title cluster title title 2003 6 17 <article articleid="pn3c_xxxxx_v0001" iswholearticle="true"> <titleblock> <title> <sentence type="quasi"> </sentence><br type="automatic_original"/> </title> </titleblock> <paragraph> <sentence> </sentence>

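Chapter 3 Tag Specification 154
Properties of the title element
The properties of the title element include those shown in Examples 3 to 5.
Example 3: a topic for a specific range of document elements (短歌研究, Tanka Kenkyu, November 2003 issue)
Example 4: a summary of a specific range of document elements (趣味の園芸, Shumi no Engei, August 2003 issue)
Example 5: an excerpt taken from a specific range of document elements (ポップティーン, Popteen, September 2003 issue)
Identifying title and titleblock elements
Even text that is separated from the body in the layout, and that looks like an element merely accompanying the title element, is treated as a title element if it is judged important for determining the content of the text. Examples are: the names of serials, boxed sections, and recurring corners; names that indicate the article type; and expressions that state the topic of an article element or cluster element. Example 6 is a serialized column. The main title element in this example is the title of that installment, 精いっぱいやったよ うれしかった ("I did my very best; I was happy"). The series title, 少年は大志を抱いた 第六回 ("The Boy Had Great Ambitions, Part 6"), judging from how it is displayed on the page,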

3.3 (, title) 155 title 2003 11 (2003 11 ) <article articleid="pm32_xxxxx_0001" iswholearticle="true"> <titleblock> <title> <sentence type="quasi"></sentence> </title> </titleblock> <authorsdata> <sentence type="quasi"> </sentence> </authorsdata> <cluster> <titleblock> <title> <sentence type="quasi"> </sentence> </title> </titleblock> </cluster>

titleblock title br, list, rejectedblock, sentence, title
DTD
<!ELEMENT titleblock (br | list | rejectedblock | sentence | title)*>
titleblock article cluster title title title title title title spring 2003 1 title titleblock title title

3.3 (, titleblock) 157 title 2003 10 title spring 2003 1 <titleblock> <sentence type="quasi"> </sentence> <title> <sentence> </sentence> </title> </titleblock> br title <titleblock> <title> <sentence type="quasi"> </sentence> </title> <sentence type="quasi"></sentence> </titleblock> title 2003 10 <titleblock> <sentence> </sentence><sentence> </sentence> <sentence type="quasi"><enclosedcharacter> </enclosedcharacter> </sentence> <title> <sentence type="quasi"></sentence> <sentence type="quasi"></sentence> </title> <sentence> </sentence> </titleblock>