JAIST Reposi
https://dspace.j
Title: Extraction of Site Information and Author Information from Web Pages (ウェブページからのサイト情報 作成者情報の抽出)
Author(s): 堀, 達也 (Hori, Tatsuya)
Citation:
Issue Date: 2015-09
Type: Thesis or Dissertation
Text version: author
URL: http://hdl.handle.net/10119/12932
Rights:
Description: Supervisor: 白井清昭, School of Information Science, Master's degree
Japan Advanced Institute of Science and
2015 9
1310067 : 2015 8
Copyright © 2015 by Hori Tatsuya
,.,.,,.,.,.,,,.,,,.,,,.,., (, ), (,,, ).,,.,,,.,.,.,,. Kato,, DOM,. Giuffrida,,,.,, ( ),,,, ( ).,,., HTML Document Object Model (DOM),. Support Vector Machine (SVM), DOM, id, class,,, DOM,,, n-gram.,, ( DOM ) ( DOM )
., DOM,,..,.,,,..,, DOM., Kato., 0. 0 DOM,,.,,.. 500,., 10,, F.,,. F, 0.384, 0.258, F, 0.585, 0.675.,,,.,,,., F,.,.,,., F.,.,,.,.,.,,.,,. 2
1 1.1,..,.,.,.,,,.,,,.,,,.,.,. 1.2,., (, ), (,,, ).,,.,,,.,,,.,,,.,. 1
1.3 5. 2,. 3,. 4,,.,. 5,. 2
2,. 2.1,, ( ). 2.2,. 2.1 2.1: [1],, [1]., 2.1,,, DOM.,.. X 3
Y DOM HTML DOM DOM, 10 10 5 5, 5. 2.1.,,.,, DOM. 2.1: 1 [1] 10 10 5 5 Precision Recall Precision Recall 0.21 0.52 0.48 0.68,, DOM.. Juman 4
,, 10 10 2. 2.2., 1, 2. 2.2: 2 [1] Precision Recall 1 0.53 0.47 1 0.84 0.47 Kato,, [2]. Kato 2.2.,.. 1. HTML. 2.. 3. KNP( ). 4.. (a) (b) ( ) 5.. (a). (b). (c) ( ) ( ). 6...,. HTML (, ) 5
2.2: Kato [2], 2.3. author name content, h1 div, 2 h1-div-table-tbody-tr-td, 5. SVM. SVM,., ABC,. 2 : ABC 1 : ABC 0 k,., 2.3. All,. 58.6%.,, k. 2.3: [2] k Ranking Precision 1 0.586 3 0.720 5 0.752 All 0.847 6
2.3: DOM [2] Giuffrida, PostScript,,,, [3].,., Giuffrida,. Giuffrida. 1. xy. 2. xy. 3. -,. 4... 2.4. 7
2.4: [3] 9 12 10 10 8 2.5: [3] Accuracy 92% 87% 75% 71% 76% 2.5.,,. Kawahara, [4].,,,. Kawahara. 1. Web,. 1). Web.. 2). JUMAN KNP,. 3). Web,. 2.,. 8
. :.. :. 3... ( ), ). :.. :.,, 25. 2.4.,. A). B)., C).,. 9
2.4: [4] 2.5. Kawahara 2.6. major p-a, contradictions., A, B, 82.5%, 79.3%,. Kobayashi, [5]. Kobayashi, ( ). opinion holder subject ( ) 10
2.5: [4] : aspect,,, Subject evaluation Opinion holder / :,,,, Asp-Eval, Asp-of. Asp-Eval aspect evaluation, Asp-of, aspect.,, subject,, aspect, Asp-of. 2.6.,., 1. 11
2.6: Kawahara [4] major p-a contradictions relevant(a, B) 160/194 (82.5%) 46/58 (79.3%) relevant(a) 118/194 (60.8%) 39/58 (67.2%) should be merged(b) 42/194 (21.6%) 7/58 (12.1%) not relevant(c) 34/194 (17.5%) 12/58 (20.7%) 2.6: [5] ( (Rest) (Auto) (Phone) (Game)), 2.7. I, II aspect., other aspect 3, Non-writer op. holder opinion holder. Asp-Eval, Asp-of.,, Aspect -ga VP-te Evaluation.,., aspect-aspect, aspect-evaluation., Asp-Eval Asp-of.,.. 12
Table 2.7: [5]
                         Rest     Auto     Phone    Game
articles                 1,356    564      481      361
sentences                21,666   14,005   11,638   6,448
# of opinion units       4,267    1,519    1,518    775
I  Asp-Eval              3,692    943      965      521
I  Asp-Asp               1,426    280      296      221
I  Subj-Asp              2,632    877      850      451
II Subj-Eval             575      576      553      243
II Subj-Asp-Eval         2,314    736      768      351
II Subj-Asp-Asp-Eval     1,065    175      172      127
II other                 313      32       25       54
Non-writer op. holder    95       17       22       2

1). evaluation( aspect),, aspect.,, 2). 2). evaluation( aspect) aspect. Kobayashi, Tateishi [6],., [7] Kobayashi,. 2.8, 2.9. A B Asp-of, B C Asp-of, A C Asp-of., 2.9 Asp-of. Asp-Eval,, 10%. Asp-of, 10%, 20%.,. 2.2, Kato, Giuffrida,.,,
2.8: Asp-Eval [5] Asp-Eval P 0.56 (432/774) R 0.53 (432/809) P 0.70 (504/723) 0.13 (46/360) R 0.62 (504/809) 0.17 (46/274) + P 0.72 (502/694) 0.14 (53/389) R 0.62 (502/809) 0.19 (53/274) 2.9: Asp-of [5] Asp-of precision recall 0.27 (175/682) 0.17 (175/1048) 0.44 (458/1047) 0.44 (458/1048) + 0.45 (474/1047) 0.45 (474/1048) ( ),,,, ( ).,,,. Kawahara, Kobayashi,,.,,,. 14
3 3.1, HTML Document Object Model (DOM),. Document Object Model, HTML. DOM, HTML ( ). DOM,, 1 HTML., HTML DOM. 3.1 HTML DOM. div h1 h2. h1 DOM 1, h2 DOM 2., DOM,. Support Vector Machine (SVM)[8] 3.1: HTML DOM 15
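As a rough illustration of the idea in Section 3.1 (an HTML page is viewed as a DOM tree and each DOM node is then classified), the following minimal sketch flattens a page into a list of element records using Python's standard `html.parser`. The class name `DomCollector`, the helper `collect_nodes`, and the recorded fields (`tag`, `id`, `class`, `depth`, `text`) are assumptions made for this example and are not taken from the thesis.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, assumed for the sketch).
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class DomCollector(HTMLParser):
    """Collect one record per element: tag, id, class list, depth, own text."""

    def __init__(self):
        super().__init__()
        self.nodes = []  # all element records in document order
        self.stack = []  # records of the currently open elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        node = {
            "tag": tag,
            "id": attrs.get("id") or "",
            "class": (attrs.get("class") or "").split(),
            "depth": len(self.stack),
            "text": "",
        }
        self.nodes.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag name, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Attach text to the innermost open element.
        if self.stack and data.strip():
            self.stack[-1]["text"] += data.strip()


def collect_nodes(html):
    parser = DomCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


nodes = collect_nodes(
    '<div id="title-top"><h1>My Blog</h1><h2 class="profile">Author</h2></div>'
)
for n in nodes:
    print(n["depth"], n["tag"], n["id"], n["class"], n["text"])
```

Each record in `nodes` corresponds to one candidate DOM node; a classifier such as an SVM would receive a feature vector built from these fields.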
3.2 DOM. site DOM person DOM site-link, DOM person-link, DOM site-part DOM person-part DOM site-image, DOM person-image, DOM other, site person DOM 3.2.,,.,,,. 1 DOM. person-link DOM 3.3., DOM,.,,,, person-link. person-part DOM 3.4. Author:mirura., DOM., Author:mirura DOM ( ) person-part.,,.,,. 16
3.2: site person 3.3: person-link 17
3.4: person-part 3.3 node+infor. node, DOM., node DOM (N t ), N t (N p ), N t 1 (N s ), N t 1 (N ps ).,. node 3.5., infor. SVM, node infor 1, 0., infor. DOM HTML., site-link 18
3.5: node person-link HTML a, site HTML h1, h2. id, class id= title class= profile, id class,.,,,,., id= title-top title, top 2. DOM l, l [1, 20], [11, 30],..., [181, 200], l = 0, l > 200 1.,,.,, node N t. DOM.,,.,,,., N t 19
20, 3,.. 1. ChaSen 1. 2. 3 (N t 20 ) 3.1,. 3.1: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - N s N ps title ( ) 1., 4.1,., 3.6 3.2 DOM, h2 DOM, 1 ( h1 ) 3 title, 1. node N t. 1 http://chasen-legacy.osdn.jp/ 20
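Two of the features in Section 3.3 lend themselves to a short sketch: splitting an id or class value such as `title-top` into the two tokens `title` and `top`, and encoding the text length l with the overlapping ranges [1, 20], [11, 30], ..., [181, 200] plus the special cases l = 0 and l > 200. The function names, the feature-name strings, and the choice to split on `_` and whitespace as well as `-` are assumptions for this example; the text only shows the hyphen case.

```python
import re


def split_attr_value(value):
    """Split an id/class value such as 'title-top' into ['title', 'top'].

    Splitting on '_' and whitespace in addition to '-' is an assumption.
    """
    return [t for t in re.split(r"[-_\s]+", value) if t]


# Overlapping ranges [1, 20], [11, 30], ..., [181, 200].
LENGTH_RANGES = [(lo, lo + 19) for lo in range(1, 182, 10)]


def length_features(l):
    """Binary features for a text length l.

    One feature per overlapping range that contains l, plus one feature
    for l == 0 and one for l > 200.
    """
    feats = {f"len_{lo}_{hi}": 1 for lo, hi in LENGTH_RANGES if lo <= l <= hi}
    if l == 0:
        feats["len_0"] = 1
    if l > 200:
        feats["len_gt200"] = 1
    return feats


print(split_attr_value("title-top"))
print(length_features(15))
```

Because the ranges overlap by ten characters, most lengths activate two features, which softens the boundary between neighbouring buckets.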
3.6: 1, DOM., 1, ( ) + ( ) + 1.,. node N t. ABOUT 1.,,. n-gram,.,,.,,,.,,.,, n-gram,. 1, DOM n-gram, 1., 2 2 http://blog.with2.net/ 21
2. n = 3, n-gram 100. n-gram 3.2., 100 n-gram A A.1, A.2. 3.2: n-gram n-gram 558,, 557,, 281,, 115,, 100,,, N t, N p, N s, N ps, N t. DOM 3.3. 3.3: DOM N t N p N s N ps DOM id, class n-gram 3.4, ( DOM ) ( DOM )., 4.1 4.1, 99%., SVM,., DOM,.. 22
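The n-gram feature described above (n = 3, the 100 most frequent n-grams) can be sketched as follows. The sketch assumes the text has already been split into tokens (the thesis uses ChaSen for morphological analysis); the function names `top_ngrams` and `ngram_features` and the toy token lists are hypothetical.

```python
from collections import Counter


def top_ngrams(token_lists, n=3, k=100):
    """Count token n-grams over all token lists and return the k most frequent."""
    counts = Counter()
    for tokens in token_lists:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts.most_common(k)


def ngram_features(tokens, vocabulary, n=3):
    """Return one binary feature per vocabulary n-gram: 1 if it occurs in tokens."""
    present = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return [1 if gram in present else 0 for gram in vocabulary]


# Toy, pre-tokenized input standing in for morphologically analysed text.
docs = [["a", "b", "c", "d"], ["a", "b", "c"]]
print(top_ngrams(docs, n=3, k=1))
vocab = [("a", "b", "c"), ("b", "c", "d")]
print(ngram_features(["x", "a", "b", "c"], vocab))
```

In practice the vocabulary would be the n-grams returned by `top_ngrams(..., k=100)`, giving a 100-dimensional binary vector per DOM node.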
3.4.1,,,,..,, DOM,., Kato [2]. 3.7.. 1. DOM ( body DOM ) 3. 2.. 3., t m,, 2.., DOM., t m = 0.5. 3.7, DOM.,,.,.,,.,., = 0.1, 0.2, 0.5, 4.1., = 0.1,.,, 0.1., img height, width.,, 100., ( title ) profile,.,.,,. 3.4.2 0 HTML DOM, ( 0 ). 0 DOM,, 3, DOM. 23
., 0,.,. 3.4.3 2 3.4.1,., 2. T,, I,, 24
Figure 3.7: (Kato et al. (2008), p.39, Figure 4)
4,,. 4.1,,. 4.2,. 4.3,. 4.4,,,. 4.5,,. 4.1,. Yahoo!,goo,FC2,, 500.,,. 4.1 DOM., site-part person-part, site person., site-image person-image,,, other. 500 10 ( D 1 D 10 ),., D 10., 50,.. 4.2. SVM,., D test. 4.2, DOM. 1. N s N ps, site. 26
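Table 4.1: Number of DOM nodes per label
label            DOM nodes
site             252
site-link        14
site-part*       17
site-image*      8
person           243
person-link      183
person-part*     35
person-image*    2
other            668386

Table 4.2: Number of DOM nodes per label (D_test)
label            DOM nodes
site             20
site-link        0
site-part*       1
site-image*      0
person           28
person-link      21
person-part*     3
person-image*    0
other            55424

2. N_t about, a, site-link.
3. N_s N_ps profile, person.
4. N_t profile, a, person-link.
5. other.

4.3
Extraction is evaluated over the DOM nodes of the HTML pages using precision (P), recall (R), and the F-measure (F):

P = \frac{\text{number of DOM nodes extracted correctly}}{\text{number of DOM nodes extracted}}  (4.1)

R = \frac{\text{number of DOM nodes extracted correctly}}{\text{number of correct DOM nodes}}  (4.2)

F = \frac{2PR}{P + R}  (4.3)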
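A minimal sketch of Equations (4.1) to (4.3), computed for one label from parallel lists of gold and predicted DOM-node labels. The function name and the toy label lists are assumptions for this example, and returning 0.0 when a denominator is zero is also an assumption (the result tables in this chapter leave such cells empty).

```python
def precision_recall_f(gold, predicted, label):
    """Return (P, R, F) for one label over parallel gold/predicted label lists."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    n_extracted = sum(1 for p in predicted if p == label)
    n_gold = sum(1 for g in gold if g == label)
    # Undefined ratios are reported as 0.0 here (an assumption of this sketch).
    precision = correct / n_extracted if n_extracted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: five DOM nodes with gold and predicted labels.
gold = ["site", "other", "person", "other", "person"]
pred = ["site", "site", "person", "other", "other"]
print(precision_recall_f(gold, pred, "site"))
print(precision_recall_f(gold, pred, "person"))
```

In the toy data, `site` is over-predicted (low precision, full recall) while `person` is under-predicted (full precision, low recall), which mirrors the kind of trade-off the F-measure is meant to summarise.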
P, R, F other, site, site-link, person, person-link,. 10, 10, 10., P, R, F. 4.4 4.4.1 4.3 4.6. 4.3, 10 10, 4.4,., 4.5 4.6,. 4.5 (10 ).,, 3.4.3 I.,. person-link F,., site, site-link, person,,., 4.2,., D 3, D 6, D 7, D 9, site-link DOM, site-link., 4.3 D 1 D 10 F. site, 0.235(D 4 ), 0.541(D 9 ), 0.306., site, site-link 0.253, person 0.3, person-link 0.21.,, ±0.1., site D 2, D 4, D 8, D 9, D 10, site-link D 10, person D 1, person-link D 9., site, ±0.1,.,. site-link, D 2., site-link. site, person, person-link F,, person-link, person, site.,,., 4.4, site 0.192, person 0.367, person-link 0.309., ±0.1, person D 1, D 4, D 5, D 6, D 10, person-link D 7, D 9, site., person, person-link,. 28
Table 4.3: Baseline results (10-fold cross-validation)
(-: no value)

Precision
       site     site-link   person   person-link
D1     0.246    0.067       0.027    1.000
D2     0.153    0.077       0.146    0.714
D3     0.300    -           0.161    0.909
D4     0.148    0.063       0.143    0.857
D5     0.311    0.125       0.217    0.933
D6     0.194    -           0.132    0.909
D7     0.214    -           0.225    0.950
D8     0.392    0.059       0.219    0.733
D9     0.411    -           0.230    1.000
D10    0.388    0.250       0.157    0.733

Recall
       site     site-link   person   person-link
D1     0.600    0.500       0.462    0.692
D2     0.565    0.333       0.583    0.714
D3     0.643    -           0.600    0.625
D4     0.571    1.000       0.480    0.545
D5     0.704    1.000       0.670    0.700
D6     0.633    -           0.474    0.556
D7     0.682    -           0.714    0.704
D8     0.667    1.000       0.694    0.579
D9     0.793    -           0.742    0.750
D10    0.765    0.667       0.706    0.667

F
       site     site-link   person   person-link
D1     0.349    0.118       0.051    0.818
D2     0.241    0.125       0.233    0.714
D3     0.409    -           0.254    0.741
D4     0.235    0.118       0.220    0.667
D5     0.432    0.222       0.331    0.800
D6     0.297    -           0.207    0.690
D7     0.326    -           0.342    0.809
D8     0.494    0.111       0.333    0.647
D9     0.541    -           0.351    0.857
D10    0.515    0.364       0.257    0.667
Table 4.4: Results of the proposed method (10-fold cross-validation)
(-: no value)

Precision
       site     site-link   person   person-link
D1     0.667    -           0.611    0.917
D2     0.556    1.000       0.750    0.792
D3     0.667    -           0.824    0.929
D4     0.478    -           0.692    0.857
D5     0.684    -           0.917    0.850
D6     0.720    -           0.882    0.867
D7     0.579    -           0.895    0.963
D8     0.750    -           0.629    0.762
D9     0.810    -           0.692    0.952
D10    0.792    -           1.000    0.750

Recall
       site     site-link   person   person-link
D1     0.480    -           0.423    0.846
D2     0.435    0.333       0.500    0.905
D3     0.571    -           0.560    0.813
D4     0.524    -           0.360    0.545
D5     0.481    -           0.759    0.850
D6     0.600    -           0.789    0.722
D7     0.500    -           0.607    0.963
D8     0.500    -           0.611    0.842
D9     0.586    -           0.581    1.000
D10    0.559    -           0.765    0.833

F
       site     site-link   person   person-link
D1     0.558    -           0.500    0.880
D2     0.488    0.500       0.600    0.844
D3     0.615    -           0.667    0.867
D4     0.500    -           0.474    0.667
D5     0.565    -           0.830    0.850
D6     0.655    -           0.833    0.788
D7     0.537    -           0.723    0.963
D8     0.600    -           0.620    0.800
D9     0.680    -           0.632    0.976
D10    0.655    -           0.867    0.789
Table 4.5: Results (average over 10-fold cross-validation)
                              Precision   Recall   F
Baseline          site        0.276       0.662    0.384
                  site-link   0.107       0.750    0.176
                  person      0.116       0.613    0.258
                  person-link 0.874       0.653    0.741
Proposed method   site        0.670       0.524    0.585
                  site-link   1.000       0.333    0.500
                  person      0.789       0.596    0.675
                  person-link 0.864       0.832    0.842

Table 4.6: Results (test data)
(-: no value)
                              Precision   Recall   F
Baseline          site        0.320       0.762    0.451
                  site-link   -           -        -
                  person      0.208       0.645    0.315
                  person-link 0.722       0.619    0.667
Proposed method   site        0.667       0.667    0.667
                  site-link   -           -        -
                  person      0.750       0.677    0.712
                  person-link 0.840       1.000    0.913
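As a small worked check of how the averages in Table 4.5 relate to the per-fold values in Table 4.3, the sketch below uses the baseline `site` column (D1 to D10). It suggests that the reported average F is the mean of the ten per-fold F values, which differs from the F-measure computed from the averaged precision and recall. The variable names are illustrative.

```python
from statistics import mean

# Baseline results for the site label, folds D1..D10 (values from Table 4.3).
site_p = [0.246, 0.153, 0.300, 0.148, 0.311, 0.194, 0.214, 0.392, 0.411, 0.388]
site_r = [0.600, 0.565, 0.643, 0.571, 0.704, 0.633, 0.682, 0.667, 0.793, 0.765]
site_f = [0.349, 0.241, 0.409, 0.235, 0.432, 0.297, 0.326, 0.494, 0.541, 0.515]

avg_p = mean(site_p)
avg_r = mean(site_r)
avg_f = mean(site_f)  # mean of the per-fold F values

# F-measure computed from the averaged precision and recall instead.
f_of_averages = 2 * avg_p * avg_r / (avg_p + avg_r)

print(round(avg_p, 3), round(avg_r, 3), round(avg_f, 3))
print(round(f_of_averages, 3))
```

The first line reproduces the baseline `site` row of Table 4.5; the second value is slightly higher, so the two ways of averaging should not be mixed when comparing systems.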
, 4.5,., site, site-link, person.,., person-link,,. F,., 10,., 4.6,. site,.,. F,, site 0.216, person 0.397, person-link 0.346.,.

4.4.2

Table 4.7: Effect of each feature (precision, D_test)
(values in parentheses: difference from F_all; -: no value)
              site             site-link   person           person-link
F_tag         0.667 (0.000)    -           0.808 (+0.058)   0.895 (+0.055)
F_id,class    0.571 (-0.096)   -           0.714 (-0.036)   0.808 (-0.032)
F_length      0.579 (-0.088)   -           0.778 (+0.028)   0.808 (-0.032)
F_bow         0.615 (-0.052)   -           0.955 (+0.205)   1.000 (+0.160)
F_title       0.684 (+0.017)   -           0.750 (0.000)    0.840 (0.000)
F_sitekey     0.667 (0.000)    -           0.750 (0.000)    0.840 (0.000)
F_linkkey     0.667 (0.000)    -           0.750 (0.000)    0.840 (0.000)
F_n-gram      0.770 (+0.103)   -           0.778 (+0.028)   0.840 (0.000)
F_all         0.667            -           0.750            0.840
Table 4.8: Effect of each feature (recall, D_test)
(values in parentheses: difference from F_all; -: no value)
              site             site-link   person           person-link
F_tag         0.571 (-0.096)   -           0.677 (0.000)    0.810 (-0.190)
F_id,class    0.571 (-0.096)   -           0.645 (-0.022)   1.000 (0.000)
F_length      0.524 (-0.143)   -           0.677 (+0.010)   1.000 (0.000)
F_bow         0.762 (+0.095)   -           0.677 (+0.010)   0.476 (-0.524)
F_title       0.619 (-0.048)   -           0.677 (+0.010)   1.000 (0.000)
F_sitekey     0.667 (0.000)    -           0.677 (+0.010)   1.000 (0.000)
F_linkkey     0.667 (0.000)    -           0.677 (+0.010)   1.000 (0.000)
F_n-gram      0.667 (0.000)    -           0.677 (+0.010)   1.000 (0.000)
F_all         0.667            -           0.667            1.000

Table 4.9: Effect of each feature (F, D_test)
(values in parentheses: difference from F_all; -: no value)
              site             site-link   person           person-link
F_tag         0.615 (-0.052)   -           0.737 (+0.025)   0.895 (-0.018)
F_id,class    0.571 (-0.096)   -           0.678 (-0.034)   0.894 (-0.019)
F_length      0.550 (-0.117)   -           0.724 (+0.012)   0.894 (-0.019)
F_bow         0.681 (+0.014)   -           0.792 (+0.080)   0.645 (-0.268)
F_title       0.650 (-0.017)   -           0.712 (0.000)    0.913 (0.000)
F_sitekey     0.667 (0.000)    -           0.712 (0.000)    0.913 (0.000)
F_linkkey     0.667 (0.000)    -           0.712 (0.000)    0.913 (0.000)
F_n-gram      0.683 (+0.016)   -           0.724 (+0.012)   0.913 (0.000)
F_all         0.667            -           0.712            0.913
Table 4.10: Effect of each feature (precision, D_10)
(values in parentheses: difference from F_all; -: no value)
              site             site-link   person           person-link
F_tag         0.900 (+0.108)   -           0.889 (-0.111)   0.846 (+0.096)
F_id,class    0.762 (-0.030)   -           0.703 (-0.297)   0.750 (0.000)
F_length      0.818 (+0.026)   -           1.000 (0.000)    0.750 (0.000)
F_bow         0.821 (+0.029)   -           0.864 (-0.136)   1.000 (+0.250)
F_title       0.833 (+0.041)   -           1.000 (0.000)    0.750 (0.000)
F_sitekey     0.833 (+0.041)   -           1.000 (0.000)    0.750 (0.000)
F_linkkey     0.783 (-0.009)   -           1.000 (0.000)    0.750 (0.000)
F_n-gram      0.818 (+0.026)   -           1.000 (0.000)    0.750 (0.000)
F_all         0.792            -           1.000            0.750

Table 4.11: Effect of each feature (recall, D_10)
(values in parentheses: difference from F_all; -: no value)
              site             site-link   person           person-link
F_tag         0.529 (-0.030)   -           0.706 (-0.059)   0.611 (-0.222)
F_id,class    0.471 (-0.088)   -           0.765 (0.000)    0.833 (0.000)
F_length      0.529 (-0.030)   -           0.735 (-0.030)   0.833 (0.000)
F_bow         0.676 (+0.117)   -           0.559 (-0.206)   0.500 (-0.333)
F_title       0.588 (+0.029)   -           0.765 (0.000)    0.833 (0.000)
F_sitekey     0.588 (+0.029)   -           0.765 (0.000)    0.833 (0.000)
F_linkkey     0.529 (-0.030)   -           0.765 (0.000)    0.833 (0.000)
F_n-gram      0.529 (-0.030)   -           0.765 (0.000)    0.833 (0.000)
F_all         0.559            -           0.765            0.833
Table 4.12: Effect of each feature (F, D_10)
(values in parentheses: difference from F_all; -: no value)
              site             site-link   person           person-link
F_tag         0.667 (+0.012)   -           0.787 (-0.080)   0.710 (-0.079)
F_id,class    0.582 (-0.073)   -           0.732 (-0.135)   0.789 (0.000)
F_length      0.643 (-0.012)   -           0.847 (-0.020)   0.789 (0.000)
F_bow         0.742 (+0.087)   -           0.679 (-0.188)   0.667 (-0.122)
F_title       0.690 (+0.035)   -           0.867 (0.000)    0.789 (0.000)
F_sitekey     0.690 (+0.035)   -           0.867 (0.000)    0.789 (0.000)
F_linkkey     0.632 (-0.023)   -           0.867 (0.000)    0.789 (0.000)
F_n-gram      0.643 (-0.012)   -           0.867 (0.000)    0.789 (0.000)
F_all         0.655            -           0.867            0.789
,., SVM, 1 SVM.,,, F,. F tag DOM, F id,class id, class, F length, F bow, F title, F sitekey, F linkkey, F n-gram n-gram., F all.,, I. F all 1,, F 4.7, 4.8, 4.9. () F all. F n-gram F all F, site, person F n-gram, person-link. F n-gram F all, n-gram. F all F id,class., id, class.,, site F length, person F id,class, person-link F bow., site, person id, class, person-link., D 10,. 4.10, 4.11, 4.12. F title F all, site F title, person, person-link., F sitekey F all, site F sitekey, person, person-link., F all, F title F sitekey,,. F all, F id,class F length. 2, person-link F, site, person F F id,class., id, class.,, site F id,class, person F bow, person-link F bow., site id, class, person person-link. n-gram D 10, D test., D test, D 10.,., n-gram., id, class,,,. 36
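The parenthesised values in Tables 4.7 to 4.12 can be reproduced as the difference between each F_x row and the F_all row. The sketch below does this for the `site` column of Table 4.9 (F-measure on D_test) and picks the variant with the largest drop. It assumes each F_x row is a model variant compared against the full feature set F_all; the dictionary layout is illustrative.

```python
# F-measure for the site label on D_test (values from Table 4.9).
site_f = {
    "tag": 0.615,
    "id,class": 0.571,
    "length": 0.550,
    "bow": 0.681,
    "title": 0.650,
    "sitekey": 0.667,
    "linkkey": 0.667,
    "n-gram": 0.683,
}
f_all = 0.667

# Difference of each variant from F_all, rounded as in the tables.
deltas = {name: round(value - f_all, 3) for name, value in site_f.items()}

# Variant whose score falls furthest below F_all.
largest_drop = min(deltas, key=deltas.get)

for name, delta in deltas.items():
    print(f"F_{name}: {delta:+.3f}")
print("largest drop:", largest_drop)
```

For this column the largest drop belongs to F_length, consistent with the discussion of the `site` label above.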
4.4.3 3.4. 4.1, DOM 668386, 3.4.3 T, 192161. DOM,., site 6, site-link 0, site-part 4, site-image 6, person 7, person-link 6, person-part 4, person-image 2. 71%, 5%. 3.4.3 I, DOM 196954., site 2, site-part 2, person 2, person-link 2, person-part 1, 0. 70%, 1%., T,,, F. 4.13. T,, person-link, site, person F T. person T, site,., T. T,, person-link, site, person F T., site, person, T.,, site, person F., I. D test, I, site DOM 20 5 (25%), person DOM 28 3 ( 11%)., T, site DOM 20 1 5%, person DOM., I T,., 3.4.1,., D 10. 4.14., T I person F. 3.4.1, 0.1, F, person F I 37
Table 4.13: (D_test)
(-: no value)

Precision
       site     site-link   person   person-link
       0.682    -           0.778    0.840
T      0.762    -           0.808    0.840
I      0.667    -           0.750    0.840

Recall
       site     site-link   person   person-link
       0.714    -           0.677    1.000
T      0.762    -           0.677    1.000
I      0.667    -           0.677    1.000

F
       site     site-link   person   person-link
       0.698    -           0.724    0.913
T      0.762    -           0.737    0.913
I      0.667    -           0.712    0.913

T., T, I, person F,., site T, I.,. D 10, I, site DOM 31 4 ( 13%), person DOM 32 2 ( 6%)., T, site DOM 31 1 ( 3%), person DOM. site person, person F site F. 4.5. 4.3, site,,.,,,. 4.1.,, 1, site.,
Table 4.14: (D_10)
(-: no value)

Precision
       site     site-link   person   person-link
       0.800    -           0.926    0.750
T      0.792    -           0.963    0.750
I      0.792    -           1.000    0.750

Recall
       site     site-link   person   person-link
       0.588    -           0.735    0.833
T      0.559    -           0.765    0.833
I      0.559    -           0.765    0.833

F
       site     site-link   person   person-link
       0.678    -           0.820    0.789
T      0.655    -           0.852    0.789
I      0.655    -           0.867    0.789

,, DOM other.,, n-gram,. 4.1:,. 4.2.,.,,.,,.,,.,.,. 3.4.1, DOM., 4.3
4.2:,,,.,,., DOM,., 4.4,,. DOM,, DOM,.,.,,. 40
Figure 4.3: Failure example 1 of content-area detection
Figure 4.4: Failure example 2 of content-area detection
5 5.1,., HTML DOM. SVM, 8.,,, SVM.,. DOM 8, 4., 10,.,., id, class, F,.,,,.,,.,.,,.,.,,. 5.2,,.,.,.,,., DOM, 100,., 100,., 42
,.,., 3.4,., 3.3,,, 4.4.2.,.,., 1 DOM.,, n-gram,.,,.,.,. 43
,,.,,.,,. 44
[1],,,. Web. 16, p.94-p.97, 2010.
[2] Yoshikiyo Kato, Daisuke Kawahara, Kentaro Inui, Sadao Kurohashi, and Tomohide Shibata. Extracting the Author of Web Pages. Proceedings of the 2nd ACM Workshop on Information Credibility on the Web (WICOW '08), p.35-p.42, 2008.
[3] Giovanni Giuffrida, Eddie C. Shek, and Jihoon Yang. Knowledge-Based Metadata Extraction from PostScript Files. Proceedings of the Fifth ACM Conference on Digital Libraries (DL '00), p.77-p.84, 2000.
[4] Daisuke Kawahara, Sadao Kurohashi, and Kentaro Inui. Grasping Major Statements and their Contradictions Toward Information Credibility Analysis of Web Contents. IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, p.393-p.397, 2008.
[5] Nozomi Kobayashi, Kentaro Inui, and Yuji Matsumoto. Extracting Aspect-Evaluation and Aspect-of Relations in Opinion Mining. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, p.1065-p.1074, 2007.
[6] K. Tateishi, T. Fukushima, N. Kobayashi, T. Takahashi, A. Fujita, K. Inui, and Y. Matsumoto. Web Opinion Extraction and Summarization Based on Viewpoints of Products. In IPSJ SIGNL Note 163, p.1-p.8, 2004.
[7] Razvan Bunescu. Associative Anaphora Resolution: a Web Based Approach. In Proceedings of the EACL Workshop on the Computational Treatment of Anaphora, p.47-p.52, 2003.
[8] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a Library for Support Vector Machines. ACM Transactions on Intelligent Systems and Technology, Vol.2, No.3, Article 27, 2011.
A 100 n-gram A.1: n-gram( 1 50 ) 3-gram 3-gram 2992,, 248,, 2138,, 242,, 1524,, 216,, 868,, 205,, 761,, 194,,! 588,, 191,, 557,, 169,,! 539,, 167 &, amp, ; 509,, 161,, 455,, 158,, 435,, 156,, 426,, 154,, 403,, 152,, 353,, 149,, 315,, 148,, 308,, 148,,! 281,, 145,, 281,, 143,, 266,, 139,, 264,, 138,, 262,, 137,, 261,, 133,, 261,, 130,, 260,, 130,, 260,, 128,, 46
A.2: n-gram( 51 100 ) 3-gram 3-gram 126,, 98,, 125,, 97,, 122,, 95,, 121,, 95,, 119,, 94,, 118,, 93,, 118,, 92,, 116,,? 90,, 115,, 89,, 114,, 88,, 114,, 88,, 112,, 88,, 111,, 86,, 110,, 86,, 110,, 85,, 107,, 82,,! 103,, 81,, 103,, 80,, 102,, 79,, 100,, 79,,! 100,,! 79,, 99,, 74,, 99,, 74,, 99,, 73,, 99,, 73,, 47