
Improving Part-of-Speech Tagging by Combining Pointwise and Sequence-based Predictors

Yosuke NAKATA,†1 Graham NEUBIG,†1 Shinsuke MORI†1 and Tatsuya KAWAHARA†1

This paper proposes an approach to part-of-speech (POS) sequence reranking that applies POS transition tendencies to the results of morphological analysis with pointwise predictors. Pointwise prediction uses as its features only surface information about the surrounding character strings, without relying on predicted information such as the surrounding POS tags or word boundaries. This allows a variety of linguistic resources to be used flexibly, making domain adaptation possible with a minimal amount of annotation. However, pointwise prediction cannot use POS transition information, which is important for POS prediction. Because the transition tendencies of POS tags can be assumed not to be highly domain dependent, transition information learned in one domain can be used in another. By applying POS sequence reranking that considers POS transition information to the results of the pointwise predictors, we achieved an improvement in POS tagging accuracy.

†1 Kyoto University, School of Informatics

© 2011 Information Processing Society of Japan

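Figure 1 Features for word boundary estimation at the boundary between $x_i$ = 本 and $x_{i+1}$ = 剤 in 健康児に本剤を接種し ("administer this drug to a healthy child"), with window width 3 and maximum n-gram length 3. The context is $x_{i-2} x_{i-1} x_i \mid x_{i+1} x_{i+2} x_{i+3}$, and each n-gram is labeled with the offset of its first character.
Character (type) 1-grams: -3/児 (K), -2/に (H), -1/本 (K), 1/剤 (K), 2/を (H), 3/接 (K)
Character (type) 2-grams: -3/児に (KH), -2/に本 (HK), -1/本剤 (KK), 1/剤を (KH), 2/を接 (HK)
Character (type) 3-grams: -3/児に本 (KHK), -2/に本剤 (HKK), -1/本剤を (KKH), 1/剤を接 (KHK)
Word dictionary features: L1(本), R1(剤), I2(本剤)

2. Morphological Analysis by Pointwise Prediction
The pointwise analyzer 1) estimates each tag independently with a linear SVM 4), using only the surrounding characters as features.

2.1 Word Boundary Estimation
Word boundary estimation 5) takes an input character sequence $\mathbf{x} = x_1 x_2 \cdots x_n$ and outputs a tag sequence $\mathbf{t} = t_1 t_2 \cdots t_{n-1}$, where $t_i$ indicates whether there is a word boundary between $x_i$ and $x_{i+1}$. Each $t_i$ is estimated from the following three kinds of features (Figure 1):
(1) Character n-grams: with window width $m$, all n-grams up to length $n$ within the $2m$ characters $x_{i-m+1}, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_{i+m}$.
(2) Character type n-grams: the same n-grams computed over character types. Six types are used: kanji (K), katakana (k), hiragana (H), romaji (R), digits (N), and others (O).
(3) Word dictionary features: whether a dictionary word ends at $x_i$ (L), begins at $x_{i+1}$ (R), or spans the boundary (I), together with the word's length.

2.2 POS Estimation
The POS of each word is estimated by one-vs-rest classification. For a word $w$ in the context $\mathbf{x}_- w \mathbf{x}_+$, where $\mathbf{x}_- = x_{-m} \cdots x_{-2} x_{-1}$ and $\mathbf{x}_+ = x_1 x_2 \cdots x_m$, the features are (1) the character n-grams of $\mathbf{x}_- \mathbf{x}_+$ and (2) the character type n-grams of $\mathbf{x}_- \mathbf{x}_+$. Because $w$ itself is removed from the context, n-grams can span it, as with -1/にを in Figure 2.

Figure 2 Features for POS estimation of the word $w$ = 本剤 in 健康児に本剤を接種し (window width 3, maximum n-gram length 3); the context is $x_{-3} x_{-2} x_{-1}\, w\, x_1 x_2 x_3$.
Character (type) 1-grams: -3/康 (K), -2/児 (K), -1/に (H), 1/を (H), 2/接 (K), 3/種 (K)
Character (type) 2-grams: -3/康児 (KK), -2/児に (KH), -1/にを (HH), 1/を接 (HK), 2/接種 (KK)
Character (type) 3-grams: -3/康児に (KKH), -2/児にを (KHH), -1/にを接 (HHK), 1/を接種 (HKK)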
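The feature templates of Sections 2.1 and 2.2 can be sketched in Python as follows. This is an illustrative sketch rather than the implementation of 1): the feature-string format (`C`/`T` prefixes), the Unicode ranges chosen for the six character types, the `max_len` cap on dictionary word length, and all function names are assumptions. Offsets follow Figure 1: characters to the left of the target get -1, -2, ... counting outward, characters to the right get 1, 2, ..., and each n-gram is labeled with the offset of its first character.

```python
from typing import AbstractSet, List


def char_type(c: str) -> str:
    """Map a character to one of six types (the Unicode ranges are assumptions)."""
    code = ord(c)
    if 0x4E00 <= code <= 0x9FFF or c in "々〆":
        return "K"  # kanji
    if 0x30A0 <= code <= 0x30FF:
        return "k"  # katakana (including the long-vowel mark)
    if 0x3040 <= code <= 0x309F:
        return "H"  # hiragana
    if c.isdigit():
        return "N"  # digits, ASCII or full-width
    if c.isascii() and c.isalpha():
        return "R"  # romaji (ASCII letters only in this sketch)
    return "O"      # everything else


def window_ngrams(left: str, right: str, max_n: int) -> List[str]:
    """Character ('C') and character-type ('T') n-grams over left + right.

    Left characters get offsets -len(left)..-1, right characters 1..len(right);
    each n-gram is labeled with the offset of its first character.
    """
    text = left + right
    labels = list(range(-len(left), 0)) + list(range(1, len(right) + 1))
    types = "".join(char_type(c) for c in text)
    feats = []
    for n in range(1, max_n + 1):
        for j in range(len(text) - n + 1):
            feats.append(f"C{n}:{labels[j]}/{text[j:j + n]}")
            feats.append(f"T{n}:{labels[j]}/{types[j:j + n]}")
    return feats


def dict_features(sentence: str, b: int, dictionary: AbstractSet[str],
                  max_len: int = 4) -> List[str]:
    """L/R/I dictionary features for the boundary just before sentence[b]."""
    feats = set()
    for start in range(max(0, b - max_len), b + 1):
        for end in range(start + 1, min(len(sentence), start + max_len) + 1):
            if sentence[start:end] not in dictionary:
                continue
            if end == b:
                feats.add(f"L{end - start}")  # word ends at x_i
            elif start == b:
                feats.add(f"R{end - start}")  # word starts at x_{i+1}
            elif start < b < end:
                feats.add(f"I{end - start}")  # word spans the boundary
    return sorted(feats)


def boundary_features(sentence: str, b: int, m: int = 3, max_n: int = 3,
                      dictionary: AbstractSet[str] = frozenset()) -> List[str]:
    """Features for t_i, the boundary between x_i = sentence[b-1] and x_{i+1} = sentence[b]."""
    left, right = sentence[max(0, b - m):b], sentence[b:b + m]
    return window_ngrams(left, right, max_n) + dict_features(sentence, b, dictionary)


def pos_features(sentence: str, start: int, end: int, m: int = 3, max_n: int = 3) -> List[str]:
    """Context features for the word sentence[start:end]; the word itself is left out."""
    left, right = sentence[max(0, start - m):start], sentence[end:end + m]
    return window_ngrams(left, right, max_n)
```

For 健康児に本剤を接種し, `boundary_features(s, 5)` (the boundary between 本 and 剤) yields the 15 character n-grams and 15 character type n-grams of Figure 1, `dict_features` with the dictionary {本, 剤, 本剤} yields L1, R1 and I2, and `pos_features(s, 4, 6)` yields the features of Figure 2 for 本剤.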

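3. POS Reranking by Sequence-based Prediction
For each word, the pointwise POS estimator of Section 2.2 assigns each candidate POS $r$ a confidence value $C_r$ computed from the classifier output $d_r$. Figure 3 shows an example.

Figure 3 Example output of pointwise analysis for 健康児に本剤を接種し: each word (健康, 児, に, 本剤, を, 接種, し) with its candidate POS tags (e.g., noun, suffix, particle) and their confidence values.

3.3 POS Reranking with a CRF
The POS sequence output by the pointwise predictors is re-estimated with a conditional random field (CRF) 6). Unlike the pointwise predictors, the CRF can take POS transition information into account.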
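The benefit of reranking comes from scoring adjacent POS tags jointly. The sketch below shows only the decoding step of a first-order linear-chain model (the Viterbi algorithm), assuming per-word emission scores, which could for instance be derived from the pointwise confidences $C_r$, and a table of POS transition scores. All scores are made-up numbers, and training the weights (done with CRFsuite 10) in the experiments) is not shown.

```python
from typing import Dict, List, Sequence, Tuple

Transitions = Dict[Tuple[str, str], float]


def viterbi(emissions: Sequence[Dict[str, float]], transitions: Transitions,
            bos: str = "<s>", eos: str = "</s>") -> List[str]:
    """Highest-scoring tag sequence under emission + transition scores.

    emissions[k] maps each candidate tag of word k to its score;
    tag pairs missing from `transitions` score 0.
    """
    # best[tag] = (score, path) of the best partial sequence ending in `tag`
    best: Dict[str, Tuple[float, List[str]]] = {bos: (0.0, [])}
    for scores in emissions:
        new_best: Dict[str, Tuple[float, List[str]]] = {}
        for tag, emit in scores.items():
            # Extend every surviving sequence by `tag` and keep the best extension.
            score, path = max(
                ((s + transitions.get((prev, tag), 0.0) + emit, p)
                 for prev, (s, p) in best.items()),
                key=lambda cand: cand[0],
            )
            new_best[tag] = (score, path + [tag])
        best = new_best
    # Close the sequence with the end-of-sentence transition.
    _, path = max(
        ((s + transitions.get((tag, eos), 0.0), p) for tag, (s, p) in best.items()),
        key=lambda cand: cand[0],
    )
    return path


# Made-up scores: the second word on its own slightly prefers "noun".
emissions = [{"noun": 2.0}, {"noun": 0.5, "particle": 0.4}]
print(viterbi(emissions, {}))  # ['noun', 'noun']
print(viterbi(emissions, {("noun", "noun"): -0.5, ("noun", "particle"): 1.0}))  # ['noun', 'particle']
```

Without transition scores the second word keeps its locally best tag; a transition score favoring noun followed by particle changes the decision, which is the kind of correction the pointwise predictors cannot make on their own.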

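3.4 CRF Features
The CRF uses the confidence-annotated output of the pointwise predictors together with (1) character n-grams within a window of $m$ characters and (2) character type n-grams within a window of $m$ characters, both defined as in Section 2.1.

3.5 CRF Training
The CRF is trained on pointwise outputs annotated with confidence values, which are obtained as follows (Figure 4):
(1) Split the training corpus into $k$ parts $C_1, C_2, \ldots, C_k$.
(2) For each $i = 1, 2, \ldots, k$, analyze $C_i$ with a pointwise analyzer trained on the other $k-1$ parts, yielding $C_i$ annotated with confidence values.
The CRF is then trained on the confidence-annotated $C_1, C_2, \ldots, C_k$. In this way every pointwise output used for training comes from text that the pointwise analyzer was not trained on, as is the case at test time. For example, with $k = 3$ and a training corpus of $3T$ sentences, $C_1$ consists of sentences 1 to $T$, $C_2$ of sentences $T+1$ to $2T$, and $C_3$ of sentences $2T+1$ to $3T$.

Figure 4 Training procedure for $k = 3$. The training corpus, fully annotated with word boundaries and POS, is split into thirds; pointwise morphological analysis of each third yields a confidence-annotated corpus, and the three confidence-annotated corpora are used to train the sequence-based POS reranker, which is then evaluated on test data.

3.6 Available Corpora
Corpora differ in domain, in the information annotated, and in whether the annotation is full or partial. Figure 5 shows which corpora can be used for each component, and Figure 6 gives annotation examples. Fully annotated general-domain corpora with POS (GTF) are available from 7),8); see also 9).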
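Two data-handling steps from Sections 3.5 and 3.6 can be sketched in Python. The first part produces held-out pointwise outputs following steps (1) and (2) of Section 3.5; `train` and `predict` are hypothetical stand-ins for training and running the pointwise analyzer. The second part illustrates why a pointwise boundary estimator can learn from partially annotated corpora (the P types in Figure 5): each boundary is an independent example, so unannotated gaps are simply skipped. The mark notation used here is an assumption for this sketch, not necessarily the one shown in Figure 6.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")  # one annotated training sentence
M = TypeVar("M")  # a trained pointwise model
P = TypeVar("P")  # pointwise output for one sentence (e.g., tags with confidences)


def split_k(corpus: Sequence[S], k: int) -> List[List[S]]:
    """Step (1): split the corpus into k contiguous parts C_1..C_k of near-equal size."""
    n = len(corpus)
    bounds = [n * i // k for i in range(k + 1)]
    return [list(corpus[bounds[i]:bounds[i + 1]]) for i in range(k)]


def held_out_outputs(corpus: Sequence[S], k: int,
                     train: Callable[[List[S]], M],
                     predict: Callable[[M, S], P]) -> List[P]:
    """Step (2): analyze each C_i with a model trained on the other k-1 parts.

    The outputs come back in corpus order and form the CRF's training data.
    """
    parts = split_k(corpus, k)
    outputs: List[P] = []
    for i, held_out in enumerate(parts):
        rest = [s for j, part in enumerate(parts) if j != i for s in part]
        model = train(rest)
        outputs.extend(predict(model, s) for s in held_out)
    return outputs


# Hypothetical partial-annotation notation: the mark between two characters is
# '|' (boundary), '-' (no boundary) or ' ' (not annotated).
MARKS = {"|": 1, "-": 0, " ": None}


def parse_partial(annotated: str) -> Tuple[str, List[Optional[int]]]:
    """Split 'c1 m1 c2 ... cn' into the characters and the gap labels t_1..t_{n-1}."""
    if len(annotated) % 2 == 0:
        raise ValueError("expected characters alternating with marks")
    chars, marks = annotated[0::2], annotated[1::2]
    if any(m not in MARKS for m in marks):
        raise ValueError("unknown mark")
    return chars, [MARKS[m] for m in marks]


def boundary_examples(annotated: str) -> List[Tuple[int, int]]:
    """(gap index, label) pairs for annotated gaps only; unannotated gaps are skipped.

    Gap g lies between chars[g] and chars[g + 1].
    """
    _, tags = parse_partial(annotated)
    return [(g, t) for g, t in enumerate(tags) if t is not None]
```

For example, `boundary_examples("本-剤|を 接-種")` returns three training examples; the gap between を and 接 contributes nothing because it is not annotated.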

Figure 5 Corpus types and the analysis pipeline. Corpora are classified by domain (general: G; adaptation: A), by annotated information (word boundaries: W; word boundaries and POS: T), and by completeness (full annotation: F; partial annotation: P), giving GWF, GWP, GTF, GTP, AWF, AWP, ATF, and ATP. Text to be analyzed passes through pointwise word boundary estimation, pointwise POS estimation, and sequence-based POS reranking to produce a morphologically analyzed corpus. Corpora that are usable in principle are indicated by dashed and solid arrows; corpora that are usable in practice are indicated by solid arrows.

Figure 6 Annotation examples for each corpus type.

4. Experiments
The CRF was trained with CRFsuite 10).

4.1 Corpora
BCCWJ 8) was used as the general domain and Yahoo! text 11) as the adaptation domain (Table 1).

4.2 Dictionary
The EDR dictionary 12) was used.

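Table 1 Corpus statistics for BCCWJ and Yahoo!
27,338 / 782,584 / 1,131,317
3,38 / 87,458 / 126,154
5,8 / 114,265 / 158,645
13,18 / 17,98

Accuracy is evaluated by recall and precision over words. Let $N_{REF}$ be the number of words in the reference, $N_{SYS}$ the number of words in the system output, and $N_{COR}$ the number of correctly analyzed words. Recall is $N_{COR}/N_{REF}$, precision is $N_{COR}/N_{SYS}$, and the F-measure is their harmonic mean:
$$F = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
For example, if $N_{COR} = 3$, $N_{REF} = 6$ and $N_{SYS} = 5$, recall is 3/6 and precision is 3/5, so $F = 6/11 \approx 0.545$.

4.3 Comparison with Existing Methods
The proposed method was compared with the pointwise analyzer KyTea 1), a CRF-based analyzer (MeCab-0.98 13)), n-gram models with n = 2, 3 14), and a 2-gram HMM 15). The models were trained on the GTF corpus (Figure 5), and the CRF of the proposed method was trained as described in Section 3.5. Tables 2 and 3 show the results on BCCWJ and Yahoo!, respectively. Reranking leaves word segmentation unchanged and raises the segmentation + POS F-measure of the pointwise analyzer from 98.06 to 98.38 on BCCWJ and from 95.35 to 96.02 on Yahoo!.

Table 2 Results on BCCWJ (recall, precision, and F-measure, %)
                           Word segmentation    Segmentation + POS
Method                     Rec.   Prec.  F      Rec.   Prec.  F
2-gram HMM                 96.32  96.84  96.58  93.77  94.27  94.02
2-gram                     97.44  98.52  97.98  96.58  97.65  97.11
3-gram                     97.49  98.53  98.00  96.70  97.73  97.21
CRF (MeCab-0.98)           97.19  98.30  97.74  96.72  97.84  97.28
Pointwise (KyTea-0.1.1)    98.73  98.71  98.72  98.07  98.06  98.06
Proposed                   98.73  98.71  98.72  98.38  98.37  98.38

Table 3 Results on Yahoo! (recall, precision, and F-measure, %)
                           Word segmentation    Segmentation + POS
Method                     Rec.   Prec.  F      Rec.   Prec.  F
2-gram HMM                 93.17  94.44  93.80  86.78  87.96  87.36
2-gram                     94.52  96.65  95.57  92.01  94.09  93.04
3-gram                     94.52  96.71  95.60  92.10  94.24  93.16
CRF (MeCab-0.98)           94.89  96.87  95.87  93.69  95.65  94.66
Pointwise (KyTea-0.1.1)    96.93  97.26  97.09  95.19  95.51  95.35
Proposed                   96.93  97.26  97.09  95.86  96.18  96.02

4.4 Domain Adaptation
The proposed method was compared with the pointwise analyzer trained with partial annotation 1) (Pointwise:part) and with a CRF trained with partial annotation (CRF:part), following the procedure in Figure 7: the models are first trained on GTF (Figure 5), and manual annotations of the adaptation domain are then added one location at a time.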
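The recall, precision, and F-measure used in Tables 2 and 3 can be sketched in Python. The sketch counts a word as correct when its character span matches the span of a reference word; this is an assumption about how $N_{COR}$ is computed for segmentation, and the segmentation + POS measure would additionally require the POS to match. Exact fractions are used so the result can be compared with the worked example in the text; the toy segmentations are made up.

```python
from fractions import Fraction
from typing import List, Set, Tuple


def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Character spans (start, end) of the words of one segmented sentence."""
    result, pos = set(), 0
    for w in words:
        result.add((pos, pos + len(w)))
        pos += len(w)
    return result


def prf(n_cor: int, n_ref: int, n_sys: int) -> Tuple[Fraction, Fraction, Fraction]:
    """Recall N_COR/N_REF, precision N_COR/N_SYS, and F (their harmonic mean)."""
    recall = Fraction(n_cor, n_ref)
    precision = Fraction(n_cor, n_sys)
    if n_cor == 0:
        return recall, precision, Fraction(0)
    f = 2 * precision * recall / (precision + recall)
    return recall, precision, f


def evaluate(ref: List[str], hyp: List[str]) -> Tuple[Fraction, Fraction, Fraction]:
    """Score a system segmentation `hyp` against the reference `ref` of the same sentence."""
    n_cor = len(spans(ref) & spans(hyp))
    return prf(n_cor, len(ref), len(hyp))


# Toy example with N_COR = 3, N_REF = 6, N_SYS = 5.
ref = ["健康", "児", "に", "本剤", "を", "接種し"]
hyp = ["健康", "児", "に本", "剤を", "接種し"]
print(evaluate(ref, hyp))  # (Fraction(1, 2), Fraction(3, 5), Fraction(6, 11))
```

Only 健康, 児 and 接種し occupy the same spans in both segmentations, so $N_{COR} = 3$ and the F-measure is 6/11.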

Figure 7 Domain adaptation procedure. A pointwise morphological analyzer and a sequence-based POS re-estimator are trained on a general-domain training corpus (full annotation) and an adaptation-domain training corpus (partial annotation); the analysis results on the adaptation-domain test corpus are evaluated, and information is added by manually annotating one location at a time.

Figure 8 F-measure as a function of the number of annotated morphemes for Pointwise+CRFsuite:part, Pointwise:part, and CRF:part.

The added annotations form an ATP corpus (Figure 5), and the results are shown in Figure 8.

References
1) Neubig, G.: NL-198 (2010).
2) pp.29 3 (21).
3) N-best, Vol.51, No.8, pp.1443-1451 (2010).
4) Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R. and Lin, C.-J.: LIBLINEAR: A Library for Large Linear Classification, Journal of Machine Learning Research, Vol.9, pp.1871-1874 (2008).
5) Neubig, G.: 16 (2010).
6) Lafferty, J., McCallum, A. and Pereira, F.: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, Proceedings of the Eighteenth ICML (2001).

7) 3, pp.115-118 (1997).
8) KOTONOHA, Vol.4, No.1, pp.82-95 (2008).
9) Mori, S. and Oda, H.: Automatic Word Segmentation using Three Types of Dictionaries, Proceedings of the Eighth International Conference Pacific Association for Computational Linguistics (2009).
10) Okazaki, N.: CRFsuite: a fast implementation of Conditional Random Fields (CRFs) (2007).
11) Maekawa, K., Yamazaki, M., Maruyama, T., Yamaguchi, M., Ogura, H., Kashino, W., Ogiso, T., Koiso, H. and Den, Y.: Design, Compilation, and Preliminary Analyses of Balanced Corpus of Contemporary Written Japanese, Proceedings of the Seventh International Conference on Language Resources and Evaluation (2010).
12) EDR, pp.49-56 (1995).
13) Conditional Random Fields, Vol.2004, No.47, pp.89-96 (2004).
14) Vol.5, No.2, pp.75-103 (1998).
15) Nagata, M.: A Stochastic Japanese Morphological Analyzer Using a Forward-DP Backward-A* N-Best Search Algorithm, Proceedings of the 15th International Conference on Computational Linguistics, pp.201-207 (1994).