IPSJ SIG Technical Report Vol.2010-NL-199 No /11/ Morphological and Dependency Structure Annotated Corpus Management Tool: ChaKi


Morphological and Dependency Structure Annotated Corpus Management Tool: ChaKi

Yuji Matsumoto,1 Masayuki Asahara,1 Masakazu Iwatate1 and Toshio Morita2

1 Nara Institute of Science and Technology ({matsu,masayu-a,masakazu-i}@is.naist.jp)
2 Sowa Giken Corp. (morita@sowa.com)

treebank ( ) KWIC /MeCab /

This paper introduces ChaKi, an annotated corpus management system developed under the auspices of the Japanese Corpus Project (Grant-in-Aid for Scientific Research in Priority Areas). The system handles corpora annotated with morphological and dependency structure information and supports storing, retrieving, creating, and error-correcting them. Corpora can be searched by string, by word, and by dependency structure, and the results are shown in KWIC format or as dependency trees. Although the current system imports corpora in the ChaSen/MeCab or CaboCha output format into databases, it is language independent and can be applied flexibly to any POS/dependency structure annotated corpus.

1.

Penn Treebank 1) 2) WordSmith *1 KWIC concordancer *2 100

*1 http://www.lexically.net/wordsmith/version5/index.html
*2 ( )( 2006 2010 )

© 2010 Information Processing Society of Japan
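The KWIC (keyword in context) display named in the abstract can be illustrated with a minimal Python sketch. This is not ChaKi's implementation; the function names `kwic` and `format_kwic`, the `window` parameter, and the whitespace-tokenized sample sentence are all assumptions made for illustration.

```python
def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each hit.

    `window` is a hypothetical parameter: the number of tokens kept
    on each side of the keyword.
    """
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            rows.append((" ".join(left), tok, " ".join(right)))
    return rows


def format_kwic(rows):
    """Right-align the left contexts so that the keyword column lines up."""
    width = max((len(left) for left, _, _ in rows), default=0)
    return [f"{left:>{width}}  [{center}]  {right}"
            for left, center, right in rows]


# Illustrative data only; a real corpus would come from an analyzer.
tokens = "the system stores the corpus and the system searches the corpus".split()
rows = kwic(tokens, "corpus", window=2)
for line in format_kwic(rows):
    print(line)
```

Aligning the keyword in a fixed column is what makes recurring left and right contexts easy to compare by eye.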

Fig. 1 Configuration of ChaKi
Fig. 2 Internal Structure of ChaKi

Slate 3) Client-Server

2.

NAIST-jdic UniDic 4) *3 MeCab *4 *5 (Lexicon) 5) Visual C++ Ruby Microsoft .NET Framework/C# ChaKi.NET

DB: SQLite / Client-Server RDB: MySQL, SQL-Express, PostgreSQL

GUI (Search) DependencyEdit

*3 http://chasen-legacy.sourceforge.jp/
*4 http://sourceforge.net/projects/mecab/
*5 http://sourceforge.net/projects/cabocha/
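The import path mentioned in this section (analyzer output stored in an SQLite database) can be sketched as follows. The table name `word`, its columns, and the function `parse_mecab` are hypothetical and do not reflect ChaKi's actual schema. The sample assumes MeCab's default output with an IPADIC-style feature layout (part of speech first, base form seventh); other dictionaries such as UniDic use different fields.

```python
import sqlite3

# Assumed MeCab default output: "surface<TAB>comma-separated features",
# with each sentence terminated by a line containing only "EOS".
SAMPLE = (
    "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ\n"
    "も\t助詞,係助詞,*,*,*,*,も,モ,モ\n"
    "もも\t名詞,一般,*,*,*,*,もも,モモ,モモ\n"
    "EOS\n"
)


def parse_mecab(text):
    """Yield (sentence_id, position, surface, pos, base_form) tuples."""
    sent_id, position = 0, 0
    for line in text.splitlines():
        if line == "EOS":          # sentence boundary
            sent_id += 1
            position = 0
            continue
        if not line:
            continue
        surface, features = line.split("\t", 1)
        fields = features.split(",")
        # IPADIC-style assumption: field 0 is the POS, field 6 the base form.
        base = fields[6] if len(fields) > 6 and fields[6] != "*" else surface
        yield (sent_id, position, surface, fields[0], base)
        position += 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word (sentence_id INTEGER, position INTEGER, "
    "surface TEXT, pos TEXT, base TEXT)"
)
conn.executemany("INSERT INTO word VALUES (?, ?, ?, ?, ?)", parse_mecab(SAMPLE))

# Word-level retrieval: all nouns, in corpus order.
nouns = conn.execute(
    "SELECT surface FROM word WHERE pos = ? ORDER BY sentence_id, position",
    ("名詞",),
).fetchall()
print(nouns)
```

Keeping one row per word with its sentence and position is one simple way to support both word search and later reconstruction of the context; the same statements would run largely unchanged against a client-server RDB.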

Fig. 3 Snapshot of ChaKi in Use
Fig. 4 Types and examples of search queries

SQLite Slate

3.2

3.

SQLite

KWIC (Keywords in Context) (0,0)

3.1

Fig. 5 Sample of annotation with dependency, apposition and coordination
Fig. 6 Sample of embedded structure
Fig. 7 Sample of truncated embedded structure

KWIC WordList

3.3

KWIC Nest KWIC

3.4

D
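Dependency structure search, as described in the abstract, can be illustrated with a small Python sketch over CaboCha-style output. It assumes the lattice format in which each segment begins with a header line of the form `* <segment id> <head id>D ...` followed by MeCab-style word lines; the sample sentence and the functions `parse_chunks`, `dependency_pairs`, and `find_dependents` are hypothetical and handle a single sentence only.

```python
# Assumed CaboCha lattice-style output for one sentence (illustrative data).
SAMPLE = (
    "* 0 2D 0/1 0.000000\n"
    "太郎\t名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー\n"
    "は\t助詞,係助詞,*,*,*,*,は,ハ,ワ\n"
    "* 1 2D 0/1 0.000000\n"
    "本\t名詞,一般,*,*,*,*,本,ホン,ホン\n"
    "を\t助詞,格助詞,一般,*,*,*,を,ヲ,ヲ\n"
    "* 2 -1D 0/1 0.000000\n"
    "読む\t動詞,自立,*,*,五段・マ行,基本形,読む,ヨム,ヨム\n"
    "EOS\n"
)


def parse_chunks(text):
    """Parse one sentence into segments with their head segment ids."""
    chunks = []
    for line in text.splitlines():
        if line.startswith("* "):
            # Header line: "* id headD ..."; a head of -1 marks the root.
            _, cid, head, *_ = line.split()
            chunks.append({"id": int(cid), "head": int(head[:-1]), "words": []})
        elif line and line != "EOS":
            chunks[-1]["words"].append(line.split("\t", 1)[0])
    return chunks


def dependency_pairs(chunks):
    """Return (modifier text, head text) for every non-root segment."""
    by_id = {c["id"]: c for c in chunks}
    return [
        ("".join(c["words"]), "".join(by_id[c["head"]]["words"]))
        for c in chunks
        if c["head"] >= 0
    ]


def find_dependents(chunks, head_word):
    """Return the modifiers whose head segment contains `head_word`."""
    return [m for m, h in dependency_pairs(chunks) if head_word in h]


chunks = parse_chunks(SAMPLE)
pairs = dependency_pairs(chunks)
print(pairs)
print(find_dependents(chunks, "読む"))
```

A query of this kind differs from string or word search in that the two matched segments need not be adjacent in the sentence.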

Fig. 8 Flat display of dependency structure and full description of lexical entries

KWIC UniDic

4.

MeCab

3.5

KWIC window

N-gram: KWIC N-gram N

5.

N-gram (Right), Minimum Frequency 5, Minimum Length 4

KWIC N-gram KWIC
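The N-gram settings that survive in this section (Right, Minimum Frequency 5, Minimum Length 4) suggest counting word sequences that extend to the right of a keyword. The Python sketch below is one possible reading of those settings, not ChaKi's algorithm: the function `right_ngrams`, the `max_length` cap, and the toy data are assumptions.

```python
from collections import Counter


def right_ngrams(sentences, keyword, min_length=4, max_length=6, min_frequency=5):
    """Count n-grams that start at `keyword` and extend to the right.

    Only n-grams of at least `min_length` tokens that occur at least
    `min_frequency` times are returned. `max_length` is a hypothetical
    cap added to keep the sketch bounded.
    """
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != keyword:
                continue
            for n in range(min_length, max_length + 1):
                gram = tuple(tokens[i:i + n])
                if len(gram) == n:  # skip n-grams cut off by the sentence end
                    counts[gram] += 1
    return {g: c for g, c in counts.items() if c >= min_frequency}


# Illustrative data: five identical sentences and one variant.
sentences = [["a", "b", "c", "d", "e"]] * 5 + [["a", "b", "c", "x"]]
frequent = right_ngrams(sentences, "a", min_length=4, max_length=5, min_frequency=5)
print(frequent)
```

The frequency threshold filters out one-off continuations, so only recurring expressions around the keyword remain.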

UTF-16

punctuation mark

http://sourceforge.jp/projects/chaki/

1) Marcus, M.P., Santorini, B. and Marcinkiewicz, M.A.: Building a Large Annotated Corpus of English: The Penn Treebank, Computational Linguistics, Vol. 19, No. 2, pp.313-330 (1993).
2) Version 4.0: http://nlp.kuee.kyoto-u.ac.jp/nl-resource/corpus.html
3) Dain Kaplan Slate,, (2010).
4),,,,,, :,, 22, pp.101-122 (2007).
5) Yuji Matsumoto, et al.: An Annotated Corpus Management Tool: ChaKi, Proceedings of the 5th International Conference on Language Resources and Evaluation, Genoa (2006).