
Vol. 42 No. 6, June 2001

A Japanese Named Entity Extraction System Based on Building a Large-scale and High-quality Dictionary and Pattern-matching Rules

Yoshikazu Takemoto, Toshikazu Fukushima and Hiroshi Yamada

We have developed a Named Entity extraction system for Japanese text. Named Entities, i.e., proper names and temporal/numerical expressions, are considered essential elements for extracting information. The system employs a conventional method: it divides input Japanese text into words and parts of speech by morphological analysis, then extracts each Named Entity by consulting dictionaries and applying pattern-matching rules. To improve the system's accuracy, we aim to build a large-scale, high-quality dictionary and rule set. Both the dictionary and the rules were built by hand, because we believe that a hand-made dictionary or rule set is of higher quality than one built automatically. We also focused on two points for cases that the dictionary cannot cover. One is extracting proper names from compound words; the other is designating unknown or vague words as proper names. For the first point, our system divides compound words and determines the proper names within them, so that proper names embedded in compound words are no longer missed. For the second point, our system uses reliable proper names to recognize their abbreviations, which tend to be unknown or vague words. On the IREX-NE corpus, our system achieved an F-measure of 83.86.

Information Services Department, Information Services Division, NEC Patent Service, Ltd.
NEC Internet Systems Research Laboratories, Computer & Communication Media Research, NEC Corporation
NEC Open Systems Development Department, 2nd Systems Operations Unit, NEC Corporation
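The abstract's first point, recovering proper names embedded in compound words, can be illustrated with a small sketch. It assumes a hypothetical dictionary mapping strings to IREX-NE types and splits an unknown compound by greedy longest match. The entries (日本電気 is NEC's Japanese name) and the splitting strategy are illustrative assumptions, not the paper's dictionary or procedure.

```python
from typing import Dict, List, Tuple

# Hypothetical NE dictionary (surface string -> IREX-NE type); these entries
# are illustrative and are not taken from the paper's dictionary.
NE_DICT: Dict[str, str] = {
    "東京": "LOCATION",
    "大阪": "LOCATION",
    "日本電気": "ORGANIZATION",
}


def split_compound(word: str, ne_dict: Dict[str, str]) -> List[Tuple[str, str]]:
    """Split a compound word into segments, tagging dictionary entries inside it.

    At each position the longest dictionary entry starting there is taken
    (greedy longest match); characters covered by no entry are collected
    into untagged segments labelled "O".
    """
    segments: List[Tuple[str, str]] = []
    pending = ""  # characters that belong to no dictionary entry
    i = 0
    while i < len(word):
        match = None
        for j in range(len(word), i, -1):  # try the longest candidate first
            if word[i:j] in ne_dict:
                match = word[i:j]
                break
        if match is None:
            pending += word[i]
            i += 1
            continue
        if pending:
            segments.append((pending, "O"))
            pending = ""
        segments.append((match, ne_dict[match]))
        i += len(match)
    if pending:
        segments.append((pending, "O"))
    return segments


print(split_compound("東京大阪間", NE_DICT))
# [('東京', 'LOCATION'), ('大阪', 'LOCATION'), ('間', 'O')]
```

Greedy longest match is only one option; a real system could instead score alternative splits using part-of-speech information from the morphological analyzer.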

Named entity extraction has been evaluated in MUC-6 6) and MET-1 7),8), and for Japanese in the IREX-NE task 9); an earlier system by the authors took part in MET 12). IREX-NE defines eight types of named entities, which are marked up with SGML tags 15); Table 1 lists them.

Table 1  Examples of named entities.
ORGANIZATION   NEC
PERSON
LOCATION
ARTIFACT
DATE
TIME
MONEY
PERCENT        120%
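To make the SGML markup concrete, here is a minimal sketch that renders tagged segments as inline tags, assuming IREX-style markup of the form <TYPE>text</TYPE> for the eight types in Table 1. The segment format (text, type) with "O" for untagged text is a hypothetical convention of this example, and the sketch does no escaping of characters such as "<" or "&".

```python
from typing import List, Tuple

# The eight IREX-NE named entity types listed in Table 1.
IREX_TYPES = frozenset({
    "ORGANIZATION", "PERSON", "LOCATION", "ARTIFACT",
    "DATE", "TIME", "MONEY", "PERCENT",
})


def to_sgml(segments: List[Tuple[str, str]]) -> str:
    """Render (text, type) segments as inline SGML; type "O" means untagged.

    Assumes tags are named after the NE type, e.g. <ORGANIZATION>NEC</ORGANIZATION>.
    """
    parts = []
    for text, ne_type in segments:
        if ne_type == "O":
            parts.append(text)
        elif ne_type in IREX_TYPES:
            parts.append(f"<{ne_type}>{text}</{ne_type}>")
        else:
            raise ValueError(f"unknown NE type: {ne_type}")
    return "".join(parts)


print(to_sgml([("日本電気", "ORGANIZATION"), ("の", "O"), ("東京", "LOCATION"), ("工場", "O")]))
# <ORGANIZATION>日本電気</ORGANIZATION>の<LOCATION>東京</LOCATION>工場
```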


Table 2  Examples of rules.
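The abstract describes hand-made pattern-matching rules applied to the output of morphological analysis. The sketch below shows one possible way such rules could be represented and applied: each rule is a sequence of token predicates, and the first rule that matches at a position wins. The rule format, the part-of-speech tags (NUM, SUF, PART) and the patterns themselves are hypothetical; they are not the rules of Table 2.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, str]          # (surface form, part-of-speech tag)
Pred = Callable[[Token], bool]   # predicate over a single token


def pos(tag: str) -> Pred:
    """Match a token by its (hypothetical) part-of-speech tag."""
    return lambda tok: tok[1] == tag


def word(surface: str) -> Pred:
    """Match a token by its exact surface form."""
    return lambda tok: tok[0] == surface


# Hypothetical rules: (NE type, token pattern). Longer patterns come first,
# because the first rule that matches at a position wins.
RULES: List[Tuple[str, List[Pred]]] = [
    ("TIME", [pos("NUM"), word("時"), pos("NUM"), word("分")]),
    ("TIME", [pos("NUM"), word("時")]),
    ("DATE", [pos("NUM"), word("月"), pos("NUM"), word("日")]),
    ("MONEY", [pos("NUM"), word("円")]),
]


def apply_rules(tokens: List[Token]) -> List[Tuple[int, int, str]]:
    """Scan left to right and return non-overlapping (start, end, type) spans."""
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        for ne_type, pattern in RULES:
            window = tokens[i:i + len(pattern)]
            if len(window) == len(pattern) and all(
                pred(tok) for pred, tok in zip(pattern, window)
            ):
                spans.append((i, i + len(pattern), ne_type))
                i += len(pattern)
                break
        else:  # no rule matched at position i
            i += 1
    return spans


tokens = [("5", "NUM"), ("月", "SUF"), ("14", "NUM"), ("日", "SUF"), ("の", "PART"),
          ("5", "NUM"), ("時", "SUF"), ("15", "NUM"), ("分", "SUF")]
print(apply_rules(tokens))  # [(0, 4, 'DATE'), (5, 9, 'TIME')]
```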

Fig. 1  Our named entity extraction system configuration.

Table 3  Details of named entity dictionary.

            1        2        3
NEC       24990    36714    27564
Lavie      1343     1110      155

Total entries: 92133
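Table 3 details the manually built named entity dictionary. One common way to support longest-match lookup in a dictionary of this size is a character trie; the following is a minimal sketch under that assumption, not the paper's data structure. The class names and the example entries are hypothetical.

```python
from typing import Dict, Optional, Tuple


class TrieNode:
    __slots__ = ("children", "ne_type")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.ne_type: Optional[str] = None  # set if a dictionary entry ends here


class NEDictionary:
    """Character trie mapping surface strings to NE types (a sketch)."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.size = 0  # number of distinct entries

    def add(self, surface: str, ne_type: str) -> None:
        node = self.root
        for ch in surface:
            node = node.children.setdefault(ch, TrieNode())
        if node.ne_type is None:
            self.size += 1
        node.ne_type = ne_type

    def longest_match(self, text: str, start: int) -> Optional[Tuple[int, str]]:
        """Return (end, type) of the longest entry beginning at text[start], or None."""
        node = self.root
        best = None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.ne_type is not None:
                best = (i + 1, node.ne_type)
        return best


d = NEDictionary()
d.add("日本", "LOCATION")
d.add("日本電気", "ORGANIZATION")
print(d.longest_match("日本電気の社長", 0))  # (4, 'ORGANIZATION')
```

Lookup walks the trie once per start position, so its cost depends on the length of the longest matching entry rather than on the number of entries.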

Fig. 2  An example of our named entity extraction process.
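The abstract's second point, recognizing abbreviations of reliable proper names, can be sketched as follows. The abbreviation test used here is an assumed heuristic, not the paper's rule: a shorter word that starts with the same character as a reliable name and whose characters appear in that name in the same order (as in 日電 for 日本電気 or 東大 for 東京大学) receives the name's type. Function names and example data are hypothetical.

```python
from typing import Dict, List, Tuple


def is_abbreviation(short: str, full: str) -> bool:
    """Assumed heuristic: `short` has at least two characters, is shorter than
    `full`, starts with the same character, and its characters occur in
    `full` in the same order."""
    if not (2 <= len(short) < len(full)) or short[0] != full[0]:
        return False
    pos = 0
    for ch in short:
        pos = full.find(ch, pos)
        if pos == -1:
            return False
        pos += 1
    return True


def tag_abbreviations(
    unknown_words: List[str], reliable: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Give each unknown word the type of the first reliable name it abbreviates."""
    tagged = []
    for w in unknown_words:
        for full, ne_type in reliable.items():
            if is_abbreviation(w, full):
                tagged.append((w, ne_type))
                break
    return tagged


# Hypothetical names already extracted with high confidence from the same text.
reliable = {"日本電気": "ORGANIZATION", "東京大学": "ORGANIZATION"}
print(tag_abbreviations(["日電", "東大", "会議"], reliable))
# [('日電', 'ORGANIZATION'), ('東大', 'ORGANIZATION')]
```

Restricting the source names to reliable ones limits false positives, since a loose subsequence test alone would match many ordinary words.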

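The system was evaluated on the IREX-NE data. Recall R and precision P are computed from three counts: GLD, the number of named entities in the answer key; SYS, the number of named entities output by the system; and COR, the number of system outputs that are correct. The F-measure combines R and P, with $\beta$ weighting recall against precision:

$$R = \frac{\mathrm{COR}}{\mathrm{GLD}}, \qquad P = \frac{\mathrm{COR}}{\mathrm{SYS}}$$

$$F = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R} \tag{1}$$

With $\beta = 1$, equation (1) reduces to

$$F = \frac{2PR}{P + R} \tag{2}$$

Table 4  Accuracy of our named entity extraction system.

Type            GLD    SYS    COR    R (%)   P (%)
ORGANIZATION    361    373    288    79.8    77.2
PERSON          338    324    290    85.8    89.5
LOCATION        413    387    339    82.1    87.6
ARTIFACT         48     20     15    31.3    75.0
DATE            260    275    242    93.1    88.0
TIME             54     60     47    87.0    78.3
MONEY            15     15     13    86.7    86.7
PERCENT          21     17     16    76.2    94.1
Total          1510   1471   1250    82.8    85.0

F-measure: 83.86

On IREX-NE the system achieves an F-measure of 83.86.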
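The overall figures in Table 4 can be reproduced from the counts. Substituting P = COR/SYS and R = COR/GLD into F = 2PR/(P+R) gives F = 2·COR/(GLD+SYS), so the total row yields 2 × 1250 / (1510 + 1471) ≈ 0.8386. The sketch below implements the general form F = (1+β²)PR/(β²P+R); the function names are illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Equation (1): F = (1 + beta^2) P R / (beta^2 P + R); beta = 1 gives eq. (2)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def scores(gld: int, sys_: int, cor: int, beta: float = 1.0):
    """Return (R, P, F) in percent from answer-key, system and correct counts."""
    recall = cor / gld
    precision = cor / sys_
    return 100 * recall, 100 * precision, 100 * f_measure(precision, recall, beta)


r, p, f = scores(1510, 1471, 1250)   # "Total" row of Table 4
print(f"R={r:.1f} P={p:.1f} F={f:.2f}")  # R=82.8 P=85.0 F=83.86
```

Passing a different `beta` reweights the score: values above 1 favour recall, values below 1 favour precision.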

Table 5  Evaluation of each component of our system (configurations A1 to A4 and B1 to B6).

R (%)   38.7  56.6  76.2  76.3  77.3  81.2  82.4  82.9
P (%)   31.7  51.3  84.0  84.3  85.2  85.3  84.9  84.9
F       34.9  53.8  79.9  80.1  81.1  83.2  83.7  83.9

Configuration A1 reaches a recall of 38.7%, a precision of 31.7% and an F-measure of 34.9. Moving from A1 to A2 raises recall by 17.9 points, precision by 19.6 points, and F by 18.9 points.
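Because the F row of Table 5 is derived from R and P, it can be cross-checked with F = 2PR/(P+R). The sketch below recomputes F from the rounded R and P values (they agree with the table to within 0.1 point, the gap coming from rounding) and lists the gain in F between adjacent columns. It reads the columns in their printed order and makes no assumption about which configuration each column is.

```python
# Rows of Table 5, columns in printed order.
R = [38.7, 56.6, 76.2, 76.3, 77.3, 81.2, 82.4, 82.9]
P = [31.7, 51.3, 84.0, 84.3, 85.2, 85.3, 84.9, 84.9]
F_TABLE = [34.9, 53.8, 79.9, 80.1, 81.1, 83.2, 83.7, 83.9]


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (F with beta = 1)."""
    return 2 * p * r / (p + r)


# F recomputed from the rounded R and P values.
f_recomputed = [f1(p, r) for p, r in zip(P, R)]

# Gain in F between adjacent columns, rounded to the table's precision.
gains = [round(b - a, 1) for a, b in zip(F_TABLE, F_TABLE[1:])]

for got, want in zip(f_recomputed, F_TABLE):
    print(f"{got:.2f} vs {want}")
print(gains)  # [18.9, 26.1, 0.2, 1.0, 2.1, 0.5, 0.2]
```

The largest single gain (+26.1) falls between the second and third columns; the gains sum to 49.0, the difference between the first and last F values.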

Table 6  Accuracy improvement on training corpus (F-measure on three corpora A, B and C; configurations A1 to A4 and B1 to B6 as in Table 5).

A   51.1  69.5  85.9  86.2  86.2  89.6  89.8  93.9
B   41.3  56.5  77.2  77.5  79.1  81.9  82.9  87.2
C   38.9  56.0  75.6  76.0  77.2  80.4  81.0  82.5

References

1) Vol.40, No.4 (1999).
2) 114-12 (1996).
3) Vol.36, No.8 (1995).
4) 115-12 (1996).
5) Cowie, J. and Lehnert, W.: Information Extraction, Comm. ACM, Vol.39, No.1 (1996).
6) Proc. 6th Message Understanding Conference (MUC-6), Morgan Kaufmann Publishers Inc. (1996).
7) Proc. Tipster Text Program (Phase II), DARPA (1996).
8) 115-10 (1996).
9) IREX 127-15 (1998).
10) Vol.25, No.6 (1984).
11) 35 6S-3 (1987).
12) Takemoto, Y., Wakao, T., Yamada, H., Gaizauskas, R. and Wilks, Y.: Description of NEC/Sheffield System Used For MET Japanese, Proc. Tipster Text Program (Phase II) (1996).
13) IREX-NE IREX (1999).
14) IREX NE 5 B2-1 (1999).
15) http://cs.nyu.edu/cs/projects/proteus/irex/
16) 5 A2-2 (1999).
17) Sekine, S.: NYU: Description of the Japanese NE System Used For MET-2 (1998). http://www.muc.saic.com/
18) 92-90 (1992).
19) 3 (1997).
20) 53 7L-3 (1996).
21) 126-15 (1998).

(Received March 1, 2000)
(Accepted March 9, 2001)
