14 A Method of Article Retrieval Utilizing Characteristics in Newspaper Articles 1055104 2003 1 31


Abstract

A Method of Article Retrieval Utilizing Characteristics in Newspaper Articles

TOMOIKE Takayuki

Interest is growing in text processing technology that extracts required information from very large volumes of text. Research is being carried out from various viewpoints, such as question answering and text summarization. This paper describes a document retrieval method that forms part of a question answering system and exploits characteristics of newspaper articles. The method targets retrieval of documents from a collection of newspaper articles. Examples of these characteristics are that the first sentence of an article often states its conclusion, that the first sentence of each paragraph is often important, and that a person's name accompanied by a title and an age is often important. The retrieval method is based on tf-idf weighting. However, tf-idf weighting has a known problem: a long article tends to be retrieved in preference to a short one. This paper also describes a solution to this problem that uses a text summarization technique.

Key words: Question Answering, Information Retrieval, tf-idf Weighting, Text Summarization
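The length bias mentioned in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy articles, the scoring rule (sum of tf-idf weights over the query terms), the natural logarithm, and the use of "keep the first k sentences" as a crude stand-in for summarization. It is not the thesis's actual pipeline or summarizer.

```python
import math
from collections import Counter

def idf(n_docs, df):
    # idf = 1 + log(N / df); natural log is an assumption
    return 1.0 + math.log(n_docs / df)

def score(query_terms, doc_tokens, corpus):
    """Sum of tf * idf over the query terms (hypothetical scoring rule)."""
    counts = Counter(doc_tokens)
    total = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue  # term occurs nowhere in the corpus
        total += counts[term] * idf(len(corpus), df)
    return total

def lead(sentences, k):
    """Flatten the first k sentences into one token list."""
    return [tok for s in sentences[:k] for tok in s]

# Toy articles: each article is a list of tokenized sentences.
short_article = [["mayor", "resigns"]]
long_article = [["council", "meets"], ["mayor", "speaks"],
                ["mayor", "leaves"], ["mayor", "returns"]]
other_article = [["weather", "sunny"]]
articles = [short_article, long_article, other_article]

query = ["mayor", "resigns"]

# Full text: the long article wins because "mayor" is repeated.
full = [lead(a, len(a)) for a in articles]
full_scores = [score(query, d, full) for d in full]

# First two sentences only: the short, on-topic article wins.
trimmed = [lead(a, 2) for a in articles]
trimmed_scores = [score(query, d, trimmed) for d in trimmed]
```

With the full text, the long article outscores the short one even though only the short one contains both query terms; after shortening, the ranking is reversed.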

Contents

1
2
  2.1
    2.1.1
    2.1.2
    2.1.3 tf-idf
    2.1.4
  2.2
    2.2.1 NTCIR
    2.2.2 QAC-1
  2.3
3
  3.1
  3.2
  3.3
  3.4
  3.5
4
  4.1
  4.2
  4.3
5
6
Appendix A
Appendix B 3.5
Appendix C 4.3


1 WWW WWW NTCIR [1] QAC-1[2] 3 NTCIR 1 QAC-1 RDB(relational database) QAC-1 2 1

QAC-1 2 2

2 2.1 2.1.1 [3] 1970 RDB 2.1 3 3

2.1 2.1 2.1.2 1 [4] 1 1 1 10 2.1.3 tf-idf tf-idf 4

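Following [4], tf (term frequency) is the number of times a term occurs in a document, N is the total number of documents, and df (document frequency) is the number of documents that contain the term. The idf (inverse document frequency) is

\[ idf = 1 + \log\frac{N}{df} \]

and the tf-idf weight w of a term is

\[ w = tf \cdot idf = tf \left( 1 + \log\frac{N}{df} \right) \tag{2.1} \]

2.1.4 [4] Posum[5] Posum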
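A minimal sketch of the weight in equation (2.1), w = tf (1 + log(N/df)). The function names are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not stated here.

```python
import math

def idf(n_docs: int, df: int) -> float:
    """idf = 1 + log(N / df); natural log is an assumption."""
    if n_docs <= 0 or df <= 0:
        raise ValueError("n_docs and df must be positive")
    return 1.0 + math.log(n_docs / df)

def tfidf_weight(tf: int, n_docs: int, df: int) -> float:
    """w = tf * idf, as in equation (2.1)."""
    return tf * idf(n_docs, df)

# Example: a term occurring 3 times in an article,
# and in 10 of 1000 articles overall.
w = tfidf_weight(3, 1000, 10)
```

A term that appears in every document (df = N) still gets idf = 1 under this definition, so its weight reduces to its raw term frequency rather than to zero.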

2.2 2.2 2.2.1 NTCIR NTCIR (NII Test Collection for Information Retrieval and Text Processing: ) [1] NTCIR 3 2.2.2 QAC-1 QAC-1 3 NTCIR 1 [2] QAC-1 6

2.2 RDB 1998, 99 2 1 5 2 ( ) ( ) ( ) 3 QAC-1 2 [6] QAC-1 2 2.2 7

2.3 2.2 QAC-1 2 2.3 2.1.3 tf-idf 8

2.3 tf-idf 1 9

3 tf-idf Posum 3.1 QAC-1 3.1 3.1 3.1 5 10 20 tf-idf 10

3.2 3.1 3.2 Posum Posum 30 50 20 Posum 2 236,664 3.2 3.2 11

Table 3.1: DOCNO LANG ID SECTION AE WORDS HEADLINE DATE TEXT

Table 3.2: 236,664 1 10.63 1 202 1 1

3.2 3.1 Posum 10 1 10 70% 20 15% 3 3.3

Figure 3.2 (plot; horizontal axis: the number of the sentences which constitute an article; vertical axis: the number of the articles)

1 1 1 1 1 1 1 1

3.3 3.3 0 1 10 Posum 2 10 4 6 Posum 3 10 1 2 1 Posum 1 2 3.3 3.4 [7] 14

3.3 3.4 1 2 3 Who 4 Who 5 Who 6 Who 7 Who 8 Who 9 Who 1 1 15

3.4

Using tf, N, and df, the idf is

\[ idf = 1 + \log\frac{N}{df} \]

and the tf-idf weight w is

\[ w = tf \cdot idf = tf \left( 1 + \log\frac{N}{df} \right) \tag{3.1} \]

3.5

Table 3.5

    0          1          2          3
    7/43       5/43       4/43       10/43
    (16.3%)    (11.6%)    (9.3%)     (23.3%)
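As a quick arithmetic check, the percentages in Table 3.5 follow from the counts out of 43 questions. The dictionary keys are the column labels 0 to 3 from the table; the helper name `pct` is hypothetical.

```python
def pct(correct: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)

TOTAL_QUESTIONS = 43
correct_counts = {0: 7, 1: 5, 2: 4, 3: 10}  # columns of Table 3.5

rates = {label: pct(n, TOTAL_QUESTIONS) for label, n in correct_counts.items()}
best = max(rates, key=rates.get)  # column with the highest rate
```

Under these numbers column 3 has the highest rate, and columns 1 and 2 fall below column 0.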

3.5 QAC-1 QAC-1 Formal Run 200 43 43 3.5 1, 2 3 17

4 tf-idf 4.1 tf-idf 3.2 3 4.1 3.3 18

4.2 4.1 4.2 1 Who 1 1 19

4.2 1 1 1 1 1 1 1 4.1 1 1 20

4.2

Using tf, N, df, and a factor B, the idf is

\[ idf = 1 + \log\frac{N}{df} \]

and the tf-idf weight w is

\[ w = B \cdot tf \cdot idf = B \cdot tf \left( 1 + \log\frac{N}{df} \right) \tag{4.1} \]

1. df
2. (tf) = 0
3. tf-idf
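Equation (4.1) multiplies the tf-idf weight by a factor B. The sketch below shows one way such a factor could be applied. How B is chosen is an assumption for illustration only: here a term found in the first sentence of an article gets a hypothetical factor of 2.0 and any other term gets 1.0, which echoes the observation that first sentences are often important but is not taken from the thesis. The natural logarithm is likewise assumed.

```python
import math

def weighted_tfidf(tf: int, n_docs: int, df: int, b: float = 1.0) -> float:
    """w = B * tf * (1 + log(N / df)), as in equation (4.1)."""
    if tf == 0 or df == 0:
        return 0.0  # term absent from the article or from the corpus
    return b * tf * (1.0 + math.log(n_docs / df))

def position_factor(term: str, first_sentence: list[str],
                    boost: float = 2.0) -> float:
    """Hypothetical choice of B: boost terms found in the first sentence."""
    return boost if term in first_sentence else 1.0

# Example: "mayor" occurs twice in an article and in 10 of 100 articles.
first_sentence = ["mayor", "resigns"]
b = position_factor("mayor", first_sentence)
w = weighted_tfidf(tf=2, n_docs=100, df=10, b=b)
```

With B = 1 the function reduces to the unweighted tf-idf of equations (2.1) and (3.1), so the factor only changes the ranking for terms that receive a boost.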

4.3 4.3 tf-idf 3.2 3 QAC-1 QAC-1 Formal Run 200 43 43 QAC-1 4.2 2.4 [6] 22

Table 4.1

4.3 4.2 24

Table 4.2: 10/43, 24/43
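For reference, the two counts 10/43 and 24/43 reported here convert to percentages as follows. This is a worked calculation only; it does not say which configuration each count belongs to.

```latex
\[
\frac{10}{43} \approx 0.233 = 23.3\%, \qquad
\frac{24}{43} \approx 0.558 = 55.8\%, \qquad
\frac{24 - 10}{43} = \frac{14}{43} \approx 32.6 \text{ percentage points}.
\]
```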

5 tf-idf 5.1 CS 38 CS tf-idf 4 1 26

5.1 27

6 tf-idf tf-idf TSC[8] 90% 28


Ruck Thawonmas 4 30

[1] NTCIR, Vol.17, No.3, pp.296-300, May 2002.
[2] http://www.nlp.cs.ritsumei.ac.jp/qac/
[3] Vol.17, No.3, pp.301-305, May 2002.
[4] 1996.
[5] Posum version1.50.2, 2002.
[6] Takayuki TOMOIKE, Tomohiko KAWACHI, Ruck THAWONMAS, Akio SAKAMOTO, Article Retrieval and Answer Extraction Exploiting Characteristics in Newspaper Articles for the QAC Task2, Working Notes of the Third NTCIR Workshop Meeting Part IV: Question Answering Challenge, pp.101-105, Oct. 2002.
[7] version 2.2.9, 2002.
[8] http://lr-www.pi.titech.ac.jp/tsc/

Appendix A

Table A.1 (1)

ID
QAC1-2008-01
QAC1-2013-01
QAC1-2018-01
QAC1-2026-01
QAC1-2033-01
QAC1-2041-01
QAC1-2054-01
QAC1-2058-01
QAC1-2060-01
QAC1-2063-01
QAC1-2071-01
QAC1-2074-01
QAC1-2079-01
QAC1-2081-01
QAC1-2085-01

Table A.2 (2)

ID
QAC1-2090-01
QAC1-2096-01
QAC1-2098-01
QAC1-2099-01
QAC1-2103-01
QAC1-2110-01
QAC1-2111-01
QAC1-2115-01
QAC1-2122-01
QAC1-2123-01
QAC1-2128-01
QAC1-2139-01
QAC1-2142-01
QAC1-2146-01
QAC1-2148-01
QAC1-2153-01
QAC1-2156-01
QAC1-2158-01
QAC1-2149-01

Table A.3 (3)

ID
QAC1-2164-01
QAC1-2165-01
QAC1-2172-01
QAC1-2174-01
QAC1-2176-01
QAC1-2178-01
QAC1-2188-01
QAC1-2197-01
QAC1-2198-01

Appendix B 3.5

Table B.1 3.5 (1)

ID              DOCNO (0)    DOCNO (3)
QAC1-2008-01    991005028    980525121
QAC1-2013-01    991210285    980225160
QAC1-2018-01    980918107    991213010
QAC1-2026-01    980119202    980129039
QAC1-2033-01    990811078    980317039
QAC1-2041-01    990205098    980322226
QAC1-2054-01    980724195    990125013
QAC1-2058-01    991011152    991101062
QAC1-2060-01    990306155    991129179
QAC1-2063-01    980701331    990415289
QAC1-2071-01    990124138    990112001
QAC1-2074-01    980925100    980925101
QAC1-2079-01    991013267    991013267
QAC1-2081-01    990811078    990824018
QAC1-2085-01    980918107    980825060
QAC1-2090-01    990811078    990816036

Table B.2 3.5 (2)

ID              DOCNO (0)    DOCNO (3)
QAC1-2096-01    980702150    980706006
QAC1-2098-01    991210285    990621238
QAC1-2099-01    990619178    980802150
QAC1-2103-01    991210286    990412212
QAC1-2110-01    980717097    991007357
QAC1-2111-01    980318276    980912299
QAC1-2115-01    980217049    980706216
QAC1-2122-01    991026082    980116255
QAC1-2123-01    991230072    990719318
QAC1-2128-01    991230072    990808100
QAC1-2139-01    980703344    991026178
QAC1-2142-01    980315167    990908188
QAC1-2146-01    980310263    980310263
QAC1-2148-01    980820141    990220126
QAC1-2149-01    980912030    990817125
QAC1-2153-01    980105123    980606330
QAC1-2156-01    991210285    980907263
QAC1-2158-01    991103116    980614230
QAC1-2164-01    990820208    980106236

Table B.3 3.5 (3)

ID              DOCNO (0)    DOCNO (3)
QAC1-2165-01    981001230    981001230
QAC1-2172-01    990124138    980605357
QAC1-2174-01    991210285    981101128
QAC1-2176-01    980630357    980630395
QAC1-2178-01    981116226    980603379
QAC1-2188-01    980415119    990107147
QAC1-2197-01    990401259    990401259
QAC1-2198-01    980722215    991202086

Appendix C 4.3

Table C.1 4.3 (1)

ID              DOCNO        DOCNO
QAC1-2008-01    980525121    980525121
QAC1-2013-01    980225160    980325075
QAC1-2018-01    991213010    991213010
QAC1-2026-01    980129039    990819015
QAC1-2033-01    980317039    980317039
QAC1-2041-01    980322226    990111256
QAC1-2054-01    990125013    991029008
QAC1-2058-01    991101062    990220177
QAC1-2060-01    991129179    980105214
QAC1-2063-01    990415289    980926283
QAC1-2071-01    990112001    990112001
QAC1-2074-01    980925101    981223079
QAC1-2079-01    991013267    990706037
QAC1-2081-01    990824018    980928015
QAC1-2085-01    980825060    990205181
QAC1-2090-01    990816036    991113171

Table C.2 4.3 (2)

ID              DOCNO        DOCNO
QAC1-2096-01    980706006    980706006
QAC1-2098-01    990621238    990202113
QAC1-2099-01    980802150    980802150
QAC1-2103-01    990412212    990312159
QAC1-2110-01    991007357    980101246
QAC1-2111-01    980912299    980912299
QAC1-2115-01    980706216    991217099
QAC1-2122-01    980116255    991001034
QAC1-2123-01    990719318    990219119
QAC1-2128-01    990808100    990704102
QAC1-2139-01    991026178    991127201
QAC1-2142-01    990908188    990627076
QAC1-2146-01    980310263    980310263
QAC1-2148-01    990220126    990216276
QAC1-2149-01    990817125    990430105
QAC1-2153-01    980606330    980606330
QAC1-2156-01    980907263    980419068
QAC1-2158-01    980614230    990819359
QAC1-2164-01    980106236    981202178

Table C.3 4.3 (3)

ID              DOCNO        DOCNO
QAC1-2165-01    981001230    990107333
QAC1-2172-01    980605357    991231139
QAC1-2174-01    981101128    980907303
QAC1-2176-01    980630395    980225211
QAC1-2178-01    980603379    981102161
QAC1-2188-01    990107147    990107147
QAC1-2197-01    990401259    980919193
QAC1-2198-01    991202086    990226107