
DEWS2008 C6-4

Person Retrieval on XML Documents by Coreference that Uses Structural Features

Yumi YONEI, Mizuho IWAIHARA, and Masatoshi YOSHIKAWA
Department of Social Informatics, Graduate School of Informatics, Kyoto University
Yoshidahonmachi, Sakyo-ku, Kyoto, 606-8501 Japan
E-mail: yyonei@db.soc.i.kyoto-u.ac.jp, {iwaihara,yoshikawa}@i.kyoto-u.ac.jp

Abstract  Current keyword retrieval relies on the frequency and position of keyword occurrences. When a query contains two or more keywords, the semantic relation between them matters. To find information about a person, users commonly search with a pair of keywords: the person's name and one of his or her attributes. If the semantic relation between the keywords is ignored, however, documents that describe a different person's attribute may be retrieved. Dependency analysis and coreference analysis make it possible to retrieve content in which the query keywords are semantically dependent, which improves search precision, but these analyses are costly. In structured documents such as XML, on the other hand, the correspondence between expressions is often influenced by the document structure. In this paper, we confirm this through coreference resolution that uses structural features of XML documents, and we describe a person retrieval method based on this structural coreference.

Key words  XML, coreference, person retrieval, structural features

1. XXX XXX

<> <name> </name> <body> <></> <> <title> </title> <item> </item> <item> 11 </item> </> ... </body> </>
Figure 1: XML

Web XML HTML 1 name item 2 item item HTML [1] [17] [3] [18] Wikipedia Web HTML XML Web Web XML 2 3 4 Wikipedia XML 5

2.
Web [14] [8] [14] [8]. Web caption [1] [17] [1] - [17] [3] [18] [3] 3 [18] Web 3 Web Web Web [16] [16] Web Web HTML XML

3. XML

3. 1 XML
3.1.1 3.1.2

3. 1. 1
(1) (linguistic features) [4] 4 ( ) ( ) ( ) ( ) PERSON, LOCATION CaboCha [7] ( )
(2) (structural features) XML HTML XML

Figure 2: panels (a) and (b); XML trees with the nodes title, p, and item and the anaphor.
Figure 3: k- (k=2); panels (a) to (d); XML trees with the nodes name, body, and item and the anaphor.

XML 2 2(a) XML title XML 2(b) XML

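k- k 3 k=2 k- 3(a),(b),(c) name body 3(d) 2 k- (k=2) k- k- ( (name )(body )) 3(d) k-

3. 1. 2
Support Vector Machine

(1)
x, y C(x, y) p(x, y) [6] [9] (x, y) 1,0 x ( (name )(body )) k- y

$$f_i(x, y) = \begin{cases} 1 & \text{if } x \;\; y \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

$$p(x, y) = \frac{1}{z(x)} \exp\Big(\sum_i \lambda_i f_i(x, y)\Big), \qquad z(x) = \sum_y \exp\Big(\sum_i \lambda_i f_i(x, y)\Big) \tag{2}$$

Here \lambda_i is the weight of feature f_i and z(x) is the normalization term.

$$\tilde{P}(f_i) = \sum_{x,y} \tilde{p}(x, y)\, f_i(x, y) \tag{3}$$

(1) (2)

$$P(f_i) = \sum_{x,y} \tilde{p}(x)\, p(x, y)\, f_i(x, y) \tag{4}$$

$$P(f_i) = \tilde{P}(f_i) \tag{5}$$

P(f_i) \tilde{P}(f_i) (6)

$$H(P) = -\sum_{x,y} p(x, y) \log p(x, y) \tag{6}$$

\lambda

(2) SVM
SVM [2] 2 SVM [5] T_1, T_2 V_1, V_2 E_1, E_2 T_1 = (V_1, E_1), T_2 = (V_2, E_2)

$$K(T_1, T_2) = \sum_{v_1 \in V_1} \sum_{v_2 \in V_2} \sum_{s_1 \in S_{v_1}(T_1)} \sum_{s_2 \in S_{v_2}(T_2)} K_S(s_1, s_2) \tag{7}$$

$$K_S(s_1, s_2) = I(s_1 = s_2) \tag{8}$$

S_v(T), v \in V. K_S (2). I(\cdot) is an indicator function that returns 1 if s_1 = s_2 and 0 otherwise. 2 [12] RNA HTML XML Web SVM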
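The two formulas above, the maximum entropy model of Eq. (2) and the tree kernel of Eqs. (7) and (8), can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the nested-tuple tree encoding, the root tag `entry`, and the example feature are hypothetical, and S_v(T) is assumed to contain only the complete subtree rooted at v (the node together with all of its descendants), since the paper's own definition did not survive extraction.

```python
import math
from collections import Counter


def maxent_probability(x, labels, features, weights):
    """Eq. (2): exp(sum_i lambda_i * f_i(x, y)) / z(x) for every label y."""
    scores = {
        y: math.exp(sum(w * f(x, y) for f, w in zip(features, weights)))
        for y in labels
    }
    z = sum(scores.values())  # z(x): sum of the unnormalized scores over all y
    return {y: s / z for y, s in scores.items()}


def complete_subtrees(tree):
    """Yield the complete subtree rooted at each node.

    A tree is a nested tuple (label, child, child, ...); this encoding is an
    assumption made for the sketch.
    """
    yield tree
    for child in tree[1:]:
        yield from complete_subtrees(child)


def tree_kernel(t1, t2):
    """Eqs. (7)-(8): number of node pairs (v1, v2) whose subtrees are identical.

    Assumes S_v(T) holds exactly one element, the complete subtree rooted at v.
    """
    c1 = Counter(complete_subtrees(t1))
    c2 = Counter(complete_subtrees(t2))
    return sum(n * c2[s] for s, n in c1.items())


# Hypothetical binary feature in the shape of Eq. (1): fires when the
# structural pattern is "name-body" and the pair is labeled coreferent (y = 1).
features = [lambda x, y: 1.0 if (x == "name-body" and y == 1) else 0.0]
probabilities = maxent_probability("name-body", [0, 1], features, [1.0])

t1 = ("entry", ("name",), ("body", ("item",), ("item",)))
t2 = ("entry", ("name",), ("body", ("p",), ("item",)))
similarity = tree_kernel(t1, t2)
```

With a single feature of weight 1, the coreferent label receives probability e / (1 + e) for the matching pattern, and both labels receive 0.5 for any other pattern. Counting identical subtrees with a `Counter` avoids the explicit double loop over node pairs while giving the same sum as Eq. (7) under the assumption above.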

SVM 1 0 k- 3.1.1(2) SVM

3. 2
( (name )(body )) body name body XML

3. 3

4. XML

4. 1

4. 1. 1
Wikipedia 1 XML 4 1: 2: 3: 4: 1 2 3 4 1 1

1 http://ja.wikipedia.org/

Table 1
        1      2      3      4
      333    521      6     72
       75     96      4      4
    12597  22758     18     33
      240    489      7     14
    12260  22269     11     19

Figure 4: k- (k=2)
 1  (item )
 2  (p )
 3  (item (normalist ))
 4  ( (name )(body ))
 5  ( (title )( ))
 6  (body (p )( ))
 7  (body ( )( ))
 8  (normalist (item )(item ))
 9  ( (normalist ) (normalist ))
10  ((p ) (p ))
11  ( ( )( ))

4. 1. 2
[4] [11] CaboCha [7] EDR [13] EDR

Table 2: k- (k=2)
 (1)  (item )                        71
 (2)  (p )                           40
 (3)  (item (normalist ))             4
 (4)  ( (name )(body ))              93
 (5)  ( (title )( ))                  3
 (6)  (body (p )( ))                 29
 (7)  (body ( )( ))                   0
 (8)  (normalist (item )(item ))      0
 (9)  ( (normalist )(normalist ))     0
(10)  ((p ) (p ))                     0
(11)  ( ( )( ))                       0
      Total                         240

Table 3: k- (k=2)
 (2)  (p )                            7
(10)  ((p ) (p ))                     0
      Total                           7

Wikipedia 2 3 k=2 k- 4 2 3 4 (1) (2) item (3) item (4) name (5) title (6) (6) (10) 2 3 XML (3) (4) (5) name title 2 (4) name item (body (p )( ) 2 3

4. 1. 3
SVM [10] SVM SVM-light [15] SVM 0 1

4. 2
(precision) (recall) F (F-measure) F

$$F\text{-measure} = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$

4. 3
k- k- SVM

4. 3. 1
XML (I) (II) (III) 4, 3.1.1(1) k=2 k- 5 5 (I) (II)

Table 4: (I) (II) (III)

Table 5
              1       2       3       4
(I)         74.3%   76.0%   51.3%   31.2%
            40.8%   66.8%   57.5%   48.7%
       F    52.7%   71.1%   54.2%   38.0%
(II)        77.0%   78.9%   69.3%   75.0%
            48.1%   69.2%   91.7%   54.8%
       F    59.2%   73.7%   74.9%   63.3%

(III) 90.6% 92.0% 82.0% 86.0%

Table 6: k-
              1       2       3       4
k=2         77.0%   78.9%   69.3%   75.0%
            48.1%   69.2%   91.7%   54.8%
       F    59.2%   73.7%   74.9%   63.3%
k=3         75.4%   77.6%   69.3%   75.0%
            46.2%   68.6%   91.7%   54.8%
       F    57.3%   72.8%   74.9%   63.3%
k=          72.1%   78.0%   69.3%   75.0%
            49.3%   72.0%   91.7%   54.8%
       F    58.6%   74.9%   74.9%   63.3%

Table 7
              1       2       3       4
            72.1%   78.0%   69.3%   75.0%
            49.3%   72.0%   91.7%   54.8%
       F    58.6%   74.9%   74.9%   63.3%
SVM         97.4%   85.3%   63.3%   88.0%
            30.6%   61.2%   83.3%   30.4%
       F    46.6%   71.3%   71.9%   45.2%
            38.9%   33.5%   62.0%   54.8%
       F    54.4%   49.1%   70.6%   66.7%

(k= ) F XML (III) (II) (III) 6 3 4 3 1 2 k=2 k=3 k- k=2 ( = ) 3 F 1 k=2 2 k= k=2 k-

4. 3. 3. [4] SVM

4. 3. 2 k-
7 2 3-2 (k=2 - ) 3 (k=3 - ) SVM F 3 SVM
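The F rows in the result tables can be recomputed from the two values printed above each of them, using the F-measure formula of Section 4.2. The sketch below assumes the standard definitions of precision and recall over sets of extracted and gold-standard pairs, because the paper's own definitions were lost in extraction; the function names and the example pairs are hypothetical.

```python
def precision_recall(predicted, gold):
    """Standard precision and recall over two collections of items (assumed)."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # items that are both predicted and gold
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def f_measure(precision, recall):
    """F-measure of Section 4.2: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (anaphor, antecedent) pairs, used only to exercise the functions.
predicted_pairs = {("a1", "n1"), ("a2", "n1"), ("a3", "n2")}
gold_pairs = {("a1", "n1"), ("a2", "n1"), ("a4", "n2"), ("a5", "n3")}
p, r = precision_recall(predicted_pairs, gold_pairs)
f = f_measure(p, r)

# Recompute one reported cell: 74.3% and 40.8% for feature set (I) on data set 1.
f_reported = round(100 * f_measure(0.743, 0.408), 1)
```

For the first cell of feature set (I), the values 74.3% and 40.8% give 52.7%, which agrees with the F row printed in the table. Because the harmonic mean is symmetric, the check does not depend on which of the two rows is precision and which is recall.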

5.
XML k- XML Wikipedia XML 4 F k- k- =2 k- SVM F Wikipedia HTML

(B) (18300031)

[1] Hsin-Hsi Chen, Shih-Chung Tsai, Jin-He Tsai, Mining Tables from Large Scale HTML Texts, 18th International Conference on Computational Linguistics, pp. 166-172, 2000.
[2] Nello Cristianini, John Shawe-Taylor,,
[3] WWW HTML, DE2005-136, 2005.
[4] Vol. 46, No. 3, 2005.
[5] Vol. 21, No. 1, a, 2006.
[6] Andrew Kehler, Probabilistic Coreference in Information Extraction, CoRR, cmp-lg/9706012, 1997.
[7] Support Vector Machine Chunk, Vol. 9, No. 5, pp. 3-21, 2002.
[8] 11, 2005.
[9] Adam L. Berger, Stephen A. Della Pietra, Vincent J. Della Pietra, A Maximum Entropy Approach to Natural Language Processing, Computational Linguistics, 22, 1996.
[10] Zhang Le, Maximum Entropy Modeling Toolkit for Python and C++, http://homepages.inf.ed.ac.uk/s0450736/maxent_toolkit.html.
[11] version 2.3.3, 2003.
[12] Alessandro Moschitti, Making Tree Kernels Practical for Natural Language Learning, EACL, 2006.
[13] EDR. Technical Report TR 045, 1995.
[14] 2002, pp. 175-176, 2002.
[15] SVM-light, http://dit.unitn.it/~moschitt/tree-kernel.htm.
[16] Lan Yi, Bing Liu, Xiaoli Li, Eliminating Noisy Information in Web Pages for Data Mining, Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 296-305, 2003.
[17] Minoru Yoshida, Kentaro Torisawa, Junichi Tsujii, Extracting Ontologies from World Wide Web via HTML Tables, Pacific Association for Computational Linguistics, pp. 332-341, 2001.
[18] Web, DEWS 6-p-05, 2003.