



DEWS2008 C6-4

Person Retrieval on XML Documents by Coreference that Uses Structural Features

Yumi YONEI, Mizuho IWAIHARA, and Masatoshi YOSHIKAWA
Department of Social Informatics, Graduate School of Informatics, Kyoto University
Yoshidahonmachi, Sakyo-ku, Kyoto, 606-8501 Japan
E-mail: yyonei@db.soc.i.kyoto-u.ac.jp, {iwaihara,yoshikawa}@i.kyoto-u.ac.jp

Abstract  Current keyword retrieval is based on the occurrence frequency and occurrence position of the keywords. When a query contains two or more keywords, the semantic relation between the keywords is important. To retrieve information about a person, it is common to search with a pair of keywords consisting of the person's name and one of his or her attributes. However, if the semantic relation between the keywords is not considered, documents that describe a different person's attribute may be retrieved. Dependency analysis and coreference analysis make it possible to retrieve content in which the query keywords are semantically dependent and thereby improve search precision, but these analyses are costly. On the other hand, in structured documents such as XML, the correspondence between keywords is often influenced by the document structure. In this paper, we confirm this using coreference resolution that exploits structural features of XML documents, and we describe our person retrieval method based on this structural coreference.

Key words  XML, coreference, person retrieval, structural features

1.

<> <name> </name> <body> <></> <> <title> </title> <item> </item> <item> 11 </item> </>... </body> </>
1 XML

Web XML HTML 1 name item 2 item item HTML [1] [17] [3] [18] Wikipedia Web HTML XML Web Web XML

2 3 4 Wikipedia XML 5

2.
Web [14] [8] [14] [8]. Web caption [1] [17] [1] - [17] [3] [18] [3] 3 [18] Web 3 Web Web Web [16] [16] Web Web HTML XML

3. XML

3. 1 XML
3.1.1 3.1.2

3. 1. 1
(1) (linguistic features)
[4] 4 ( ) ( ) ( ) ( ) PERSON, LOCATION Cabocha [7] ( )

(2) (structural features)
XML HTML XML

[Figure 2, panels (a) and (b): tree diagrams over name, body, title, p, and item nodes, with the anaphor node marked in each panel.]

[Figure 3, k- (k=2), panels (a) to (d): tree diagrams over name, body, and item nodes, with the anaphor node marked in each panel.]

XML 2 2(a) XML title XML 2(b) XML

k- k 3 k=2 k- 3(a),(b),(c) name body 3(d) 2 k- (k=2) k- k- ( (name )(body )) 3(d) k-

3. 1. 2
Support Vector Machine

(1)
x, y C(x, y) p(x, y) [6] [9] (x, y) 1,0 x ( (name )(body )) k- y

\[ f_i(x, y) = \begin{cases} 1 & \text{if } x \ldots y \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

\[ p(x, y) = \frac{1}{z(x)} \exp\Bigl(\sum_i \lambda_i f_i(x, y)\Bigr), \qquad z(x) = \sum_y \exp\Bigl(\sum_i \lambda_i f_i(x, y)\Bigr) \tag{2} \]

\lambda_i z(x).

\[ \tilde{P}(f_i) = \sum_{x,y} \tilde{p}(x, y)\, f_i(x, y) \tag{3} \]

(1) (2)

\[ P(f_i) = \sum_{x,y} \tilde{p}(x)\, p(x, y)\, f_i(x, y) \tag{4} \]

\[ P(f_i) = \tilde{P}(f_i) \tag{5} \]

P(f_i) \tilde{P}(f_i) (6)

\[ H(P) = -\sum_{x,y} p(x, y) \log p(x, y) \tag{6} \]

\lambda

(2) SVM
SVM [2] 2 SVM [5] T_1, T_2 V_1, V_2 E_1, E_2 T_1 = (V_1, E_1), T_2 = (V_2, E_2)

\[ K(T_1, T_2) = \sum_{v_1 \in V_1} \sum_{v_2 \in V_2} \sum_{s_1 \in S_{v_1}(T_1)} \sum_{s_2 \in S_{v_2}(T_2)} K_S(s_1, s_2) \tag{7} \]

\[ K_S(s_1, s_2) = I(s_1 = s_2) \tag{8} \]

S_v(T) v \in V K_S (2). I() returns 1 if s_1 = s_2 and 0 otherwise. 2 [12] RNA HTML XML Web SVM
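The two models above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the tuple-based tree encoding, and the example feature are all hypothetical. In particular, the definition of S_v(T) is not recoverable from the text, so the kernel below assumes that S_v(T) contains only the complete subtree rooted at v, which is a simplification of general subtree kernels.

```python
import math
from collections import Counter


def maxent_prob(weights, features, x, labels):
    """Exponential model: p(x, y) = exp(sum_i lambda_i * f_i(x, y)) / z(x)."""
    scores = {
        y: math.exp(sum(w * f(x, y) for w, f in zip(weights, features)))
        for y in labels
    }
    z = sum(scores.values())  # normalizer z(x): sum over all labels y
    return {y: s / z for y, s in scores.items()}


def subtree_signatures(tree):
    """Count the complete subtree rooted at every node of an ordered tree.

    A tree is a (label, [children]) tuple. ASSUMPTION: S_v(T) holds only
    the complete subtree rooted at v.
    """
    counts = Counter()

    def visit(node):
        label, children = node
        sig = "(" + label + "".join(visit(c) for c in children) + ")"
        counts[sig] += 1
        return sig

    visit(tree)
    return counts


def tree_kernel(t1, t2):
    """Sum of I(s1 = s2) over all pairs of rooted subtrees, as in Eq. (7)-(8)."""
    c1, c2 = subtree_signatures(t1), subtree_signatures(t2)
    return sum(n * c2[sig] for sig, n in c1.items())


# Hypothetical binary feature: fires when x has the structural pattern and y = 1.
has_pattern = lambda x, y: 1.0 if x["name_body_pattern"] and y == 1 else 0.0
probs = maxent_prob([math.log(3.0)], [has_pattern], {"name_body_pattern": True}, [0, 1])

# Two small trees built from the element names used in the paper's figures.
t1 = ("doc", [("name", []), ("body", [("item", []), ("item", [])])])
t2 = ("doc", [("name", []), ("body", [("item", [])])])
print(probs, tree_kernel(t1, t2))
```

With a single feature of weight log 3, the model assigns probability 0.75 to y = 1, because the unnormalized scores are 3 and 1. The kernel value for t1 and t2 is 3: the name leaf matches once and each of the two item leaves in t1 matches the single item leaf in t2.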

SVM 1 0 k- 3.1.1(2) SVM

3. 2
( (name )(body )) body name body XML

3. 3

4. XML

4. 1

4. 1. 1
Wikipedia 1 XML 4 1: 2: 3:

1 http://ja.wikipedia.org/

Table 1
               1       2       3       4
             333     521       6      72
              75      96       4       4
           12597   22758      18      33
             240     489       7      14
           12260   22269      11      19

 1  (item )
 2  (p )
 3  (item (normalist ))
 4  ( (name )(body ))
 5  ( (title )( ))
 6  (body (p )( ))
 7  (body ( )( ))
 8  (normalist (item )(item ))
 9  ( (normalist ) (normalist ))
10  ((p ) (p ))
11  ( ( )( ))
4 k- (k=2)

4: 1 2 3 4 1 1

4. 1. 2
[4] [11] Cabocha [7] EDR [13] EDR

Table 2  k- (k=2)
 (1)  (item )                          71
 (2)  (p )                             40
 (3)  (item (normalist ))               4
 (4)  ( (name )(body ))                93
 (5)  ( (title )( ))                    3
 (6)  (body (p )( ))                   29
 (7)  (body ( )( ))                     0
 (8)  (normalist (item )(item ))        0
 (9)  ( (normalist )(normalist ))       0
(10)  ((p ) (p ))                       0
(11)  ( ( )( ))                         0
                                      240

Table 3  k- (k=2)
 (2)  (p )                              7
(10)  ((p ) (p ))                       0
                                        7

Wikipedia 2 3 k=2 k- 4 2 3 4 (1) (2) item (3) item (4) name (5) title (6) (6) (10) 2 3 XML (3) (4) (5) name title 2 (4) name item (body (p )( ) 2 3

4. 1. 3
SVM [10] SVM SVM light [15] SVM 0 1

4. 2
(precision) (recall) precision = recall = F (F-measure) F

\[ F\text{-measure} = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \]

4. 3
k- k- SVM

4. 3. 1
XML (I) (II) (III) 4, 3.1.1(1) k=2 k- 5 5 (I) (II)
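The F-measure of Section 4.2 is the harmonic mean of precision and recall. The sketch below computes all three measures. The right-hand sides of the precision and recall definitions did not survive in the text, so the count-based forms here (true positives over system outputs, true positives over gold answers) are the standard definitions and are assumed rather than taken from the paper; the function names are hypothetical.

```python
def f_measure(precision, recall):
    """F-measure = 2 * precision * recall / (precision + recall)."""
    if precision + recall == 0:
        return 0.0  # convention when both measures are zero
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(tp, fp, fn):
    """ASSUMED standard definitions based on counts of outcomes.

    tp: correct outputs, fp: incorrect outputs, fn: missed gold answers.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


# The first column reported for method (I): 74.3% and 40.8% give F = 52.7%.
print(round(f_measure(74.3, 40.8), 1))  # 52.7
```

Because the formula is symmetric in its two arguments, it works equally on fractions and on percentages, as long as both inputs use the same scale.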

4 (I) (II) (III)

Table 5
                        1        2        3        4
(I)     precision    74.3%    76.0%    51.3%    31.2%
        recall       40.8%    66.8%    57.5%    48.7%
        F            52.7%    71.1%    54.2%    38.0%
(II)    precision    77.0%    78.9%    69.3%    75.0%
        recall       48.1%    69.2%    91.7%    54.8%
        F            59.2%    73.7%    74.9%    63.3%
(III)                90.6%    92.0%    82.0%    86.0%

Table 6  k-
                        1        2        3        4
k=2     precision    77.0%    78.9%    69.3%    75.0%
        recall       48.1%    69.2%    91.7%    54.8%
        F            59.2%    73.7%    74.9%    63.3%
k=3     precision    75.4%    77.6%    69.3%    75.0%
        recall       46.2%    68.6%    91.7%    54.8%
        F            57.3%    72.8%    74.9%    63.3%
k=      precision    72.1%    78.0%    69.3%    75.0%
        recall       49.3%    72.0%    91.7%    54.8%
        F            58.6%    74.9%    74.9%    63.3%

Table 7
                        1        2        3        4
        precision    72.1%    78.0%    69.3%    75.0%
        recall       49.3%    72.0%    91.7%    54.8%
        F            58.6%    74.9%    74.9%    63.3%
SVM     precision    97.4%    85.3%    63.3%    88.0%
        recall       30.6%    61.2%    83.3%    30.4%
        F            46.6%    71.3%    71.9%    45.2%
                     38.9%    33.5%    62.0%    54.8%
        F            54.4%    49.1%    70.6%    66.7%

(k= ) F XML (III) (II) (III) 6 3 4 3 1 2 k=2 k=3 k- k=2 ( = ) 3 F 1 k=2 2 k= k=2 k-

4. 3. 3.
[4] SVM

4. 3. 2
k- 7 2 3-2 (k=2 - ) 3 (k=3 - ) SVM F 3 SVM

5.
XML k- XML Wikipedia XML 4 F k- k- =2 k- SVM F Wikipedia HTML

(B)( 18300031)

[1] Hsin-Hsi Chen, Shih-Chung Tsai, Jin-He Tsai, Mining Tables from Large Scale HTML Texts, 18th International Conference on Computational Linguistics, pp.166-172, 2000.
[2] Nello Cristianini, John Shawe-Taylor,
[3] WWW HTML, DE2005-136, 2005.
[4] Vol. 46, No. 3, 2005.
[5] Vol. 21, No. 1, 2006.
[6] Andrew Kehler, Probabilistic Coreference in Information Extraction, CoRR, cmp-lg/9706012, 1997.
[7] Support Vector Machine Chunk, Vol. 9, No. 5, pp.3-21, 2002.
[8] 11, 2005.
[9] Adam L. Berger, Stephen A. Della Pietra, Vincent J. Della Pietra, A Maximum Entropy Approach to Natural Language Processing, Computational Linguistics, 22, 1996.
[10] Zhang Le, Maximum Entropy Modeling Toolkit for Python and C++, http://homepages.inf.ed.ac.uk/s0450736/maxent_toolkit.html.
[11] version 2.3.3, 2003.
[12] Alessandro Moschitti, Making Tree Kernels Practical for Natural Language Learning, EACL, 2006.
[13] EDR, Technical Report TR 045, 1995.
[14] 2002, pp.175-176, 2002.
[15] SVM light, http://dit.unitn.it/~moschitt/tree-kernel.htm.
[16] Lan Yi, Bing Liu, Xiaoli Li, Eliminating noisy information in web pages for data mining, Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.296-305, 2003.
[17] Minoru Yoshida, Kentaro Torisawa, Junichi Tsujii, Extracting ontologies from World Wide Web via HTML tables, Pacific Association for Computational Linguistics, pp.332-341, 2001.
[18] Web, DEWS 6-p-05, 2003.