2918 Oct. 2006 5 2. 2.1 Web WWW WWW Web Web Web HTML Web RDF 1) Web WWW Web Web content Web usage Web structure 3 11) Web Web Web Web 1 5) Web Web Web



Similar documents
IPSJ SIG Technical Report Vol.2011-DBS-153 No /11/3 Wikipedia Wikipedia Wikipedia Extracting Difference Information from Multilingual Wiki

DEIM Forum 2009 C8-4 QA NTT QA QA QA 2 QA Abstract Questions Recomme

Web Web Web Web Web, i

TF-IDF TDF-IDF TDF-IDF Extracting Impression of Sightseeing Spots from Blogs for Supporting Selection of Spots to Visit in Travel Sat

Vol.55 No (Jan. 2014) saccess 6 saccess 7 saccess 2. [3] p.33 * B (A) (B) (C) (D) (E) (F) *1 [3], [4] Web PDF a m

Input image Initialize variables Loop for period of oscillation Update height map Make shade image Change property of image Output image Change time L

IPSJ SIG Technical Report Vol.2009-DPS-141 No.20 Vol.2009-GN-73 No.20 Vol.2009-EIP-46 No /11/27 1. MIERUKEN 1 2 MIERUKEN MIERUKEN MIERUKEN: Spe

IPSJ SIG Technical Report Vol.2009-HCI-134 No /7/17 1. RDB Wiki Wiki RDB SQL Wiki Wiki RDB Wiki RDB Wiki A Wiki System Enhanced by Visibl

SERPWatcher SERPWatcher SERP Watcher SERP Watcher,

Vol. 42 No. SIG 8(TOD 10) July HTML 100 Development of Authoring and Delivery System for Synchronized Contents and Experiment on High Spe

17 Proposal of an Algorithm of Image Extraction and Research on Improvement of a Man-machine Interface of Food Intake Measuring System

Microsoft Word - toyoshima-deim2011.doc

Fig. 3 3 Types considered when detecting pattern violations 9)12) 8)9) 2 5 methodx close C Java C Java 3 Java 1 JDT Core 7) ) S P S

独立行政法人情報通信研究機構 Development of the Information Analysis System WISDOM KIDAWARA Yutaka NICT Knowledge Clustered Group researched and developed the infor

258 5) GPS 1 GPS 6) GPS DP 7) 8) 10) GPS GPS ) GPS Global Positioning System

paper.dvi

第62巻 第1号 平成24年4月/石こうを用いた木材ペレット

IT,, i

Vol.54 No (July 2013) [9] [10] [11] [12], [13] 1 Fig. 1 Flowchart of the proposed system. c 2013 Information

1_26.dvi

01ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六七八九零壱弐02ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六七八九零壱弐03ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六七八九零壱弐04ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六七八九零壱弐05ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六七八九零壱弐06ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六

21 A contents organization method for information sharing systems

Vol. 42 No MUC-6 6) 90% 2) MUC-6 MET-1 7),8) 7 90% 1 MUC IREX-NE 9) 10),11) 1) MUCMET 12) IREX-NE 13) ARPA 1987 MUC 1992 TREC IREX-N

3807 (3)(2) ,267 1 Fig. 1 Advertisement to the author of a blog. 3 (1) (2) (3) (2) (1) TV 2-0 Adsense (2) Web ) 6) 3

IPSJ SIG Technical Report Vol.2010-NL-199 No /11/ treebank ( ) KWIC /MeCab / Morphological and Dependency Structure Annotated Corp

2. Twitter Twitter 2.1 Twitter Twitter( ) Twitter Twitter ( 1 ) RT ReTweet RT ReTweet RT ( 2 ) URL Twitter Twitter 140 URL URL URL 140 URL URL

1 Web [2] Web [3] [4] [5], [6] [7] [8] S.W. [9] 3. MeetingShelf Web MeetingShelf MeetingShelf (1) (2) (3) (4) (5) Web MeetingShelf

3_39.dvi

( )

yama_interact2007.dvi

1: A/B/C/D Fig. 1 Modeling Based on Difference in Agitation Method artisoc[7] A D 2017 Information Processing

4. C i k = 2 k-means C 1 i, C 2 i 5. C i x i p [ f(θ i ; x) = (2π) p 2 Vi 1 2 exp (x µ ] i) t V 1 i (x µ i ) 2 BIC BIC = 2 log L( ˆθ i ; x i C i ) + q

fiš„v5.dvi

IPSJ SIG Technical Report Secret Tap Secret Tap Secret Flick 1 An Examination of Icon-based User Authentication Method Using Flick Input for

Vol. 48 No. 4 Apr LAN TCP/IP LAN TCP/IP 1 PC TCP/IP 1 PC User-mode Linux 12 Development of a System to Visualize Computer Network Behavior for L

IPSJ SIG Technical Report Vol.2014-DBS-159 No.6 Vol.2014-IFAT-115 No /8/1 1,a) 1 1 1,, 1. ([1]) ([2], [3]) A B 1 ([4]) 1 Graduate School of Info

<> <name> </name> <body> <></> <> <title> </title> <item> </item> <item> 11 </item> </>... </body> </> 1 XML Web XML HTML 1 name item 2 item item HTML

Vol. 9 No. 5 Oct (?,?) A B C D 132

‰gficŒõ/’ÓŠ¹

1 Fig. 1 Extraction of motion,.,,, 4,,, 3., 1, 2. 2.,. CHLAC,. 2.1,. (256 ).,., CHLAC. CHLAC, HLAC. 2.3 (HLAC ) r,.,. HLAC. N. 2 HLAC Fig. 2

DEIM Forum 2012 E Web Extracting Modification of Objec

1 1 CodeDrummer CodeMusician CodeDrummer Fig. 1 Overview of proposal system c

(a) 1 (b) 3. Gilbert Pernicka[2] Treibitz Schechner[3] Narasimhan [4] Kim [5] Nayar [6] [7][8][9] 2. X X X [10] [11] L L t L s L = L t + L s

Wikipedia YahooQA MAD 4)5) MAD Web 6) 3. YAMAHA 7) 8) Vocaloid PV YouTube 1 minato minato ussy 3D MAD F EDis ussy

e-learning station 1) 2) 1) 3) 2) 2) 1) 4) e-learning Station 16 e-learning e-learning key words: e-learning LMS CMS A Trial and Prospect of Kumamoto

2015 9

untitled

IPSJ SIG Technical Report An Evaluation Method for the Degree of Strain of an Action Scene Mao Kuroda, 1 Takeshi Takai 1 and Takashi Matsuyama 1

IPSJ SIG Technical Report Vol.2011-EC-19 No /3/ ,.,., Peg-Scope Viewer,,.,,,,. Utilization of Watching Logs for Support of Multi-

Vol. 48 No. 3 Mar PM PM PMBOK PM PM PM PM PM A Proposal and Its Demonstration of Developing System for Project Managers through University-Indus

Modal Phrase MP because but 2 IP Inflection Phrase IP as long as if IP 3 VP Verb Phrase VP while before [ MP MP [ IP IP [ VP VP ]]] [ MP [ IP [ VP ]]]

大学における原価計算教育の現状と課題

IPSJ SIG Technical Report Vol.2012-CG-148 No /8/29 3DCG 1,a) On rigid body animation taking into account the 3D computer graphics came

1 Table 1: Identification by color of voxel Voxel Mode of expression Nothing Other 1 Orange 2 Blue 3 Yellow 4 SSL Humanoid SSL-Vision 3 3 [, 21] 8 325

IPSJ SIG Technical Report Vol.2011-MUS-91 No /7/ , 3 1 Design and Implementation on a System for Learning Songs by Presenting Musical St

DEIM Forum 2010 A Web Abstract Classification Method for Revie

IPSJ SIG Technical Report Vol.2014-EIP-63 No /2/21 1,a) Wi-Fi Probe Request MAC MAC Probe Request MAC A dynamic ads control based on tra

3D UbiCode (Ubiquitous+Code) RFID ResBe (Remote entertainment space Behavior evaluation) 2 UbiCode Fig. 2 UbiCode 2. UbiCode 2. 1 UbiCode UbiCode 2. 2

Fig. 3 Flow diagram of image processing. Black rectangle in the photo indicates the processing area (128 x 32 pixels).

Vol. 23 No. 4 Oct Kitchen of the Future 1 Kitchen of the Future 1 1 Kitchen of the Future LCD [7], [8] (Kitchen of the Future ) WWW [7], [3

(a) (b) 1 JavaScript Web Web Web CGI Web Web JavaScript Web mixi facebook SNS Web URL ID Web 1 JavaScript Web 1(a) 1(b) JavaScript & Web Web Web Webji

2 : Open Clip Art Library [4] Microsoft Office PowerPoint Web PowerPoint 2 Yahoo! Web [5] SlideShare Yahoo! Web Yahoo! Web

Title 中國宗教文獻研究國際シンポジウム報告書 ( 大規模佛教文獻群に對する確率統計的分析の試み / 師茂樹 ) Author(s) Citation (2004) Issue Date URL Right Typ

[2] OCR [3], [4] [5] [6] [4], [7] [8], [9] 1 [10] Fig. 1 Current arrangement and size of ruby. 2 Fig. 2 Typography combined with printing

IT i

1034 IME Web API Web API 1 IME Fig. 1 Suitable situations for context-aware IME. IME IME IME IME 1 GPS Web API Web API Web API Web )

日本感性工学会論文誌

07九州工業大学.indd

29 jjencode JavaScript

DEIM Forum 2010 A3-3 Web Web Web Web Web. Web Abstract Web-page R

IPSJ SIG Technical Report Vol.2014-HCI-158 No /5/22 1,a) 2 2 3,b) Development of visualization technique expressing rainfall changing conditions

fiš„v8.dvi

, IT.,.,..,.. i

1 Web Web 1,,,, Web, Web : - i -

自然言語処理16_2_45

DPA,, ShareLog 3) 4) 2.2 Strino Strino STRain-based user Interface with tacticle of elastic Natural ObjectsStrino 1 Strino ) PC Log-Log (2007 6)

Q [4] 2. [3] [5] ϵ- Q Q CO CO [4] Q Q [1] i = X ln n i + C (1) n i i n n i i i n i = n X i i C exploration exploitation [4] Q Q Q ϵ 1 ϵ 3. [3] [5] [4]

Vol.53 No (Aug. 2012) BibTEX 1 2,a) , bole BibTEX BibTEX 1 BibTeX Literature Management System Based on BibTEX for L

Fig. 1. Example of characters superimposed on delivery slip.

2reN-A14.dvi

36

1: ( 1) 3 : 1 2 4

MAC root Linux 1 OS Linux 2.6 Linux Security Modules LSM [1] Security-Enhanced Linux SELinux [2] AppArmor[3] OS OS OS LSM LSM Performance Monitor LSMP

DEIM Forum 2009 E

IPSJ SIG Technical Report Vol.2014-GN-90 No.16 Vol.2014-CDS-9 No.16 Vol.2014-DCC-6 No /1/24 1,a) 2,b) 2,c) 1,d) QUMARION QUMARION Kinect Kinect

24312.dvi

ipod touch 1 2 Apple ipod touch ipod touch 3 ( ) ipod touch ( 1 ) Apple ( 2 ) Web 1),2) 3. ipod touch 1 2 ipod touch x y z i

IPSJ SIG Technical Report Vol.2009-DBS-149 No /11/ Bow-tie SCC Inter Keyword Navigation based on Degree-constrained Co-Occurrence Graph

& Vol.2 No (Mar. 2012) 1,a) , Bluetooth A Health Management Service by Cell Phones and Its Us

ID 3) 9 4) 5) ID 2 ID 2 ID 2 Bluetooth ID 2 SRCid1 DSTid2 2 id1 id2 ID SRC DST SRC 2 2 ID 2 2 QR 6) 8) 6) QR QR QR QR

ActionScript Flash Player 8 ActionScript3.0 ActionScript Flash Video ActionScript.swf swf FlashPlayer AVM(Actionscript Virtual Machine) Windows

Vol. 28 No. 2 Apr Web Twitter/Facebook UI Twitter Web Twitter/Facebook e.g., Web Web UI 1 2 SNS 1, 2 2

和文タイトル

DEIM Forum 2009 B4-6, Str

Web Hashtag Hashtag Twitter Hashtag Twitter Hashtag Hashtag Hashtag Twitter Hashtag Twitter Hashtag contexthashtag contexthashtag Hashtag contexthasht

Web Stamps 96 KJ Stamps Web Vol 8, No 1, 2004

1 UD Fig. 1 Concept of UD tourist information system. 1 ()KDDI UD 7) ) UD c 2010 Information Processing S

7,, i

kut-paper-template2.dvi

finalrep.dvi

Transcription:

Vol. 47 No. 10 Oct. 2006 Wikipedia 2 Wiki Wikipedia Web Wikipedia Mining to Construct a Thesaurus Kotaro Nakayama, Takahiro Hara and Shojiro Nishio Thesauri have been widely used in many applications such as information retrieval, natural language processing (NLP), and interactive agents. However, several problems, such as morphological analysis, treatment of synonymous and multisense words, still remain and degrade accuracy on traditional NLP-based thesaurus construction methods. In addition, adding latest/miner words is also a difficult issue on this research area. In this paper, to solve these problems, we propose a web mining method to automatically construct a thesaurus by extracting relations between words from Wikipedia, a wiki-based huge encyclopedia on WWW. 1. WWW Weblog 13) Wiki 15) Web CMS Contents Management System Web WWW Wikiepdia Wikipedia Wiki Wikipedia Web Wikipedia 1 Wikipedia 2005 10 75 Department of Multimedia Engineering, Graduate School of Information Science and Technology, Osaka University 10 2 Wikipedia Web Wikipedia Wiki 2 Web 3 4 3 2917

2918 Oct. 2006 5 2. 2.1 Web WWW WWW Web Web Web HTML Web RDF 1) Web WWW Web Web content Web usage Web structure 3 11) Web Web Web Web 1 5) Web Web Web Web Web Web Web Web Web Web Google(TM) PageRank(TM) 14) HITS 12) Web Dean 9) 2.2 17) IR WordNet 16) EDR 18) 2.2.1 17) 3),7) Brill Tagger 2)

Vol. 47 No. 10 Wikipedia 2919 2.2.2 Web Web 6) Davison 8) Web A Apple <a href="http://en.wikipedia.com/wiki/apple_computer"> Apple </a> Web Web Web Chen 4) Web Web Chen Web Chen 2 1 2 Web Wikipedia Web Web Wikipedia 3. Wikipedia Wikipedia Web 3.1 Web Wikipedia Wikpedia Wiki Wikipedia Web 3.1.1 Wikipedia 1 1 Wikipedia 1 Wikipedia Fig. 1 Wikipedia.

2920 Oct. 2006 Wikipedia Wiki 3.1.2 Wikipedia 65 1,000 Wikipedia Wikipedia 715 3.1.3 Wikipedia 3.1.4 WWW Wikipedia WWW 3.2 Wikipedia Wikipedia Wikipedia Web Wikipedia Wikipedia Web Web 3.2.1 Wikipedia Web P = {p 1,p 2,p 3,..., p n } p i 1 i n Forward Link Backward Link 2 p i Forward Link p i F pi = {f i1,f i2,f i3,..., f im } Backward Link p i B pi = {b i1,b i2,b i3,..., b il } Wikipedia p i Forward Link Backward Link p j p i p j 1 2 3 Forward Link Backward Link n

Vol. 47 No. 10 Wikipedia 2921 Redirect Link Redirect Link Action film Action movie p i Redirect Link R pi = {r i1,r i2,r i3,..., r ik } 2 Forward Link Backward Link Redirect Link p i p j Redirect Link p j Backward Link b i1 b i2 b i3 b j1 b j2 b j3 6 3.2.2 p i RE Algorithm RE(p i,weight,depth) 1 if depth > n then return; 2 F pi = GetF orwardlinks(p i ); 3 for each (p j ) F pi do 4 score = weight/ F pi ; 5 S pj = S pj + score; 6 RE(p j,score,depth+1); 7 B pi = GetBackwardLinks(p i ); 8 for each (p j ) B pi do 9 score = weight/ B pi ; 10 S pj = S pj + score; 11 RE(p j,score,depth+1); 12 R pi = GetRedirectLinks(p i ); 13 for each (p j ) R pi do 14 RE(p j,weight,depth); p i weight 1.0 depth 1 3 1 n 2 6 p i Forward Link 2 Fig. 2 Links. p i Forward Link F pi Backward Link p i Redirect Link S pj p i p j S pj p i 3.3 3.3.1 Apple Computer Apple Apple Computer Apple Backward Link S pi S pi GetSynonym(p i )

2922 Oct. 2006 Algorithm GetSynonym(p i ) 1 B pi = GetBackwardLinks(p i ); 2 for each (p j ) B pi do 3 w = GetLinkT ext(p j ); 4 S w = S w +1; S w 1 w p i GetSynonym(p i ) Apple Computer Backward Link 1 3.3.2 Apple Apple Computer Apple CS w CS(p i,w)= B p i,w B j p j,w B pi,w B pi w UFO CS Unidentified flying object CS 0.65 UFO Unidentified flying object Apple Apple CS 0.44 Apple Computer CS 0.35 Apple 2 CS 1 GetSynonym Table 1 Result of GetSynonym algorithm. w S w Apple 176 Apple Computer 462 Apple Computer Company 1 Apple Computer Corporation 2...... Yahoo! (TM) Yahoo Yahoo Yahoo! CS 0.94 3.4 (1) P = {p 1,p 2,..., p n } (2) p i (3) p i (2) (4) p i n (5) (4) (6) (2) 4. 3 3 47 4.1 2 4.2 Backward Table 2 2 Environment for performance evaluation. CPU Pentium4 3.2 GHz 2GB OS Windows XP C# DB CPU G4 1.42 GHZ 1GB OS Mac OS 10.4 DBMS MySQL 4.1

Vol. 47 No. 10 Wikipedia 2923 3 Fig. 3 Generated words and relations. Link Forward Link Redirect Link RE MySQL B-Tree XML 3 HTML Document Type Definition DTD XML 65 100 Backward Link Forward Link 100 Wikinews Wikipedia Wikipedia 4.3 3 1 n Wikipedia 1 2 3 (1) 1 (2) 30 (3) 5 1 3 5 (4) 10 20 30 is-a is-apart-of

2924 Oct. 2006 CP Concept precision 3) CP = 4 5 3 n 2 17) Wikipedia Chen 4) 1 9,250 Web 762,636 5 52,729,700 1 5 CP Chen Wikipedia 1 Chen d 1 2 3 CP Chen 3 (1) (2) Web (3) Web 30 5 (4) CS (5) (6) Web (7) Web 30 5 (2) Web CS 4.4 1 Forward Link Backward Link 3 3 4 18 30 1 2 2 3 O(A n ) 3 75 Wikipedia Wikipedia 2 3 n 2

Vol. 47 No. 10 Wikipedia 2925 3 Table 3 The influence of distance for performance. 1 2 3 Nintendo 0.05 sec., 328 6.63 sec., 53981 1129.03 sec., 9973687 apple 0.03 sec., 208 3.66 sec., 24217 380.27 sec., 3022035 ipod 0.04 sec., 159 1.71 sec., 11645 205.36 sec., 1647562 Table 5 5 The comparison with other methods. 10 20 30 / 46.2% 35.4% 30.7% 0.34 sec. Chen 1 39.3% 28.1% 22.4% 1.20 sec. Chen 2 50.0% 50.9% 41.7% 121.34 sec. 1 66.7% 64.2% 61.2% 0.04 sec. 2 93.2% 86.2% 83.1% 4.00 sec. 3 91.4% 89.4% 85.9% 571.55 sec. 4 Table 4 The influence of distance for precision. 10 20 30 1 66.7% 64.2% 61.2% 2 93.2% 86.2% 83.1% 3 91.4% 89.4% 85.9% 2 Chen 5 17 1 30 1 SQL Server 2005 (TM) SQL Server 2005 3 Chen 1 2 3 3 Chen 2 121.34 / 1 80 Wikipedia 2 4.00 / Wikipedia 1 15,000 2005 12 Backward Link 100 200 2 1 / URL

2926 Oct. 2006 Fig. 4 4 Backward Link The distribution of backward links. 6 Table 6 The influence of the multiplicity. 10 20 30 65.1% 59.1% 56.8% 88.4% 82.2% 83.3% 2 Arm 3 12 30 30 CP 6 6 CP 4.5 Wikipedia Web Zipf 4 United States United Kingdom square kilometer Marriage World War II Backward Link 100 Backward Link 100 Wikipedia 34,586 1 WordNet WordNet 20 13,735 22,196 30 34,586 WordNet Backward Link 10 188,094 5. Wiki Wikipedia Web Wikinews Web Wikipedia 1 n-gram

Vol. 47 No. 10 Wikipedia 2927 21 COE 18049050 1) Berners-Lee, T., Hendler, J. and Lassila, O.: The Semantic Web, Scientific American, pp.35 43 (2001). 2) Brill, E.: A Simple Rule-based Part of Speech Tagger, Proc. Conference on Applied Computational Linguistics (ACL), pp.112 116 (1992). 3) Chen, H., Yim, T. and Fye, D.: Automatic Thesaurus Generation for an Electronic Community System, Journal of the American Society for Information Science, Vol.46, No.3, pp.175 193 (1995). 4) Chen, Z., Liu, S., Wenyin, L., Pu, G. and Ma, W.Y.: Building a Web Thesaurus from Web Link Structure, Proc. ACM SIGIR, pp.48 55 (2003). 5) Cooley, R., Mobasher, B. and Srivastava, J.: Web Mining: Information and Pattern DiscoveryontheWorldWideWeb,Proc. 9th IEEE International Conference on Tools with Artificial Intelligence, pp.558 567 (1997). 6) Craswell, N., Hawking, D. and Robertson, S.: Effective Site Finding using Link Anchor Information, Proc. ACM SIGIR, pp.250 257 (2001). 7) Crouch, C.J.: A Cluster Based Approach to Thesaurus Construction, Proc. ACM SIGIR, pp.309 320 (1988). 8) Davison, B.D.: Topical Locality in the Web, Proc. ACM SIGIR, pp.272 279 (2000). 9) Dean, J. and Henzinger, M.R.: Finding Related Pages in the World Wide Web, Proc. 8th International World Wide Web Conference, pp.1467 1479 (1999). 10) Edmundson, H.P.: New Methods in Automatic Extracting, J. ACM, Vol.16, No.2, pp.264 285 (1969). 11) Facca, F.M. and Lanzi, P.L.: Mining Interesting Knowledge from Weblogs: A Survey, Data and Knowledge Engineering, Vol.53, No.3, pp.225 241 (2005). 12) Kleinberg, J.M.: Authoritative Sources in a Hyperlinked Environment, J. ACM, Vol.46, No.5, pp.604 632 (1999). 13) Kumar, R., Novak, J., Raghavan, P. and Tomkins, A.: Structure and evolution of blogspace, Comm. ACM, Vol.47, No.12, pp.35 39 (2004). 14) Lawrence, P., Sergey, B., Rajeev, M. and Terry, W.: The PageRank Citation Ranking: Bringing Order to the Web, Technical Report, Stanford Digital Library Technologies Project (1999). 15) Leuf, B. and Cunningham, W.: The Wiki Way: Collaboration and Sharing on the Internet, Addison-Wesley (2001). 16) Miller, G.A.: WordNet: A Lexical Database for English, Comm. ACM, Vol.38, No.11, pp.39 41 (1995). 17) Schutze, H. and Pedersen, J.O.: A Cooccurrence-based Thesaurus and Two Applications to Information Retrieval, International Journal of Information Processing and Management, Vol.33, No.3, pp.307 318 (1997). 18) Tseng, Y.H.: Automatic Thesaurus Generation for Chinese Documents, Journal of the American Society for Information Science and Technology, Vol.53, No.13, pp.1130 1138 (2002). ( 17 11 4 ) ( 18 7 4 ) 2001 2003 2004 WWW ACM IEEE

2928 Oct. 2006 1995 1997 2002 2004 1996 2000 IEEE ACM 1975 1980 2002 2000 2003 Data & Knowledge Engineering ACM IEEE 8