

CHINESE J. COMPUTERS, Vol. 24 No. 1, Jan. 2001

A Chinese Web Page Classifier Based on Support Vector Machine and Unsupervised Clustering

LI Xiao-Li, LIU Ji-Min, SHI Zhong-Zhi
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080

Abstract: This paper presents a new algorithm that combines the Support Vector Machine (SVM) with unsupervised clustering (UC). After analyzing the characteristics of web pages, it proposes a new vector representation of web pages and applies it to web page classification. Given a training set, the algorithm clusters the positive and the negative examples separately with the unsupervised clustering algorithm UC, which produces a number of positive and negative centers. It then selects only some of the examples as input to SVM, according to the ISUC algorithm, and finally constructs a classifier through SVM learning. Any text can be classified either by comparing its distances to the cluster centers or by SVM. If the text is near a cluster center of one category and far from all the cluster centers of the other categories, UC can classify it correctly with high probability; otherwise SVM is employed to decide the category it belongs to. The algorithm combines the strengths of SVM and unsupervised clustering. The experiments show that it not only improves training efficiency but also achieves good precision.

Keywords: support vector machine, clustering, text classification

Classification code: TP391

Received 1999-11-17. Project numbers: 69803010, 863-511-946-010. Author birth years: 1969, 1967, 1941.

1 Introduction

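Earlier work on text categorization includes Apte [1], Yang [2], Lewis [3], and Cohen [4]; for Internet documents, see Lin Shian-Hua [5]. This paper combines UC with SVM. Section 2 covers the representation of web pages, Section 3 the combination of SVM and UC, Section 4 the experiments, and Section 5 the conclusions.

2 Vector representation of web pages

A document is represented as an n-dimensional vector $(w_1, w_2, \ldots, w_n)$. Following Salton [10], the weight of term $i$ is

$$w_i = \frac{tf_i \times \log(N/n_i)}{\sqrt{\sum_{j=1}^{n} \left(tf_j \times \log(N/n_j)\right)^2}}$$

where $tf_i$ is the frequency of term $i$ in the document, $N$ is the total number of documents, and $n_i$ is the number of documents that contain term $i$.

Applications of SVM are reported in [8, 9]. Feature selection methods include information gain (IG), the $\chi^2$-test (CHI), mutual information (MI), and term strength (TS).

Web pages are written in HTML. Tags such as P, BR, and DOCTYPE appear alongside tags that mark content: TITLE, H1, H2, ..., H6, B, U, I, and A HREF=. Meta tags take the forms Meta name="description" content=..., Meta name="keywords" content=..., and Meta name="classification" content=....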
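The normalized TF-IDF weight of Section 2 can be illustrated with a short Python sketch. It is not the authors' implementation: the function name `tfidf_vector`, the tokenized input, and the toy document frequencies are assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_vector(doc_terms, doc_freq, n_docs):
    """Unit-length TF-IDF vector for one document.

    doc_terms: list of terms in the document (hypothetical tokenized input)
    doc_freq:  {term: n_i}, the number of documents containing each term
    n_docs:    N, the total number of documents
    """
    tf = Counter(doc_terms)
    # Numerator of the weight: tf_i * log(N / n_i)
    raw = {t: f * math.log(n_docs / doc_freq[t]) for t, f in tf.items()}
    # Denominator: Euclidean norm of the raw weights
    norm = math.sqrt(sum(v * v for v in raw.values()))
    if norm == 0.0:
        return {t: 0.0 for t in raw}
    return {t: v / norm for t, v in raw.items()}


# Toy data (assumed): 4 documents in total, "page" occurs in all of them.
doc = ["svm", "web", "web", "page"]
doc_freq = {"svm": 1, "web": 2, "page": 4}
weights = tfidf_vector(doc, doc_freq, n_docs=4)
```

A term that occurs in every document gets weight 0 because log(N/n_i) = 0, and the resulting vector has unit length, which is what bounds the distances used by the clustering step.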

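The tags used are the title (TITLE), the headings H1 to H6, the emphasis tags B, U, and I, the URL, and Meta:

$$S = \{\mathrm{TITLE}, \mathrm{H1}, \mathrm{H2}, \mathrm{H3}, \mathrm{H4}, \mathrm{H5}, \mathrm{H6}, \mathrm{B}, \mathrm{U}, \mathrm{I}, \mathrm{URL}, \mathrm{Meta}\}$$

Each tag $A \in S$ has a weight $W_A$, with $W = \{W_A \mid A \in S\}$, and the term weight becomes

$$w_i = \frac{\sum_{A \in S} W_A \times tf_{A,i} \times \log(N/n_i)}{\sqrt{\sum_{j=1}^{n} \left(\sum_{A \in S} W_A \times tf_{A,j} \times \log(N/n_j)\right)^2}}$$

where $tf_{A,i}$ is the frequency of term $i$ inside tag $A$, and $W_{\mathrm{TITLE}} > W_{\mathrm{H1}} > W_{\mathrm{H2}} > W_{\mathrm{H3}} > \cdots > W_{\mathrm{Meta}}$.

3 Combining SVM and UC for web page classification

Let the set of categories be $L = \{\omega_1, \omega_2, \ldots\}$, where category $\omega_i$ has $N_i$ training pages and $N = \sum_i N_i$. For a category $\omega \in L$, the training set is $E = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}$ with $y_i \in \{+1, -1\}$: $y_i = +1$ if $x_i \in \omega$ and $y_i = -1$ if $x_i \notin \omega$. Classifying a page $x$ means deciding whether $x \in \omega$ or $x \notin \omega$.

Given a training set $E = \{(z_i, y_i)\}$, $i = 1, 2, \ldots, l$, with $z_i \in R^N$ and $y_i \in \{-1, +1\}$, SVM looks for $(w, b)$ that minimizes the risk

$$R(w, b) = \int \frac{1}{2} \left| f_{w,b}(z) - y \right| \, dP(x, y)$$

where $P(x, y)$ is the underlying distribution and $f_{w,b}(z) = \mathrm{sgn}(w \cdot z + b)$. The pair $(w, b)$ is obtained by solving

$$\max_{\alpha} W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j (z_i \cdot z_j)$$

subject to $0 \le \alpha_i \le \gamma$ and $\sum_{i=1}^{l} \alpha_i y_i = 0$. Then $w = \sum_{i=1}^{l} y_i \alpha_i z_i$, and the decision function is

$$f(z) = \mathrm{sgn}\left[\sum_{i=1}^{l} y_i \alpha_i (z \cdot z_i) + b\right].$$

With a mapping $z = \Phi(x)$ and the kernel $k(x, x_i) = \Phi(x) \cdot \Phi(x_i) = \exp(-\|x - x_i\|^2 / c)$, this becomes $f(x) = \mathrm{sgn}\left[\sum_{i=1}^{l} y_i \alpha_i k(x, x_i) + b\right]$.

3.1 The UC algorithm

Given a radius $r$ and a sample set $Z = \{x_1, x_2, \ldots, x_m\}$, UC proceeds as follows.

Step 1. $C_1 = \{x_1\}$, $O_1 = x_1$, $n_1 = 1$, numCluster = 1, $Z = \{x_2, x_3, \ldots, x_m\}$.
Step 2. If $Z = \emptyset$, stop.
Step 3. Select $x_i \in Z$ and find the nearest center, $O_j = \arg\min_{1 \le k \le \mathrm{numCluster}} d(x_i, O_k)$.
Step 4. If $d(x_i, O_j) < r$, add $x_i$ to $C_j$: $C_j \leftarrow C_j \cup \{x_i\}$, $O_j \leftarrow (n_j O_j + x_i)/(n_j + 1)$, $n_j \leftarrow n_j + 1$. Go to Step 6.
Step 5. numCluster $\leftarrow$ numCluster + 1, $C_{\mathrm{numCluster}} \leftarrow \{x_i\}$, $O_{\mathrm{numCluster}} \leftarrow x_i$, $n_{\mathrm{numCluster}} \leftarrow 1$.
Step 6. $Z \leftarrow Z - \{x_i\}$. Go to Step 2.

UC partitions the $m$ samples into numCluster clusters with centers $O_1, O_2, \ldots, O_{\mathrm{numCluster}}$, where $O_j$ is the mean of the $n_j$ samples in $C_j$. Steps 2 to 6 run once for each $x_i$ ($i = 2, \ldots, m$) and compare $x_i$ with at most numCluster centers, so the cost is on the order of numCluster $\times$ $m$ distance computations.

UC is applied separately to the positive examples $\omega^+$ and the negative examples $\omega^-$:

$$\omega^+ = \{x_i \mid (x_i, y_i) \in E,\ y_i = +1\}, \qquad \omega^- = \{x_i \mid (x_i, y_i) \in E,\ y_i = -1\}.$$

Let the centers obtained from $\omega^+$ be $O_1^+, O_2^+, \ldots, O_u^+$ and those from $\omega^-$ be $O_1^-, O_2^-, \ldots, O_v^-$. For a page $x$, define

$$d^+(x) = \min_{1 \le i \le u} d(x, O_i^+), \qquad d^-(x) = \min_{1 \le i \le v} d(x, O_i^-).$$

If $d^+(x) < d^-(x)$ then $x \in \omega$; otherwise $x \notin \omega$.

The result depends on the radius $r$, and UC alone is unreliable for pages that lie about equally close to a positive and a negative center. For a threshold $\varepsilon > 0$: if $|d^+(x) - d^-(x)| < \varepsilon$, UC does not decide and the page is passed to SVM; if $|d^+(x) - d^-(x)| > \varepsilon$, UC classifies $x$ by the rule above.

3.2 Combining SVM and UC: the ISUC algorithm

ISUC uses the UC result, obtained with radius $r$, to select the training examples for SVM. Choose $R > r$. The examples passed to SVM are

$$\omega^+ \cup \{x \mid x \in \omega^- \wedge x \in \bigcup_{i} B(O_i^+, R)\}$$

where $B(O_i^+, R)$ is the ball centered at $O_i^+$ with radius $R$. The choice of $R$ relative to $r$ controls how many negative examples reach SVM.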
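The UC procedure of Section 3.1 (Steps 1 to 6) and the nearest-center decision rule can be sketched in Python as below. This is a minimal sketch under stated assumptions: points are tuples of floats, `d` is taken to be the Euclidean distance, and the names `uc_cluster` and `uc_decide` are hypothetical. Returning `None` when the two distances differ by at most `eps` (the case handed to SVM) is also an assumption, since the text does not say what happens when the difference equals ε exactly.

```python
import math


def uc_cluster(points, r):
    """Single-pass clustering with radius r; returns (centers, counts)."""
    centers, counts = [], []
    for x in points:
        if centers:
            # Step 3: find the nearest existing center
            dists = [math.dist(x, c) for c in centers]
            j = min(range(len(centers)), key=dists.__getitem__)
            if dists[j] < r:
                # Step 4: absorb x and update the running mean of cluster j
                n = counts[j]
                centers[j] = tuple((n * cj + xj) / (n + 1)
                                   for cj, xj in zip(centers[j], x))
                counts[j] = n + 1
                continue
        # Steps 1 and 5: x starts a new cluster
        centers.append(tuple(float(v) for v in x))
        counts.append(1)
    return centers, counts


def uc_decide(x, pos_centers, neg_centers, eps):
    """Return +1 or -1 when UC is confident, None when SVM should decide."""
    d_pos = min(math.dist(x, c) for c in pos_centers)
    d_neg = min(math.dist(x, c) for c in neg_centers)
    if abs(d_pos - d_neg) <= eps:
        return None
    return 1 if d_pos < d_neg else -1


# Toy 2-D data (assumed), clustered with r = 1.0
centers, counts = uc_cluster([(0, 0), (0.1, 0), (5, 5), (5.2, 5)], r=1.0)
```

Because each point is compared only with the current centers, the loop performs at most numCluster distance computations per point, which matches the cost estimate given for UC.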

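The ISUC algorithm:

Step 1. $i \leftarrow 1$, $S_T \leftarrow \emptyset$.
Step 2. If $i > u$, go to Step 6 ($u$ is the number of positive centers).
Step 3. Find the negative examples within distance $R$ of $O_i^+$, that is, $x_{j_1}, x_{j_2}, \ldots \in \omega^-$ with $d(x_j, O_i^+) < R$.
Step 4. $S_T \leftarrow S_T \cup \{x_{j_1}, x_{j_2}, \ldots\}$.
Step 5. $i \leftarrow i + 1$. Go to Step 2.
Step 6. $S_T \leftarrow S_T \cup \omega^+$. Train SVM on $S_T$.

For a test page $x$ and a threshold $\varepsilon$, compute $d^+(x)$ and $d^-(x)$. If $|d^+(x) - d^-(x)| > \varepsilon$, UC classifies $x$: $x \in \omega$ if $d^+(x) < d^-(x)$, otherwise $x \notin \omega$. In the remaining cases UC does not decide and SVM classifies $x$. The test algorithm TISUC, applied to a test set $T$, is:

Step 1. If $T = \emptyset$, stop; otherwise select $x \in T$.
Step 2. $d^+(x) \leftarrow \min_{1 \le i \le u} d(x, O_i^+)$, $d^-(x) \leftarrow \min_{1 \le i \le v} d(x, O_i^-)$.
Step 3. If $|d^+(x) - d^-(x)| > \varepsilon$, go to Step 7.
Step 4. Compute the SVM output $f(x) = \mathrm{sgn}\left[\sum_i y_i \alpha_i k(x, x_i) + b\right]$.
Step 5. If $f(x) = 1$ then $x \in \omega$; otherwise $x \notin \omega$.
Step 6. Go to Step 8.
Step 7. If $d^+(x) < d^-(x)$ then $x \in \omega$; otherwise $x \notin \omega$.
Step 8. $T \leftarrow T - \{x\}$. Go to Step 1.

4 Experiments

Data set figures: 3548, 283, 3265, 9000, 4265. SVM and UC are first evaluated separately on three categories.

Table 1  SVM and UC results for the three categories (precision in %, followed by two counts)

Method        Category 1           Category 2           Category 3
SVM           98.60                90.97                90.9
UC r = 0.3    90.97   327   3239   89.32   56   3009    89.48   495   307
UC r = 0.7    96.23   268   2696   89.53   49   253     89.53   424   2596
UC r = 1.0    96.83   94    87     89.48   34   760     88.68   306   766
UC r = 1.2    96.5    105   706    87.67   79   483     83.95   88    495
UC r = 1.4    92.82                74.23                67.59

Because the page vectors have unit length and nonnegative components, the distance between two vectors $\alpha$ and $\beta$ is bounded:

$$\|\alpha - \beta\|^2 = \|\alpha\|^2 + \|\beta\|^2 - 2(\alpha \cdot \beta) = 2 - 2(\alpha \cdot \beta) \le 2 \;\Rightarrow\; \|\alpha - \beta\| \le \sqrt{2}.$$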
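The ISUC selection (Steps 1 to 6) and the TISUC test procedure (Steps 1 to 8) of Section 3.2 can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names, the Euclidean distance, and the `svm_decision` callable (a stand-in for the trained SVM's real-valued output before the sign is taken) are all assumptions. The selection loops over negative examples instead of over positive centers, which yields the same union without duplicates.

```python
import math


def isuc_select(pos, neg, pos_centers, big_r):
    """Training set for SVM: negatives within big_r of a positive center, plus all positives."""
    selected = []
    for x in neg:
        # Steps 2-5: keep x if it falls in some ball B(O_i^+, R)
        if any(math.dist(x, c) < big_r for c in pos_centers):
            selected.append((x, -1))
    # Step 6: add every positive example
    selected.extend((x, +1) for x in pos)
    return selected


def tisuc_classify(x, pos_centers, neg_centers, eps, svm_decision):
    """Classify one test point: UC when the distance gap exceeds eps, SVM otherwise."""
    d_pos = min(math.dist(x, c) for c in pos_centers)
    d_neg = min(math.dist(x, c) for c in neg_centers)
    if abs(d_pos - d_neg) > eps:
        # Step 7: the nearest center decides
        return 1 if d_pos < d_neg else -1
    # Steps 4-5: fall back to the SVM output; a zero output is treated
    # as negative here, which is an assumption
    return 1 if svm_decision(x) > 0 else -1


# Toy 2-D data (assumed)
train = isuc_select(pos=[(0, 0), (0.2, 0)], neg=[(0.5, 0), (5, 5)],
                    pos_centers=[(0.1, 0)], big_r=1.0)
```

In this toy run the negative example at (5, 5) is cut from the SVM training set because it lies outside every ball around a positive center, which is the mechanism counted by the "# Nexample cut" column of Table 2.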

Table 1 includes UC with r = 1.4; for r = 1.2 the two counts in the first category are 105 and 706. Table 2 gives the ISUC results with ε = 0.3.

Table 2  ISUC results

r     R      # Nexample cut   # Call SVM   # SV   Precision SVM (%)   ISUC precision (%)

Category 1
0.3   0.5    7820             1553         112    10.03               74.90
0.3   1.0    6008             1553         678    83.25               95.12
0.3   1.3    2351             1553         937    97.1                98.81
1.2   1.25   4450             2561         543    96.72               99.19
1.2   1.3    2132             2561         651    97.54               98.57
1.2   1.4    1023             2561         709    97.66               98.62

Category 2
0.3   0.5    8251             2195         120    21.10               70.09
0.3   1.0    5422             2195         835    68.22               88.20
0.3   1.3    2198             2195         1323   83.97               91.86
1.2   1.25   5321             3094         1425   83.29               91.25
1.2   1.3    3800             3094         1672   87.10               91.42
1.2   1.4    1923             3094         1714   87.56               91.42

Category 3
0.3   0.5    7693             2093         342    26.11               70.51
0.3   1.0    5017             2093         625    59.68               80.55
0.3   1.3    1296             2093         1389   82.78               91.80
1.2   1.25   5982             3290         1322   87.78               91.27
1.2   1.3    4336             3290         1481   87.97               91.34
1.2   1.4    1295             3290         1481   87.97               91.34

In Table 2, # Call SVM counts the test examples handed to SVM, # Nexample cut the negative examples removed from the SVM training set, and # SV the support vectors; Precision SVM and ISUC precision are given in percent. For r = 1.2, # SV does not decrease as R goes from 1.25 to 1.4, and in every row ISUC precision is higher than Precision SVM. For r = 1.2 and R = 1.25, the text reports a ratio of 1/3 for ISUC relative to SVM.

5 Conclusions

Combining SVM with UC improves training efficiency while keeping good precision. The ISUC results depend on the choice of r and R.

References

[1] Apte C, Damerau F, Weiss S. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 1994, 12(3): 233-251
[2] Yang Y. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In: Proc Seventeenth International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, 1994. 13-22
[3] Lewis D D, Schapire R E, Callan J P, Papka R. Training algorithms for linear text classifiers. In: Proc Nineteenth International ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich, 1996. 298-306
[4] Cohen W W, Singer Y. Context-sensitive learning methods for text categorization. In: Proc Nineteenth International ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich, 1996. 307-315
[5] Lin Shian-Hua. Extracting classification knowledge of internet documents with mining term associations: A semantic approach. In: Proc International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, 1998. 241-249
[6] Vapnik V. The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995
[7] Vapnik V. Estimation of Dependences Based on Empirical Data. New York: Springer-Verlag, 1982
[8] Bernhard Schölkopf, Sung Kah-Kay et al. Comparing support vector machines with Gaussian kernels to radial basis function classifiers. IEEE Transactions on Signal Processing, 1997, 45(11): 2758-2765
[9] Edgar Osuna, Robert Freund, Federico Girosi. Training support vector machines: An application to face detection. In: Proc IEEE Conference on Computer Vision and Pattern Recognition, Puerto Rico, 1997. 130-136
[10] Salton. Introduction to Modern Information Retrieval. New York: McGraw-Hill Book Company, 1983
[11] Yang Yi-Ming, Jan O Pedersen. A comparative study on feature selection in text categorization. In: Proc 14th International Conference on Machine Learning, Nashville, 1997. 412-420
[12] Li Xiao-Li, Shi Zhong-Zhi. A data mining method applying to acquire part of speech rules in Chinese text. Computer Research and Development (accepted, in Chinese)