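DEIM Forum 2016 F1-5

A GUI for Web Data Extraction with Ducky

Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan
E-mail: kei@db.ics.keio.ac.jp, toyama@ics.keio.ac.jp

Abstract: Methods for extracting data from Web pages fall into two broad types [...]. Ducky extracts data from Web pages according to a configuration that gives the URLs of the pages and CSS selectors for the data to extract [...]. In this paper we propose a GUI for Ducky, through which users can [...] extract Web data [...].

1. Introduction

[...] Many methods for extracting data from Web pages have been proposed [13]. Because Web pages are written in HTML, [...] extracting data from a page requires knowing how the target data are laid out in its HTML [...]. We have developed Ducky [8], [9], a system for extracting data from Web pages [...]. In this paper we propose a GUI for Ducky, through which users can [...] extract Web data [...].

The rest of this paper is organized as follows. Section 2 describes Ducky, Section 3 presents the GUI, Section 4 [...], Section 5 evaluates the system, Section 6 reviews related work, and Section 7 concludes.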

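2. Ducky

2.1 Overview of Ducky

[...] Given the URL of a Web page, [...] (Fig. 2). The data to extract are located with CSS selectors (Section 2.2.1), and the results can be written out as XML, JSON, or CSV [...] or stored in a DB.

In HTML, elements are labeled with id and class attributes, and the repeated items of a page usually share a class. On eiga.com, for example, each item is contained in a div element of class unit [...]. CSS selectors are the standard way to address such elements: CSS uses them to style Web pages, and JavaScript libraries that manipulate the DOM, such as jQuery, use them to find elements. According to W3Techs (footnote 2), jQuery was used on 65.5% of Web sites as of August 2015, so CSS selectors are familiar to many Web developers.

A Ducky configuration is a JSON object of the following form:

    {
      "name": "",
      "author": "",
      "frequency": "",
      "format": "",
      "scraping": [{ }]
    }

2.2.1 CSS selectors

Like XPath, CSS selectors locate elements in an HTML document. Fig. 3 shows the link page of eiga.com (footnote 1); in its HTML, the CSS selector div.unit li > a selects the links [...]. Because CSS selectors address elements through their HTML id and class attributes [...], Ducky specifies the data to extract with CSS selectors [...].

2.2.2 The scraping key

The scraping key holds an array of scraping definitions (Fig. 4), whose keys are listed in Table 1. Each definition gives a url and a selector: Ducky fetches the HTML at the URL and applies the CSS selector to it to find the elements to extract [...]. The data array then describes the fields to extract from each of those elements: field names the output field, attr names the attribute to read, and find gives a further CSS selector [...].

1 http://eiga.com/link/
2 http://w3techs.com/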
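To make the selector semantics of Section 2.2.1 concrete, the following is a minimal Python sketch of how a selector such as div.unit li > a picks out elements. It is not Ducky's implementation: it assumes a small selector subset (tag, .class, tag.class, the descendant combinator written as a space, and the child combinator >), builds its own tree with the standard-library html.parser, and the names Node, TreeBuilder and select are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get an end tag (a small subset, assumed sufficient here).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """Minimal DOM node: tag name, attributes, parent, children, direct text."""

    def __init__(self, tag, attrs, parent):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree with the standard-library HTML parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [], None)
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def parse_selector(selector):
    """'div.unit li > a' -> [('div.unit', ' '), ('li', ' '), ('a', '>')]."""
    steps, combinator = [], " "
    for token in selector.replace(">", " > ").split():
        if token == ">":
            combinator = ">"
        else:
            steps.append((token, combinator))
            combinator = " "
    return steps


def matches_compound(node, compound):
    """Test one 'tag', '.class' or 'tag.class' compound against a node."""
    tag, _, cls = compound.partition(".")
    if tag and node.tag != tag:
        return False
    return not cls or cls in (node.attrs.get("class") or "").split()


def matches(node, steps):
    """Match selector steps right to left, the way browsers evaluate them."""
    if node is None or node.tag == "#root":
        return False
    compound, combinator = steps[-1]
    if not matches_compound(node, compound):
        return False
    if len(steps) == 1:
        return True
    if combinator == ">":  # child: the parent must match the remaining steps
        return matches(node.parent, steps[:-1])
    ancestor = node.parent  # descendant: any ancestor may match them
    while ancestor is not None:
        if matches(ancestor, steps[:-1]):
            return True
        ancestor = ancestor.parent
    return False


def select(html, selector):
    """Return the elements matching selector, in document order."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    steps = parse_selector(selector)
    found = []

    def walk(node):
        for child in node.children:
            if steps and matches(child, steps):
                found.append(child)
            walk(child)

    walk(builder.root)
    return found


page = ('<div class="unit"><ul><li><a href="/a">A</a></li>'
        '<li><a href="/b">B</a></li></ul></div>'
        '<ul><li><a href="/c">C</a></li></ul>')
print([(a.text, a.attrs["href"]) for a in select(page, "div.unit li > a")])
# [('A', '/a'), ('B', '/b')]
```

Matching right to left means each candidate element only has to look upward through its ancestors, which is why the link outside div.unit is not selected.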

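Table 1: Keys of a scraping definition

    Key         Type     Description
    url         string   URL of the page
    selector    string   CSS selector
    data        array    fields to extract
      field     string   field name
      attr      string   attribute of the element matched by selector
      find      string   CSS selector applied within the element matched by selector
      remove    array    strings to delete (e.g., blanks, parentheses)
      replace   array    pairs of strings to replace
    next        object   scraping definition applied to the extracted URLs

    "scraping": [{
      "url": "",
      "selector": "",
      "data": [{
        "field": "",
        "attr": "",
        "find": "",
        "remove": ["", "", ...],
        "replace": [["", ""], ...]
      }],
      "next": { }
    }]

Fig. 4: Structure of the scraping key

As Fig. 4 shows, remove and replace post-process each extracted value: remove deletes the listed strings, and replace rewrites each [before, after] pair [...]. next holds a nested scraping definition whose url is taken from the URLs extracted by the enclosing definition [...], and the extracted data are stored in a DB. With next, Ducky can follow links from one Web page to the pages it links to [...] (Section 3.2.3).

3. The GUI

[...] As an example of using the GUI, we take Ameba (footnote 3) (Fig. 5): from the kana syllabary index of its official blogs (footnote 4), [...] name and url [...] (Fig. 6).

3.1

3.1.1 URL input
The user enters a URL in the GUI [...] the kana syllabary index [...] CSS [...].

3.1.2
[...] (Fig. 5). [...] Given the URL of a Web page, the GUI loads the page and displays its HTML in the GUI [...].

3.1.3
[...] URL [...] (Fig. 5).

3.2

3.2.1 Generating CSS selectors
[...] CSS selector [...] GUI [...]

3 http://official.ameba.jp/
4 http://official.ameba.jp/genrekana/kanatop.html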
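Table 1 and Fig. 4 in Section 2.2.2 fix a JSON type for every key of a scraping definition and describe remove and replace as post-processing steps. The Python sketch below checks a definition against those types and cleans one extracted string. The function names are hypothetical, and two details are assumptions not stated in the text: remove is applied before replace, and each replace entry is a two-element [before, after] list.

```python
import json

# JSON types expected for each key, following Table 1.
ENTRY_TYPES = {"url": str, "selector": str, "data": list, "next": dict}
FIELD_TYPES = {"field": str, "attr": str, "find": str,
               "remove": list, "replace": list}


def check_keys(obj, expected, path, errors):
    """Append an error for every unknown key or wrongly typed value in obj."""
    for key, value in obj.items():
        want = expected.get(key)
        if want is None:
            errors.append(f"{path}: unknown key '{key}'")
        elif not isinstance(value, want):
            errors.append(f"{path}.{key}: expected {want.__name__}")


def validate_entry(entry, path="scraping[0]"):
    """Return a list of problems found in one scraping definition."""
    errors = []
    check_keys(entry, ENTRY_TYPES, path, errors)
    data = entry.get("data", [])
    for i, field in enumerate(data if isinstance(data, list) else []):
        if isinstance(field, dict):
            check_keys(field, FIELD_TYPES, f"{path}.data[{i}]", errors)
        else:
            errors.append(f"{path}.data[{i}]: expected object")
    nested = entry.get("next")
    if isinstance(nested, dict) and nested:  # recurse into a non-empty next
        errors += validate_entry(nested, path + ".next")
    return errors


def clean_value(text, field):
    """Delete each 'remove' string, then apply each [before, after] 'replace' pair."""
    for target in field.get("remove", []):
        text = text.replace(target, "")
    for before, after in field.get("replace", []):
        text = text.replace(before, after)
    return text


# Hypothetical definition; example.com stands in for a real site.
config = json.loads("""
{
  "url": "http://example.com/list",
  "selector": "div.unit li > a",
  "data": [{"field": "title", "remove": ["(", ")"], "replace": [["&", "and"]]}],
  "next": {}
}
""")
print(validate_entry(config))                           # []
print(clean_value("(Tom & Jerry)", config["data"][0]))  # Tom and Jerry
```

Validating before crawling means a type mistake in a hand-written or GUI-generated configuration is reported with its path (for example scraping[0].next.url) instead of failing mid-extraction.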

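Fig. 5: The GUI

In the GUI (Fig. 5), [...] a CSS selector is generated for the selected element of the kana syllabary index [...]; the selector generated for the page in Fig. 6 is div.syllabarymdl > table > tbody > tr > td > a [...].

Fig. 6: Ameba [...]

The CSS selector is generated by Algorithm 1. Starting from the selected element, the algorithm walks up the DOM tree until it reaches body, writing each element's tag name followed by its class, if it has one, and joining the steps with the child combinator >.

Algorithm 1: Pseudocode for generating a CSS selector
    Require: node (the selected element)
    Ensure: C, the CSS selector of node
    C ← ""
    while tagname of node is not body do
        TN ← tagname of node
        CN ← classname of node
        if CN is not null then
            S ← TN + "." + CN
        else
            S ← TN
        end if
        if C is empty then
            C ← S
        else
            C ← S + " > " + C
        end if
        node ← parentnode of node
    end while
    return C

3.2.2
Besides text, attribute values such as alt can be extracted (Fig. 5). [...] Web [...]. In the HTML of Fig. 7, for example, an a element contains an img element; [...] the img element's src and alt attributes and the a element's href attribute [...].

Fig. 7: Example HTML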
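Algorithm 1 can be transcribed into Python roughly as follows. Element is a hypothetical stand-in for a DOM node that exposes a tag name, a class attribute and a parent. One deliberate departure from the pseudocode: an element with several classes (class="x y") is written as tag.x.y, since appending the raw class string would not yield a valid selector.

```python
class Element:
    """Hypothetical minimal DOM element: tag name, class attribute, parent."""

    def __init__(self, tag, class_name=None, parent=None):
        self.tag = tag
        self.class_name = class_name
        self.parent = parent


def generate_selector(node):
    """Algorithm 1: walk from node up to (excluding) body, one step per element."""
    steps = []
    while node is not None and node.tag.lower() != "body":
        step = node.tag.lower()                    # TN
        classes = (node.class_name or "").split()  # CN, split on whitespace
        if classes:
            step += "." + ".".join(classes)        # TN + "." + CN
        steps.append(step)
        node = node.parent
    # Steps were collected child-first; emit them parent-first, joined by ">".
    return " > ".join(reversed(steps))


# Rebuild the path of Fig. 6: body > div.syllabarymdl > table > tbody > tr > td > a
body = Element("body")
div = Element("div", "syllabarymdl", body)
table = Element("table", parent=div)
tbody = Element("tbody", parent=table)
tr = Element("tr", parent=tbody)
td = Element("td", parent=tr)
link = Element("a", parent=td)
print(generate_selector(link))  # div.syllabarymdl > table > tbody > tr > td > a
```

Because every step is joined with the child combinator, the generated selector pins down the exact path from body; it stays valid as long as the page keeps that nesting.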

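3.2.3
When the user selects an href attribute, [...] a next definition is created (Fig. 6). In Fig. 6, the selector generated in Section 3.2.1, div.syllabarymdl > table > tbody > tr > td > a, selects the a elements of the kana syllabary index shown in Fig. 5, and the URLs in their href attributes become the url of the next definition (Section 2.2.2). [...] In this way, next lets Ducky move from one Web page to the Web pages it links to.

Fig. 8: [...] Web site (A)

4.

4.1
[...] Web [...] (Sections 2 and 3). [...] We used three Web sites, A, B, and C [...]. Fig. 8 shows Web site A: [...] CSS selectors [...] through the GUI [...]. Fig. 9 shows Web site B: [...] CSS selectors [...] through the GUI [...]. For Web site C, [...].

Fig. 9: [...] Web site (B)

[...] 0 [...] GUI [...] Web sites B and C [...] (Fig. 9) [...].

4.2
Fig. 10 [...] GUI [...]. [...] 0 [...] 0 [...] Web [...] (Section 3.2.1).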
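The next mechanism of Sections 2.2.2 and 3.2.3 amounts to a recursive crawl in which the URLs extracted by one definition become the url of the nested one. The Python sketch below assumes a hypothetical fetch_records(url, entry) backed by an in-memory dictionary instead of real HTTP, treats every field whose attr is href as a link to follow, and stores the nested results under a children key; the depth limit is likewise an assumption, added to bound the recursion if pages link in a cycle.

```python
# In-memory stand-in for Web pages (hypothetical): URL -> records one page yields.
FAKE_PAGES = {
    "http://example.com/kana": [
        {"name": "Blog A", "url": "http://example.com/a"},
        {"name": "Blog B", "url": "http://example.com/b"},
    ],
    "http://example.com/a": [{"title": "Post A1"}],
    "http://example.com/b": [{"title": "Post B1"}, {"title": "Post B2"}],
}


def fetch_records(url, entry):
    """Hypothetical extractor; a real one would fetch url and apply entry's selectors."""
    return [dict(record) for record in FAKE_PAGES.get(url, [])]


def crawl(entry, url=None, depth=0, max_depth=3):
    """Apply one scraping definition, then its 'next' definition to every extracted link."""
    records = fetch_records(url or entry["url"], entry)
    nested = entry.get("next")
    if not nested or depth >= max_depth:
        return records
    # Assumption: fields read from the href attribute hold the URLs to follow.
    link_fields = [d["field"] for d in entry.get("data", []) if d.get("attr") == "href"]
    for record in records:
        for field in link_fields:
            if record.get(field):
                # The extracted URL becomes the url of the nested definition.
                children = crawl(nested, record[field], depth + 1, max_depth)
                record.setdefault("children", []).extend(children)
    return records


definition = {
    "url": "http://example.com/kana",
    "selector": "div.syllabarymdl > table > tbody > tr > td > a",
    "data": [{"field": "name"}, {"field": "url", "attr": "href"}],
    # url is left empty: it is filled in from each extracted href value.
    "next": {"url": "", "selector": "", "data": [{"field": "title"}]},
}
result = crawl(definition)
print(result[1]["children"])  # [{'title': 'Post B1'}, {'title': 'Post B2'}]
```

Nesting the child records under their parent keeps the relation between an index entry and the pages it links to, which a flat list of records would lose.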

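[...] Algorithm 1 [...] CSS selector [...].

5. Evaluation

[...] (1) [...] (2) [...].

5.1
[...]

5.2
[...] Web [...].

5.3
Let $C = \{C_1, C_2, \ldots, C_h\}$ be the sets of extracted values and $A = \{A_1, A_2, \ldots, A_k\}$ the sets of correct values. The precision (1) and recall (2) of an extracted set $C_h$ with respect to a correct set $A_k$ are

$$P_{hk} = \frac{|A_k \cap C_h|}{|C_h|} \tag{1}$$

$$R_{hk} = \frac{|A_k \cap C_h|}{|A_k|} \tag{2}$$

5.4
We compare Ducky with import.io [1] and kimono [2] using precision (1) and recall (2).

5.4.1
Table 3 compares import.io, kimono, and Ducky. [...] import.io, kimono, and Ducky [...] Web [...]; [...] Web API [...]; kimono and Ducky [...]. [...] Ducky [...].

Table 3: Comparison of import.io, kimono, and Ducky (items 1 to 4)

5.4.2
[...] The Web pages used were described by four properties: a. Web [...], b. Web [...], c. [...], and d. [...]. The data to extract were sorted into six types, A to F. The first three are A. [...] (HTML [...]), B. [...] (HTML [...]), and C. [...].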
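Equations (1) and (2) of Section 5.3 translate directly into set operations. In this Python sketch the extracted values C_h and the correct values A_k are Python sets; returning 0.0 when a denominator is empty is an assumption of the sketch, since the text does not say how empty sets are scored. The sample values are hypothetical.

```python
def precision(answer, extracted):
    """Eq. (1): P_hk = |A_k ∩ C_h| / |C_h|."""
    if not extracted:
        return 0.0  # assumption: P is 0 when nothing was extracted
    return len(answer & extracted) / len(extracted)


def recall(answer, extracted):
    """Eq. (2): R_hk = |A_k ∩ C_h| / |A_k|."""
    if not answer:
        return 0.0  # assumption: R is 0 for an empty answer set
    return len(answer & extracted) / len(answer)


def score_matrix(answers, extracted_sets):
    """Return {(h, k): (P_hk, R_hk)} for every C_h and A_k (1-based indices)."""
    return {
        (h, k): (precision(a, c), recall(a, c))
        for h, c in enumerate(extracted_sets, start=1)
        for k, a in enumerate(answers, start=1)
    }


A1 = {"Blog A", "Blog B", "Blog C", "Blog D"}  # hypothetical correct values
C1 = {"Blog A", "Blog B", "Ad banner"}          # hypothetical extracted values
print(precision(A1, C1), recall(A1, C1))  # 0.6666666666666666 0.5
```

Here two of the three extracted values are correct (P = 2/3) and two of the four correct values were found (R = 1/2).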

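The remaining types are D. [...], E. [...], and F. Null.

5.4.3
Table 4 shows the results for the Web pages of types A to F. For type A (HTML [...]), [...]. For type B (HTML [...]), [...]. For type C, import.io [...], while kimono and Ducky [...]; kimono [...] Web [...] PC [...]. [...] Ducky [...]. For type D, [...] Null [...]: import.io and kimono [...] Null [...]. For types E and F, Ducky [...].

Table 4: Results for data types A to F

    Type   a    import.io       kimono          Ducky          c    d
                P       R       P       R       P       R
    A      10   100     100     100     100     100     100    3    261
    B      15   83.4    100     36.9    59.9    87.1    100    5    78
    C      10   50      50      100     100     100     100    4    338
    D      2    0       0       85      73.9    100     100    5    60
    E      2    99.3    100     98.7    100     85.7    68.9   2    610
    F      5    100     100     100     100     -       -      3    100

5.4.4
[...] Web [...] kimono and Ducky [...] (Table 5) [...] Web [...] kimono [...].

Table 5

    a    b    import.io       kimono          c    d
              P       R       P       R
    15   44   87.6    72.1    89.9    100     5    12615
    2    52   36.5    19.9    68.1    72.1    8    18739

5.5
[...] Web [...] Web [...] (Table 6).

5.5.1
In Table 6, Web sites 3, 4, and 5 reached 100% [...]; [...] CSS [...] next [...]. Web sites 1, 2, and 6 [...] 0% [...] URL [...].

Table 6: Results for Web sites 1 to 6

    Site   P (%)   R (%)   n
    1      23.8    19.9    10
    2      0       0       0
    3      100     100     3631
    4      100     100     41
    5      100     100     52
    6      0       0       0

6. Related Work

Sleiman and Corchuelo [13] survey region extractors for Web documents. Web data extraction methods fall into two types, semi-automatic and automatic [...]; Zhang and Shi [14] [...].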

Adelberg's NoDoSE [3], a tool for semi-automatically extracting structured and semistructured data from text documents, [...] XML [...]. OXPath [6], [12], a language for data extraction, automation, and crawling on the deep Web, extends XPath [...] URL [...] HTML [...]. Kushmerick [10] studied wrapper induction [...]. Chang and Lui [5] proposed IEPAD, which extracts information by discovering patterns in HTML [...]; IEPAD [...] HTML [...] URL [...]. [...] Web [...]. Agarwal and Genesereth [4] and Geel et al. [7] target end users with GUI tools [...], and Nie et al. [11] crawl result pages by classifying URLs [...]. OXPath [6] [...] Web [...].

7. Conclusion

[...] (1) [...], (2) [...] Web [...].

References

[1] import.io. https://import.io/.
[2] kimono. https://www.kimonolabs.com/.
[3] Brad Adelberg. NoDoSE: A tool for semi-automatically extracting structured and semistructured data from text documents. SIGMOD Record, 27(2):283-294, June 1998.
[4] Sudhir Agarwal and Michael Genesereth. Extraction and integration of web data by end-users. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (CIKM '13), pages 2405-2410, New York, NY, USA, 2013. ACM.
[5] Chia-Hui Chang and Shao-Chen Lui. IEPAD: Information extraction based on pattern discovery. In Proceedings of the 10th International Conference on World Wide Web (WWW '01), pages 681-688, New York, NY, USA, 2001. ACM.
[6] Tim Furche, Georg Gottlob, Giovanni Grasso, Christian Schallhart, and Andrew Sellers. OXPath: A language for scalable data extraction, automation, and crawling on the deep web. The VLDB Journal, 22(1):47-72, February 2013.
[7] Matthias Geel, Timothy Church, and Moira C. Norrie. Sift: An end-user tool for gathering web content on the go. In Proceedings of the 2012 ACM Symposium on Document Engineering (DocEng '12), pages 181-190, New York, NY, USA, 2012. ACM.
[8] Kei Kanaoka, Yotaro Fujii, and Motomichi Toyama. Ducky: A data extraction system for various structured web documents. In Proceedings of the 18th International Database Engineering & Applications Symposium (IDEAS '14), pages 342-347, New York, NY, USA, 2014. ACM.
[9] Kei Kanaoka and Motomichi Toyama. Effective web data extraction with Ducky. In Proceedings of the 19th International Database Engineering & Applications Symposium (IDEAS '15), pages 212-213, New York, NY, USA, 2015. ACM.
[10] Nicholas Kushmerick. Wrapper induction: Efficiency and expressiveness. Artificial Intelligence, 118(1-2):15-68, April 2000.
[11] Tiezheng Nie, Zhenhua Wang, Yue Kou, and Rui Zhang. Crawling result pages for data extraction based on URL classification. In Proceedings of the 2010 Seventh Web Information Systems and Applications Conference (WISA '10), pages 79-84, Washington, DC, USA, 2010. IEEE Computer Society.
[12] Andrew Jon Sellers, Tim Furche, Georg Gottlob, Giovanni Grasso, and Christian Schallhart. OXPath: Little language, little memory, great value. In Proceedings of the 20th International Conference Companion on World Wide Web (WWW '11), pages 261-264, New York, NY, USA, 2011. ACM.
[13] H. A. Sleiman and R. Corchuelo. A survey on region extractors from web documents. IEEE Transactions on Knowledge and Data Engineering, 25(9):1960-1981, September 2013.
[14] Suzhi Zhang and Peizhong Shi. An efficient wrapper for web data extraction and its application. In Proceedings of the 4th International Conference on Computer Science & Education (ICCSE '09), pages 1245-1250, July 2009.