The Functions of a Full-Text Search System and Their Practical Use
Satomi WATANABE
swatan14@cs.reitaku-u.ac.jp
Department of International Economics, Faculty of International Economics, Reitaku University

Abstract
The spread of the Internet, typified by information retrieval on the WWW, now lets us obtain a wide variety of information easily. As the amount of information has grown, however, finding the information we actually need has become difficult. On both the Internet and intranets, information retrieval systems that extract the required information are therefore attracting attention. This paper examines systems that provide full-text search. The main part explains the knowledge needed to use a full-text search system. It then presents an example in which Namazu, a typical free full-text search package, was used to build a search engine for a mailing list by combining full-text search with the WWW.

Keywords: information retrieval, full-text search system, Namazu, morphological analysis
DBMS 1 web 2 UNIX grep 3 grep 1 2
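UNIX grep finds matches by reading every line of every file at query time, with no index, so the cost of each search grows with the total amount of text. The sketch below shows that sequential style in Python for comparison with the index-based methods discussed later. It is an illustration only: the function name `sequential_search` and the sample lines are hypothetical, and Python's `re` module stands in for grep's own regular-expression engine.

```python
import re
from typing import Iterable


def sequential_search(lines: Iterable[str], pattern: str) -> list[tuple[int, str]]:
    """Scan every line in order, as grep does, and return (line_no, line) matches.

    No index is built, so every query re-reads all of the text.
    """
    regex = re.compile(pattern)
    return [(no, line) for no, line in enumerate(lines, start=1) if regex.search(line)]


if __name__ == "__main__":
    # Hypothetical sample lines.
    docs = [
        "full-text search with Namazu",
        "mailing list archive",
        "search engine for the mailing list",
    ]
    for no, line in sequential_search(docs, r"search"):
        print(no, line)
```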
5 URL URL URL 263
[1] Lycos spider, Infoseek InfoSeek Robot HTML HTML, XML, PDF, Word ChaSen KAKASI N web CGI
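Collection robots such as the Lycos spider and the Infoseek robot gather pages for a search engine. By the convention described on The Web Robots Pages [1], they are expected to honor a site's robots.txt file. The sketch below checks that convention with Python's standard `urllib.robotparser`, parsing robots.txt text held in memory so that no network access is needed. The site, the rules and the agent name `ExampleBot` are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: every robot is asked
# to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the robots.txt rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = make_parser(ROBOTS_TXT)
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/private/mail.html"):
        print(url, may_fetch(rules, url))
```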
KAKASI [2] KAKASI (kanji kana simple inverter) 4 KAKASI SKK 5 [3] LARGE LARGE [4]
JUMAN [5][6] ChaSen [7] JUMAN version 2.0 [8] [9] NTT (NTT) [10] Breakfast [11] Windows SuperMorpho-J [12] N 6 1 1 N 6 N N N N

Example sentence: 東京都の明日の天気予報を確認する ("Check tomorrow's weather forecast for Tokyo")
N=1 (uni-gram): 東 京 都 の 明 日 の 天 気 予
N=2 (bi-gram): 東京 京都 都の の明 明日 日の の天 天気 気予 予報
N=3 (tri-gram): 東京都 京都の 都の明 の明日 明日の 日の天 の天気 天気予 気予報 予報を
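The N-gram split in the example above is just a sliding window of N characters moved one character at a time. The minimal Python sketch below reproduces the uni-, bi- and tri-gram lists for the example sentence. The function name `char_ngrams` is hypothetical, and the sketch covers only the splitting step, not how a search system stores or ranks the N-grams.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Every overlapping run of n characters, in order (character N-grams)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    sentence = "東京都の明日の天気予報を確認する"
    for n in (1, 2, 3):
        # Print the first ten N-grams, matching the example above.
        print(n, " ".join(char_ngrams(sentence, n)[:10]))
```

The bi-gram list contains 京都 (Kyoto) even though the sentence is about Tokyo. A pure bi-gram index would therefore return this sentence for the query 京都, which is the well-known false-hit weakness of N-gram indexing compared with dictionary-based morphological analysis.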
1 1 (2 3) Patricia (Practical Algorithm To Retrieve Information Coded In Alphanumeric) 7

Example: "A man in the room."

Semi-infinite strings in order of starting position:
List No.  Semi-infinite string
 1  A man in the room.
 2  man in the room.
 3  an in the room.
 4  n in the room.
 5  in the room.
 6  n the room.
 7  the room.
 8  he room.
 9  e room.
10  room.
11  oom.
12  om.
13  m.

Semi-infinite strings after sorting:
List No.  Semi-infinite string
 1  A man in the room.
 2  an in the room.
 3  e room.
 4  he room.
 5  in the room.
 6  m.
 7  man in the room.
 8  n in the room.
 9  n the room.
10  om.
11  oom.
12  room.
13  the room.
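The two lists above can be reproduced with a sorted array of starting positions, and the sorted order is what makes fast lookup possible: all strings that begin with a query are adjacent, so one binary search finds them. The sketch below is a plain sorted array (a suffix array). It is not a Patricia tree, which compresses the same strings into a trie. As in the example, a string starts at every letter but not at spaces or punctuation. All function names are hypothetical.

```python
from bisect import bisect_left


def semi_infinite_starts(text: str) -> list[int]:
    """Start positions of the semi-infinite strings: one at every letter."""
    return [i for i, ch in enumerate(text) if ch.isalpha()]


def build_sorted_index(text: str) -> list[int]:
    """Sort the start positions by the strings they begin (code-point order)."""
    return sorted(semi_infinite_starts(text), key=lambda i: text[i:])


def find_prefix(text: str, index: list[int], query: str) -> list[int]:
    """Return start positions of every semi-infinite string beginning with `query`.

    Matching strings are adjacent in the sorted list, so one binary search
    locates the first match and a short scan collects the rest.
    """
    keys = [text[i:] for i in index]  # materialized for clarity, not efficiency
    hits = []
    for k in range(bisect_left(keys, query), len(keys)):
        if not keys[k].startswith(query):
            break
        hits.append(index[k])
    return hits


if __name__ == "__main__":
    text = "A man in the room."
    index = build_sorted_index(text)
    for no, start in enumerate(index, start=1):
        print(no, text[start:])
    print(find_prefix(text, index, "n"))
```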
[13] Inktomi Search Software 4.0 (Ultraseek) Inktomi Search Software 4.0 Ultraseek Inktomi Ultraseek Digital Garage Inktomi Japan MS-Word, Excel, PowerPoint, PDF Verity Information Server Verity Information Server (Search 97 Information Server) VERITY (SuperMorpho-J) Namazu [14] Namazu 5 Namazu NTT DoCoMo web
Freya [15] Freya Namazu Namazu Namazu Namazu [14] Windows UNIX OS Web WWW Namazu Namazu Namazu [16][17] Linux, FreeBSD, Solaris UNIX Windows OS/2 Namazu CGI GUI X Window System Windows web CGI (namazu.cgi) namazu CGI Namazu Namazu nkf Perl KAKASI ChaSen MHonArc nkf Perl Namazu C Perl KAKASI ChaSen Windows MS-Word MS-Excel, PDF
MHonArc RFC 822 MIME HTML 6 Namazu 1 PC CPU Pentium 4 1.5 GHz 256 MB 80 GB WWW Apache Apache UNIX [18] [19] WWW CGI WWW 5 Namazu Namazu Perl
5 Perl UNIX OS GNU nkf 1.9 8 1.72 KAKASI Text-KAKASI KAKASI Perl Text-KAKASI KAKASI 2 File-MMagic File-MMagic Namazu CPAN 9 MHonArc MHonArc MHonArc [17] Namazu Namazu Namazu 8 nkf 1.9 Namazu 9 Comprehensive Perl Archive Network gcc make GNU Make csh (tcsh) sh (bash) impression office [18] 1 1 1 HTML MHonArc
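nkf, listed among Namazu's prerequisites, converts Japanese text among the EUC, Shift-JIS and JIS encodings so that indexing sees a single encoding. The sketch below shows the same normalization to EUC-JP with Python's built-in codecs (`shift_jis`, `iso2022_jp` for JIS, `euc_jp`). It is a stand-in for illustration, not nkf itself. nkf can guess the input encoding, whereas this hypothetical `to_euc_jp` helper requires the caller to name it.

```python
def to_euc_jp(raw: bytes, source_encoding: str) -> bytes:
    """Decode `raw` from `source_encoding` and re-encode it as EUC-JP.

    Unlike nkf, this does not guess the input encoding: the caller must
    name it (e.g. "shift_jis", or "iso2022_jp" for JIS).
    """
    return raw.decode(source_encoding).encode("euc_jp")


if __name__ == "__main__":
    text = "全文検索"
    for enc in ("shift_jis", "iso2022_jp", "euc_jp"):
        raw = text.encode(enc)
        # Show the input bytes and the normalized EUC-JP bytes in hex.
        print(enc, raw.hex(), "->", to_euc_jp(raw, enc).hex())
```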
HTML HTML nkf EUC [19] EUC, Shift-JIS, JIS 1 HTML HTML HTML 3 web Namazu CGI WWW WWW Namazu Apache CGI Namazu CGI Namazu Namazu

                                    1999         2000         2001
Number of documents (files)          821        2,910        2,380
Document size (KB)                 4,076       14,260        8,404
Index creation time (s)              690        2,537        2,056
Index size (bytes)             2,867,825   10,145,199    8,518,327
Number of keywords                13,882       32,964       27,861
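The index sizes and keyword counts above describe an index that maps keywords to the documents containing them. With such an index, AND and OR searches reduce to set intersection and union. The sketch below is a generic in-memory inverted index written for illustration. It does not reproduce Namazu's on-disk index format. The message IDs and keywords are hypothetical, and the word lists are assumed to be already split into keywords, a step Namazu delegates to KAKASI or ChaSen for Japanese text.

```python
from collections import defaultdict


def build_index(docs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each keyword to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            index[word].add(doc_id)
    return index


def search_and(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Documents containing every term (AND search)."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Documents containing at least one term (OR search)."""
    return set().union(*(index.get(term, set()) for term in terms))


if __name__ == "__main__":
    # Hypothetical mailing-list messages, already split into keywords.
    mails = {
        "mail-001": ["namazu", "perl", "install"],
        "mail-002": ["namazu", "mhonarc"],
        "mail-003": ["perl", "cgi"],
    }
    index = build_index(mails)
    print(sorted(search_and(index, "namazu", "perl")))
    print(sorted(search_or(index, "mhonarc", "cgi")))
```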
Namazu (KAKASI ChaSen) [23] 1999 AND/OR [24] URL

References
[1] The Web Robots Pages, http://www.robotstxt.org/wc/robots.html
[2] KAKASI, http://kakasi.namazu.org/
[3] SKK, http://openlab.ring.gr.jp/skk/index-j.html
[4] http://www.kusastro.kyoto-u.ac.jp/~baba/dic/free-dic.html
[5] JUMAN, http://pine.kuee.kyoto-u.ac.jp/nl-resource/juman.html
[6] JUMAN version 1.0, http://www.naklab.dnj.ynu.ac.jp/~komachi/manual/maincont2.html
[7] http://chasen.aist-nara.ac.jp/index.html.ja
[8] http://cactus.aist-nara.ac.jp/lab/nlt/vi4ma.html
[9] http://www.t.onlab.ntt.co.jp/sumomo/index.html
[10] http://www.iijnet.or.jp/edr/j_index.html
[11] Breakfast, http://www.labs.fujitsu.com/free/breakfast/index.html
[12] SuperMorpho-J, http://www.omronsoft.co.jp/sp/embedded morpho/
[13] http://www.kusastro.kyoto-u.ac.jp/~baba/wais/other-system.html
[14] Namazu, http://www.namazu.org/
[15] Freya, http://www.ingrid.org/ja/project/freya/
[16] Namazu, 2001.
[17] 1998.
[18] TA, No. 3, pp. 3, 2001.
[19] No. 1, pp. 2, 2001.
[20] MHonArc, http://www.mhonarc.org/
[21] impression office, http://www.asi.co.jp/imoffice/
[22] MHonArc, http://www.shiratori.riec.tohoku.ac.jp/~p-katoh/hack/docs/mhonarc-jp/
[23] http://www.gengokk.co.jp/zenbun.htm
[24] RCAAU, http://www.kuamp.kyoto-u.ac.jp/labs/infocom/mondou/index.html
[25] 10, pp. 90-99, 1999, http://www.ftsanet.com/dbtokyo99/Db99.htm
[26] 1998.