CzeekS ver. 1.2

© 2017

Revision history

ver. 1.2: changes related to Dragon7, shrink, learn, and status.

Contents

1 Introduction
2 Installing and configuring CzeekS
  2.1 Extracting the archive
  2.2 Obtaining the license file
  2.3 Setting environment variables
  2.4 Setting up OpenBabel
3 Compound screening and target prediction
  3.1 CGBVS model DB
  3.2 Compound screening
  3.3 Target prediction
  3.4 Tanimoto similarity search
4 Building a CGBVS model
  4.1 Preparing the training data
  4.2 Creating the DB
  4.3 Adding data
  4.4 Training
  4.5 Manual training and parameter search
5 cgbvs command reference

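1 Introduction

CzeekS is software for CGBVS (Chemical Genomics-Based Virtual Screening), a method that learns from compound-protein interaction data such as ChEMBL and predicts interactions between compounds and proteins. In addition to CGBVS prediction, CzeekS supports similarity searches based on MACCS fingerprints.

Chapter 2 describes how to install and configure CzeekS, Chapter 3 how to run predictions, Chapter 4 how to build CGBVS models, and Chapter 5 the cgbvs command reference.

CzeekS is parallelized with OpenMP, so it benefits from multi-core CPUs.

System requirements

  CPU       4 cores (Intel, AMD)
  Memory    16 GB
  HDD       10 GB
  OS        CentOS 5.x or 6.x, 64-bit (Linux 2.6)
  Software  DRAGON 6.0.38 or Dragon 7.0x, OpenBabel 2.4.1

Table 1: Computation time by CPU

  CPU                    Threads  Memory  Time
  Intel Xeon E5620 x 2   16       24 GB   20h 10m
  Intel Core i3 550      4        4 GB    66h 52m
  AMD Phenom X6 1055T    6        8 GB    70h 40m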

Computation time by target family

  CPU                    Threads  Memory  GPCR     Kinase   Ion Channel  Nuclear Receptor  Protease  Transporter
  Intel Xeon E5620 x 2   16       24 GB   10m 14s  20m 29s  7m 23s       4m 38s            10m 2s    3m 23s
  Intel Core i7-4790     8        32 GB   9m 30s   18m 57s  7m 41s       5m 26s            9m 54s    3m 6s

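2 Installing and configuring CzeekS

2.1 Extracting the archive

CzeekS is distributed as the archive CzeekS_******.tgz. Extract it with tar in any directory, such as /usr/local. The examples in this manual assume a user czeeks who extracts the archive in the home directory /home/czeeks.

  $ tar xvzf CzeekS_******.tgz
  CGBVS/
  CGBVS/exec/
  CGBVS/exec/protein.lst
  CGBVS/exec/2D_7_910_smi.drs
  CGBVS/exec/cgbvs
  CGBVS/exec/calc_dragon.sh
  CGBVS/exec/2D_7_910_sdf.drs
  CGBVS/exec/SVMlearn
  CGBVS/exec/minfo

The license file czeeks_license.dat goes in /home/czeeks/CGBVS/exec (see 2.2).

After extraction and license placement, the directory contains the following files.

  CGBVS/
    example/
      H3_mols.csv          H3 compounds, descriptors
      H3_mols.fp           H3 compounds, fingerprints
      H3_mols.sdf          H3 compounds, SD file
      H3_mols.smi          H3 compounds, SMILES file
      H3_positive.csv      H3 interactions
      gpcr.csv             GPCR protein descriptors
      positive.csv         interactions
      sample_mols.csv      sample compounds, descriptors
      sample_mols.fp       sample compounds, fingerprints
      sample_mols.sdf      sample compounds, SD file
      sample_mols.smi      sample compounds, SMILES file
      training_mols.csv    training compounds, descriptors
      training_mols.fp     training compounds, fingerprints
      training_mols.sdf    training compounds, SD file
      training_mols.smi    training compounds, SMILES file
    exec/
      2D_7_910_sdf.drs     DRAGON7 script for SD input
      2D_7_910_smi.drs     DRAGON7 script for SMILES input
      2D_894_sdf.drt       DRAGON6 script for SD input
      2D_894_smi.drt       DRAGON6 script for SMILES input
      SVMlearn             SVM training program
      calc_FP_MACCS        MACCS fingerprint calculation
      calc_dragon.sh       descriptor calculation with DRAGON6
      calc_dragon7.sh      descriptor calculation with DRAGON7
      cgbvs                CGBVS main program
      czeeks_license.dat   license file
      minfo                machine information tool
      protein.lst
    gpcr_sample.db         sample DB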

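2.2 Obtaining the license file

1. Run minfo in CGBVS/exec. It prints a SHA1 string that identifies the machine.

  $ cd CGBVS/exec
  $ ./minfo
  a5866b20b7b4a1da0ac4406dcf7f40b963903c34

2. Use the string printed by minfo to request the license file.

3. Install the license file in one of the following two ways.

  a. Place czeeks_license.dat in the directory named by the environment variable CGBVS.
  b. Set the environment variable CGBVS_LICENSE to the location of the license file.

2.3 Setting environment variables

Add the following lines to .bashrc.

  export CGBVS=/home/czeeks/CGBVS/exec
  export PATH=$PATH:$CGBVS
  export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
  export DRAGON6=/usr/local/bin    # DRAGON6 directory
  export DRAGON7=/usr/local/bin    # DRAGON7 directory

Set DRAGON6 and DRAGON7 to the directories that contain the DRAGON executables dragon6shell and dragon7shell. If czeeks_license.dat is not located in ${CGBVS}, also set CGBVS_LICENSE.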
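The settings in 2.2 and 2.3 can be sanity-checked before running cgbvs. The following is a minimal sketch, not part of CzeekS: the function name check_czeeks_env is hypothetical, and it assumes that CGBVS_LICENSE holds the path of the license file itself rather than a directory.

```python
import os
from pathlib import Path


def check_czeeks_env(env):
    """Return a list of problems found in an environment mapping.

    Hypothetical helper: it only mirrors the rules stated in 2.2 and 2.3.
    """
    problems = []

    cgbvs_dir = env.get("CGBVS")
    if not cgbvs_dir:
        problems.append("CGBVS is not set")
    elif cgbvs_dir not in env.get("PATH", "").split(os.pathsep):
        problems.append("CGBVS directory is not on PATH")

    if not env.get("DRAGON6") and not env.get("DRAGON7"):
        problems.append("neither DRAGON6 nor DRAGON7 is set")

    # License: either $CGBVS/czeeks_license.dat or the file named by
    # CGBVS_LICENSE (assumed here to be a file path).
    candidates = []
    if cgbvs_dir:
        candidates.append(Path(cgbvs_dir) / "czeeks_license.dat")
    if env.get("CGBVS_LICENSE"):
        candidates.append(Path(env["CGBVS_LICENSE"]))
    if not any(p.is_file() for p in candidates):
        problems.append("license file not found")

    return problems


if __name__ == "__main__":
    for problem in check_czeeks_env(os.environ):
        print("WARNING:", problem)
```

Running the script in a freshly opened shell shows whether the .bashrc changes have taken effect.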

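2.4 Setting up OpenBabel

CzeekS uses OpenBabel to calculate MACCS fingerprints (calc_FP_MACCS) and to convert between SD and SMILES files. Install OpenBabel as follows.

1. Install cmake

OpenBabel is built with cmake. If cmake is not installed, install it, for example with yum install cmake.

2. Install OpenBabel

OpenBabel is distributed under GPL v2 and can be downloaded from the following URL.

  http://openbabel.org/wiki/Get_Open_Babel

Download Open Babel 2.4.1, extract the tar archive, and run the following commands in the resulting openbabel-2.4.1 directory.

  $ mkdir build      # create a build directory
  $ cd build
  $ cmake ../        # run cmake
  $ make             # build OpenBabel
  $ su               # become root
  # make install     # install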

3 Compound screening and target prediction

3.1 CGBVS model DB

CzeekS stores its data and models in a DB file (extension .db). The bundled sample DB, gpcr_sample.db, is based on ChEMBL GPCR data. Chapter 4 describes how to build a DB of your own.

CGBVS treats interaction prediction as a two-class problem: an SVM separates interacting (+) from non-interacting (-) compound-protein pairs, and CzeekS reports the result as a value between 0 and 1.

The cgbvs status command shows the contents of a DB.

  $ cgbvs status gpcr_sample.db
  [compound]
  Dragon6 v.6.0.26
  # of data = 13838
  # of descriptors = 894
  [protein]
  PROFEAT 2011
  # of data = 859
  # of descriptors = 1080
  [fingerprint]
  MACCS
  # of data = 13838
  [interactions]
  # of positive interactions = 21747
  # of negative interactions = 0
  [details of models]
  # of sampled positive interactions = 21761
  id    nsv     dim   C        gamma    5-fold CV
  ------+---------+-------+---------+---------+-----------
  1     32865   444   3.0000   0.0030   89.3305
  2     32954   444   3.0000   0.0030   89.3708
  3     33016   444   3.0000   0.0030   89.4677
  4     32912   444   3.0000   0.0030   89.2075
  5     32884   444   3.0000   0.0030   89.4600

The [compound], [protein], and [fingerprint] blocks give the software used, the number of entries, and the number of descriptors. In the model table, id is the model ID, nsv the number of support vectors, C and gamma the SVM parameters, and 5-fold CV the cross-validation accuracy.

cgbvs status -p lists the proteins in the DB.

  $ cgbvs status -p gpcr_sample.db
  protein ID    positive  negative  accession  name
  5HT1A_HUMAN   407       0         P08908     5-hydroxytryptamine receptor 1A
  5HT1B_HUMAN   207       0         P28222     5-hydroxytryptamine receptor 1B
  5HT1D_HUMAN   203       0         P28221     5-hydroxytryptamine receptor 1D
  5HT1E_HUMAN   74        0         P28566     5-hydroxytryptamine receptor 1E
  5HT1F_HUMAN   103       0         P30939     5-hydroxytryptamine receptor 1F
  5HT2A_HUMAN   388       0         P28223     5-hydroxytryptamine receptor 2A
  5HT2B_HUMAN   287       0         P41595     5-hydroxytryptamine receptor 2B
  5HT2C_HUMAN   422       0         P28335     5-hydroxytryptamine receptor 2C
  5HT4R_HUMAN   109       0         Q13639     5-hydroxytryptamine receptor 4
  5HT5A_HUMAN   112       0         P47898     5-hydroxytryptamine receptor 5A
  5HT6R_HUMAN   252       0         P50406     5-hydroxytryptamine receptor 6
  5HT7R_HUMAN   227       0         P34969     5-hydroxytryptamine receptor 7
  A4_HUMAN      100       0         P05067     Amyloid beta A4 protein
  AA1R_HUMAN    117       0         P30542     Adenosine receptor A1
  AA2AR_HUMAN   123       0         P29274     Adenosine receptor A2a
  AA2BR_HUMAN   107       0         P29275     Adenosine receptor A2b
  AA3R_HUMAN    127       0         P33765     Adenosine receptor A3

protein ID is the ID used to refer to the protein in cgbvs commands, and accession is the UniProt (http://www.uniprot.org/) accession. positive and negative are the numbers of positive and negative interactions registered in the DB for that protein.

3.2 Compound screening

To screen compounds supplied as an SD file, first calculate their descriptors with the same software that was used for the DB (DRAGON6 for the sample DB). The script calc_dragon.sh in the exec directory takes a SMILES file and runs DRAGON6 on it. If the compounds are in an SD file, convert it to SMILES with OpenBabel first.

  $ babel -isdf sample_mols.sdf -osmi sample_mols.smi    # convert to SMILES
  $ calc_dragon.sh sample_mols.smi > output.csv
  $ cat output.csv
  ZINC00074638,315.320,8.522,24.952,38.109,25.091,
  ZINC00075927,269.300,8.416,21.796,32.563,22.216,
  ZINC00492910,300.390,7.152,25.928,42.138,27.228,
  ZINC02759964,339.170,10.941,21.362,32.153,21.784,
  ZINC03518134,264.360,6.778,22.928,39.138,24.228,

The output is a CSV file of the form "ID, descriptor 1, descriptor 2, ...". Each line holds one compound, and the first column is the compound ID.

Pass the file produced by calc_dragon.sh to cgbvs predict. The example below uses sample_mols.csv, which is bundled with CzeekS, to predict interactions with the β2 adrenergic receptor.

  $ cgbvs predict gpcr_sample.db ADRB2_HUMAN sample_mols.csv
  compound       ADRB2_HUMAN
  ZINC00074638   0.30964167
  ZINC00075927   0.08384572
  ZINC00492910   0.97130469
  ZINC02759964   0.11692792
  ZINC03518134   0.48137199
  ZINC03912658   0.16544143
  ZINC04143221   0.17974889

The arguments are, in order, the CGBVS DB, the protein ID, and the compound descriptor file. Valid protein IDs can be listed with cgbvs status -p. To predict against more than one protein, give the protein IDs as a comma-separated list.

  $ cgbvs predict gpcr_sample.db ADRB1_HUMAN,ADRB2_HUMAN sample_mols.csv
  compound       ADRB1_HUMAN  ADRB2_HUMAN
  ZINC00074638   0.02890300   0.30964167
  ZINC00075927   0.05518164   0.08384572
  ZINC00492910   0.94315208   0.97130469
  ZINC02759964   0.09213196   0.11692792
  ZINC03518134   0.24245863   0.48137199
  ZINC03912658   0.16195949   0.16544143
  ZINC04143221   0.14475844   0.17974889

The character % can be used as a wildcard in protein IDs.

  $ cgbvs predict gpcr_sample.db ADA%,ADR% sample_mols.csv
  compound       ADA1A_HUMAN  ADA1B_HUMAN  ADA1D_HUMAN  ADA2A_HUMAN  ADA2B_HUMAN  ADA2C_HUMAN  ADRB1_HUMAN  ADRB2_HUMAN  ADRB3_HUMAN
  ZINC00074638   0.00546713   0.00790653   0.01368746   0.04825282   0.01539659   0.01710232   0.02890300   0.30964167   0.02416605
  ZINC00075927   0.04435283   0.05292626   0.03401368   0.12980506   0.11023397   0.08800234   0.05518164   0.08384572   0.05904665
  ZINC00492910   0.82906031   0.66281280   0.57664539   0.28904697   0.36205274   0.15184621   0.94315208   0.97130469   0.95462775

By default cgbvs predict prints the CGBVS prediction as a value between 0 and 1. With the -d option it prints the SVM score instead.

  $ cgbvs predict -d gpcr_sample.db ADR% sample_mols.csv
  compound       ADRB1_HUMAN   ADRB2_HUMAN   ADRB3_HUMAN
  ZINC00074638   -0.85256756   -0.22087194   -0.92119760
  ZINC00075927   -0.68532704   -0.59014300   -0.67134752
  ZINC00492910    0.68151956    0.86129199    0.72860470
  ZINC02759964   -0.55438422   -0.50374617   -0.55658830
  ZINC03518134   -0.33148519   -0.03063260   -0.31769752
  ZINC03912658   -0.40012509   -0.39415553   -0.41061824
  ZINC04143221   -0.44055998   -0.37664852   -0.73233241

With the -v option, each line holds one compound-protein pair together with both the probability and the score.

  $ cgbvs predict -v gpcr_sample.db ADR% sample_mols.csv
  compound       protein       probability   score
  ZINC00074638   ADRB1_HUMAN   0.02890300    -0.85256756
  ZINC00074638   ADRB2_HUMAN   0.30964167    -0.22087194
  ZINC00074638   ADRB3_HUMAN   0.02416605    -0.92119760
  ZINC00075927   ADRB1_HUMAN   0.05518164    -0.68532704
  ZINC00075927   ADRB2_HUMAN   0.08384572    -0.59014300
  ZINC00075927   ADRB3_HUMAN   0.05904665    -0.67134752
  ZINC00492910   ADRB1_HUMAN   0.94315208     0.68151956
  ZINC00492910   ADRB2_HUMAN   0.97130469     0.86129199
  ZINC00492910   ADRB3_HUMAN   0.95462775     0.72860470

3.3 Target prediction

CGBVS can also be used to predict the targets of a compound. Giving all as the protein ID to cgbvs predict runs the prediction against the proteins in the DB (see cgbvs status -p and its -a option). The example below extracts the compound with ID ZINC10454282 from sample_mols.csv and predicts its targets.

  $ grep ZINC10454282 sample_mols.csv > test.csv
  $ cgbvs predict -v gpcr_sample.db all test.csv
  compound       protein       probability   score
  ZINC10454282   5HT1A_HUMAN   0.10425755    -0.57338188
  ZINC10454282   5HT1B_HUMAN   0.05597695    -0.71609958
  ZINC10454282   5HT1D_HUMAN   0.07338144    -0.71064291
  ZINC10454282   5HT1E_HUMAN   0.68686311     0.24373705
  ZINC10454282   5HT1F_HUMAN   0.07601352    -0.65082618
  ZINC10454282   5HT2A_HUMAN   0.11784583    -0.60281690
  ZINC10454282   5HT2B_HUMAN   0.32152267    -0.26499771
  ZINC10454282   5HT2C_HUMAN   0.07943595    -0.65445856
  ZINC10454282   5HT4R_HUMAN   0.12747822    -0.51000689
  ZINC10454282   5HT5A_HUMAN   0.21434369    -0.38699666
  ZINC10454282   5HT6R_HUMAN   0.16279751    -0.44377240
  ZINC10454282   5HT7R_HUMAN   0.02697416    -0.88964447
  ZINC10454282   A4_HUMAN      0.24993452    -0.28091388
  ZINC10454282   AA1R_HUMAN    0.11095267    -0.53572504

Because -v prints one protein ID per line, the output can be sorted by probability to list the most likely targets.

  $ cgbvs predict -v gpcr_sample.db all test.csv > out
  $ sort -k3 -nr out | head
  ZINC10454282   MTR1A_HUMAN   0.92231982   0.62125636
  ZINC10454282   TSHR_HUMAN    0.90106276   0.61032948
  ZINC10454282   GRM2_HUMAN    0.81718024   0.35825295
  ZINC10454282   MTR1B_HUMAN   0.81103861   0.34695291
  ZINC10454282   HRH3_HUMAN    0.75912780   0.28435030
  ZINC10454282   5HT1E_HUMAN   0.68686311   0.24373705
  ZINC10454282   CCR6_HUMAN    0.66715804   0.16679803
  ZINC10454282   NPY2R_HUMAN   0.58703833   0.09239658
  ZINC10454282   GRM5_HUMAN    0.57349212   0.05438964
  ZINC10454282   ARBK1_HUMAN   0.55180198   0.03925507

To look up the proteins behind two of these IDs, MTR1A_HUMAN and MTR1B_HUMAN, filter the output of cgbvs status -p.

  $ cgbvs status -p gpcr_sample.db | grep -e "MTR1..*"
  MTR1A_HUMAN   102   0   P48039   Melatonin receptor type 1A
  MTR1B_HUMAN   101   0   P49286   Melatonin receptor type 1B

3.4 Tanimoto similarity search

CzeekS can also run a similarity search based on the Tanimoto coefficient (Tanimoto similarity). The query compounds are compared with the fingerprints registered in the DB, and the Tanimoto value is reported. Use the -s option of cgbvs predict.

  $ calc_FP_MACCS sample_mols.sdf test.fp    # test.fp is identical to sample_mols.fp
  $ cgbvs predict -s gpcr_sample.db ADRB2_HUMAN test.fp
  compound       ADRB2_HUMAN
  ZINC00074638   0.55737705
  ZINC00075927   0.48571429
  ZINC00492910   0.71428571
  ZINC02759964   0.58108108
  ZINC03518134   0.56666667
  ZINC03912658   0.72000000
  ZINC04143221   0.72972973
  ZINC05766699   0.54385965
  ZINC10006603   0.71641791

The format of test.fp is as follows.

  $ head sample_mols.fp
  ZINC00074638,42 50 57 62 72 75 76 83 85 87 89 91 92 95
  ZINC00075927,41 42 52 65 75 78 80 87 92 94 95 97 98 107 110
  ZINC00492910,54 72 82 90 92 95 97 100 104 109 110 113 117 126
  ZINC02759964,24 46 49 52 56 63 65 70 71 75 79 80 83 87 92 93
  ZINC03518134,65 72 75 83 85 90 91 92 93 95 96 104 110 111 117

The first column is the compound ID. The second column lists the positions of the bits that are set to 1: a value n means that bit n of the fingerprint is 1.
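To make the fingerprint format concrete, the sketch below parses two lines in the "ID,bit bit bit ..." layout and computes the standard Tanimoto coefficient between them, that is, the number of shared on-bits divided by the number of on-bits in either fingerprint. The function names are illustrative and not part of CzeekS. How cgbvs predict -s combines the similarities to the individual DB compounds into the single value it prints per protein is not covered here.

```python
def parse_fp_line(line):
    """Parse 'ID,b1 b2 b3 ...' into (ID, set of on-bit positions)."""
    compound_id, _, bits = line.strip().partition(",")
    return compound_id, {int(b) for b in bits.split()}


def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| of two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Two lines copied from the sample_mols.fp listing above.
lines = [
    "ZINC00074638,42 50 57 62 72 75 76 83 85 87 89 91 92 95",
    "ZINC03518134,65 72 75 83 85 90 91 92 93 95 96 104 110 111 117",
]
(id_a, fp_a), (id_b, fp_b) = (parse_fp_line(line) for line in lines)
print(id_a, id_b, round(tanimoto(fp_a, fp_b), 4))
```

Storing only the on-bit positions keeps the file compact, and set intersection and union map directly onto the two counts in the formula.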

4 Building a CGBVS model

4.1 Preparing the training data

Building a CGBVS model requires the following three kinds of data.

1. Compound descriptors
2. Protein descriptors
3. Compound-protein interactions

All three are prepared as CSV files. The compound descriptors in the example are in training_mols.csv.

  $ head training_mols.csv
  250,377.470,8.778,30.037,43.407,31.387,47.870,0.699,1.009,0.730,
  158482,637.850,7.087,55.690,89.934,58.768,101.427,0.619,0.999,0.653,
  163503,355.510,6.837,33.058,51.032,35.597,57.690,0.636,0.981,0.685,
  166739,416.560,7.439,37.354,55.243,39.637,62.145,0.667,0.986,0.708,
  159447,359.530,6.537,31.858,54.767,34.074,62.893,0.579,0.996,0.620,
  7139,400.930,8.019,33.853,49.691,35.938,55.302,0.677,0.994,0.719,
  158073,255.730,8.249,19.482,31.585,20.352,35.349,0.628,1.019,0.657,
  162130,560.720,8.761,43.782,64.774,45.995,71.368,0.684,1.012,0.719,
  159704,340.840,8.313,27.316,41.243,28.665,46.053,0.666,1.006,0.699,
  159533,359.530,6.537,31.858,54.767,34.074,62.893,0.579,0.996,0.620,

As in Chapter 3, the first column is the compound ID and the second and later columns are the descriptors. The file was calculated from training_mols.smi with DRAGON6.

The protein descriptors are in gpcr.csv.

  $ head gpcr.csv
  5HT1A_HUMAN,9.71564,3.31754,3.79147,3.5545,4.02844,
  5HT1B_HUMAN,8.97436,2.82051,3.58974,3.33333,4.35897,
  5HT1D_HUMAN,9.81432,2.91777,2.65252,3.18302,4.50928,
  5HT1E_HUMAN,6.57534,3.28767,3.56164,3.28767,4.65753,
  5HT1F_HUMAN,6.28415,3.00546,4.09836,4.64481,4.37158,
  5HT2A_HUMAN,6.15711,3.18471,4.24628,3.82166,5.30786,
  5HT2B_HUMAN,6.02911,1.6632,2.9106,4.3659,5.40541,
  5HT2C_HUMAN,5.8952,2.62009,2.83843,4.80349,4.58515,
  5HT4R_HUMAN,6.95876,4.63917,3.86598,3.09278,5.6701,
  5HT5A_HUMAN,7.84314,2.80112,2.52101,3.92157,6.16247,

The protein descriptors were calculated with PROFEAT (http://bidd2.nus.edu.sg/cgi-bin/profeat2016/protein/profnew.cgi), which takes sequences in FASTA format as input. The protein IDs used in the CzeekS sample data are UniProt IDs of the form *_HUMAN.

The interactions are in positive.csv.

  $ head positive.csv
  1000029,NPBW1_HUMAN
  1000123,ARBK1_HUMAN
  100014,CRFR1_HUMAN
  1000194,FAK2_HUMAN
  1000948,CCR6_HUMAN
  1000956,NTR1_HUMAN
  1001098,FAK2_HUMAN
  1001421,OX1R_HUMAN
  100163,PTAFR_HUMAN
  1001651,ADRB2_HUMAN

The first column is the compound ID and the second column is the protein ID, so each line represents one compound-protein interaction. The interactions were taken from ChEMBL using an activity threshold of 30 µM.

4.2 Creating the DB

Create a CGBVS DB from the three files training_mols.csv, gpcr.csv, and positive.csv.

  $ cgbvs create training.db                                  # create the DB
  $ cgbvs import training.db training_mols.csv compound      # import training_mols.csv
  $ cgbvs import training.db gpcr.csv protein                # import gpcr.csv
  $ cgbvs import training.db positive.csv positive           # import positive.csv

cgbvs create only creates the DB. Before the DB can be used for CGBVS prediction, the SVM has to be trained as described in 4.4.
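Since every line of positive.csv refers to a compound ID and a protein ID defined in the other two files, it can be worth checking that the three files agree before importing them. The sketch below is one way to do that under the file layout shown in 4.1. The helper names and the toy data are illustrative only, and whether cgbvs import performs a comparable check itself is not stated in this manual.

```python
import csv
import io


def first_column(text):
    """Return the set of IDs found in the first column of CSV text."""
    return {row[0].strip() for row in csv.reader(io.StringIO(text)) if row}


def dangling_interactions(positive_text, compound_ids, protein_ids):
    """List (compound, protein) pairs that reference an unknown ID."""
    bad = []
    for row in csv.reader(io.StringIO(positive_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        compound, protein = row[0].strip(), row[1].strip()
        if compound not in compound_ids or protein not in protein_ids:
            bad.append((compound, protein))
    return bad


# Toy data in the same layout as training_mols.csv, gpcr.csv, positive.csv.
compounds = first_column("250,377.470,8.778\n158482,637.850,7.087\n")
proteins = first_column("5HT1A_HUMAN,9.71564,3.31754\n")
positives = "250,5HT1A_HUMAN\n999,ADRB2_HUMAN\n"

for compound, protein in dangling_interactions(positives, compounds, proteins):
    print("unknown ID in interaction:", compound, protein)
```

For real files, read each file's contents with open(...).read() and pass the text to the same functions.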

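To use the Tanimoto similarity search described in 3.4 with the new DB, also import the fingerprints.

  $ cgbvs import training.db training_mols.fp fingerprint    # import training_mols.fp

MACCS fingerprints are calculated as described in 3.4.

4.3 Adding data

Data can be added to an existing DB. As described in 4.1, CGBVS needs three kinds of data. Proteins that have 0 interactions can be found with cgbvs status -p -a (see 3.1). To add data, use cgbvs add. The example below adds 100 compounds for H3, using H3_mols.sdf, H3_mols.csv, and H3_positive.csv.

  $ cgbvs add training.db H3_mols.csv compound         # add H3_mols.csv
  $ cgbvs add training.db H3_positive.csv positive     # add H3_positive.csv

4.4 Training

After the data has been registered in the DB, train the SVM with cgbvs learn.

  $ cgbvs learn -C 99% -P 99% -n -f training.db 5
  output input_1
  output input_2
  output input_3
  output input_4
  output input_5
  $ SVMlearn -c 3 -g 0.003 input_1 model_1
  itr   nsv    vkkt    Objective
  1     965    42306   -4.237480732168150E+02
  2     1724   41252   -8.371539432307425E+02
  3     2457   42662   -1.576994435268771E+03

The last argument, 5, is the number of models. The resulting models and their 5-fold cross-validation accuracy can be checked with cgbvs status (see 3.1).

-c and -g are the SVM parameters. -c is the SVM parameter C. The SVM in CzeekS uses an RBF (Radial Basis Function) kernel, and -g is the parameter γ of the RBF kernel. The example uses C=3 and γ=0.003.

4.5 Manual training and parameter search

The training in 4.4 can also be carried out step by step. With the -f option, cgbvs learn only writes the SVM input files, here 5 of them.

  $ cgbvs learn -f training.db 5
  output input_1
  output input_2
  output input_3
  output input_4
  output input_5

Train the SVM on each input file.

  $ SVMlearn -c 3 -g 0.003 input_1 model_1    # model 1
  $ SVMlearn -c 3 -g 0.003 input_2 model_2    # model 2
  $ SVMlearn -c 3 -g 0.003 input_3 model_3    # model 3
  $ SVMlearn -c 3 -g 0.003 input_4 model_4    # model 4
  $ SVMlearn -c 3 -g 0.003 input_5 model_5    # model 5

Register the five models model_1 to model_5 in the DB.

  $ cgbvs add_model training.db model_1 1    # register model_1 as id=1
  $ cgbvs add_model training.db model_2 2    # register model_2 as id=2
  $ cgbvs add_model training.db model_3 3    # register model_3 as id=3
  $ cgbvs add_model training.db model_4 4    # register model_4 as id=4
  $ cgbvs add_model training.db model_5 5    # register model_5 as id=5

The registered models can be checked with cgbvs status.

To search for good SVM parameters, run SVMlearn on input_1 over a grid of values, for example with the following script.

  #!/bin/sh
  # Grid search over C and gamma using input_1.
  # Prints one line per combination: C, gamma, cross-validation accuracy.
  for c in 1 3 10 30 100; do
    for g in 0.001 0.003 0.01 0.03 0.1; do
      printf '%s\t%s\t' "$c" "$g"
      SVMlearn -c "$c" -g "$g" input_1 model_1 | grep cross-validation | awk '{ print $6 }'
    done
  done

The script tries 5 values of the SVM parameter C (1, 3, 10, 30, 100) and 5 values of γ (0.001, 0.003, 0.01, 0.03, 0.1), 25 combinations in total. Choose the C and γ that perform best, train model_1 to model_5 with them, and register the models in the DB.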
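Picking the best combination from the grid-search output can be automated. The sketch below assumes the output format produced by the script in 4.5, namely one line per combination with C, γ, and the cross-validation accuracy separated by whitespace. The function name is illustrative, and the accuracy values in the example are placeholders invented to exercise the parser, not measured results.

```python
def best_parameters(grid_output):
    """Return (C, gamma, accuracy) with the highest accuracy, or None.

    Lines that do not have exactly three fields (for example when no
    accuracy was printed) are skipped.
    """
    best = None
    for line in grid_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        c = float(fields[0])
        gamma = float(fields[1])
        accuracy = float(fields[2].rstrip("%"))
        if best is None or accuracy > best[2]:
            best = (c, gamma, accuracy)
    return best


# Placeholder values for illustration only.
example = "1\t0.001\t80.5\n3\t0.003\t89.3\n10\t0.01\t\n30\t0.03\t88.0%\n"
print(best_parameters(example))
```

Redirect the script's output to a file and pass its contents to best_parameters to apply this to a real run.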

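5 cgbvs command reference

Usage:

  cgbvs <command> [<options>] <arguments>

<command> is one of add, add_model, comment, create, delete, del_model, import, learn, predict, status, shrink.

add: add data to a db

  cgbvs add <db> <file> <type>

  <db>      db file
  <file>    CSV file to add
  <type>    one of compound, protein, positive, negative, fingerprint

add_model: register an SVM model

  cgbvs add_model [<options>] <db> <model file> <ID>

  Registers an SVM model file in the db under the given ID. Model files are produced by SVMlearn. With -l, a model produced by svm-train of libsvm can be registered.

  -l        the model file is in libsvm format

comment: set a comment

  cgbvs comment <db> <type> <comment>

  <db>      db file
  <type>    one of compound, protein, positive, negative, fingerprint

create: create a db

  cgbvs create [<options>] <db>

  Creates a new db. The following options import data into the db at creation time; each <arg> is a CSV file.

  -c <arg>
  -p <arg>
  -i <arg>
  -n <arg>
  -f <arg>

delete: delete data

  cgbvs delete <db> <type>

  <db>      db file
  <type>    one of compound, protein, positive, negative, fingerprint

del_model: delete an SVM model from a db

  cgbvs del_model <db> <model ID>

  <db>        db file
  <model ID>  ID of the SVM model, as shown by cgbvs status. all deletes every SVM model.

import: import data into a db

  cgbvs import [<options>] <db> <file> <type>

  <db>      db file
  <file>    CSV file to import
  <type>    one of compound, protein, positive, negative, fingerprint

  To append data to a db that already contains data of the given type, use add.

  -m <arg>

learn: train SVM models

  cgbvs learn [<options>] <db> <number of models>

  Trains the given number of SVM models from the data in the db and registers them in the db.

  -c <arg>    SVM parameter C (default 10)
  -g <arg>    RBF kernel parameter (default 0.01)
  -v <arg>    cross-validation (default 5)
  -s <arg>    (default 1)
  -C <arg>
  -P <arg>    the <arg> of -C and -P can be given as a percentage (%)
  -n
  -r
  -f          only write the input files for SVMlearn
  -l          use LIBSVM format

predict: run a prediction

  cgbvs predict [<options>] <db> <protein ID> <file>

  <db>          CGBVS db file
  <protein ID>  protein ID. Multiple IDs can be separated by commas, % acts as a wildcard, and all selects the proteins in the db. IDs can be listed with status -p.
  <file>        compound descriptor file (fingerprint file with -s)

  -a
  -s          Tanimoto similarity search
  -d          print the SVM score
  -v          print one compound-protein pair per line with probability and score
  -n <arg>

status: show the contents of a db

  cgbvs status [<options>] <db>

  -c          list compound IDs
  -p          list protein IDs
  -a          used with -p: list all protein IDs, including proteins with no interactions (see 4.3)

shrink: shrink a db

  cgbvs shrink <db>

  <db>      db file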
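The protein ID argument of predict accepts comma-separated IDs, the wildcard %, and the keyword all. The sketch below shows one plausible reading of those rules, assuming that % matches any run of characters in the manner of SQL LIKE, which is consistent with ADA%,ADR% selecting ADA1A_HUMAN through ADRB3_HUMAN in 3.2. It is an illustration of the matching logic only, not the actual cgbvs implementation, and expand_protein_ids is a hypothetical name.

```python
import re


def expand_protein_ids(spec, known_ids):
    """Expand a protein ID specification against the IDs in a db.

    Assumption: '%' matches any run of characters (as in SQL LIKE) and
    'all' selects every known ID. Results keep the order of known_ids.
    """
    if spec == "all":
        return list(known_ids)
    patterns = []
    for item in spec.split(","):
        if not item:
            continue
        # Escape literal parts and join them with '.*' for each '%'.
        body = ".*".join(re.escape(part) for part in item.split("%"))
        patterns.append(re.compile(body + r"\Z"))
    return [pid for pid in known_ids if any(p.match(pid) for p in patterns)]


ids = ["5HT1A_HUMAN", "ADA1A_HUMAN", "ADA2C_HUMAN",
       "ADRB1_HUMAN", "ADRB2_HUMAN", "MTR1A_HUMAN"]
print(expand_protein_ids("ADA%,ADR%", ids))
```

Anchoring each pattern at both ends means an ID without % has to match exactly, so ADRB2_HUMAN selects only that one protein.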