CzeekS ver. 1.2
© 2017
ver. 1.2: changes related to Dragon7, shrink, learn, and status
Contents

1 Introduction
2 Installing CzeekS
  2.1 Extracting the archive
  2.2 License
  2.3 Environment settings
  2.4 Installing OpenBabel
3 Prediction
  3.1 CGBVS models
  3.2 Predicting interactions
  3.3 Predicting against all proteins
  3.4 Tanimoto similarity
4 Building CGBVS models
  4.1 Preparing the data
  4.2 Creating a DB
  4.3 Adding data
  4.4 Learning
  4.5 Running the learning step by hand
5 cgbvs command reference
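1 Introduction

CGBVS (Chemical Genomics-Based Virtual Screening) predicts compound-protein interactions by learning from known interaction data such as ChEMBL. CzeekS is software for running CGBVS. Besides CGBVS prediction, it supports similarity calculations based on MACCS fingerprints.

Chapter 2 describes how to install CzeekS, chapter 3 how to run predictions, chapter 4 how to build CGBVS models, and chapter 5 the cgbvs command.

CzeekS is parallelized with OpenMP and uses multiple CPU cores. The requirements for running CzeekS are:

    CPU       4-core CPU (Intel, AMD)
    Memory    16GB
    HDD       10GB
    OS        CentOS 5.x or 6.x, 64bit (Linux 2.6)
    Software  DRAGON 6.0.38, Dragon 7.0x
              OpenBabel 2.4.1

Measured run times:

    CPU                     Threads  Memory  Time
    Intel Xeon E5620 x2     16       24GB    20h 10m
    Intel Core i3 550       4        4GB     66h 52m
    AMD Phenom X6 1055T     6        8GB     70h 40m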
Run times by target family:

    CPU                   Threads  Memory  GPCR     Kinase   Ion Channel  Nuclear Receptor  Protease  Transporter
    Intel Xeon E5620 x2   16       24GB    10m 14s  20m 29s  7m 23s       4m 38s            10m 2s    3m 23s
    Intel Core i7-4790    8        32GB    9m 30s   18m 57s  7m 41s       5m 26s            9m 54s    3m 6s
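2 Installing CzeekS

2.1 Extracting the archive

CzeekS is distributed as a tar archive (******.tgz). Extract it with tar in a directory of your choice, for example /usr/local. This manual assumes that user czeeks extracts it in the home directory /home/czeeks.

    $ tar xvzf CzeekS_******.tgz
    CGBVS/
    CGBVS/exec/
    CGBVS/exec/protein.lst
    CGBVS/exec/2D_7_910_smi.drs
    CGBVS/exec/cgbvs
    CGBVS/exec/calc_dragon.sh
    CGBVS/exec/2D_7_910_sdf.drs
    CGBVS/exec/SVMlearn
    CGBVS/exec/minfo

Place the license file czeeks_license.dat in /home/czeeks/CGBVS/exec.

The archive contains the following files:

    CGBVS/
      example/
        H3_mols.csv          H3 compounds: descriptors
        H3_mols.fp           H3 compounds: fingerprints
        H3_mols.sdf          H3 compounds: SD file
        H3_mols.smi          H3 compounds: SMILES
        H3_positive.csv      H3 interaction data
        gpcr.csv             GPCR descriptors
        positive.csv         interaction data
        sample_mols.csv      sample compounds: descriptors
        sample_mols.fp       sample compounds: fingerprints
        sample_mols.sdf      sample compounds: SDF
        sample_mols.smi      sample compounds: SMILES
        training_mols.csv    training compounds: descriptors
        training_mols.fp     training compounds: fingerprints
        training_mols.sdf    training compounds: SD file
        training_mols.smi    training compounds: SMILES
      exec/
        2D_7_910_sdf.drs     DRAGON7 script (SD input)
        2D_7_910_smi.drs     DRAGON7 script (SMILES input)
        2D_894_sdf.drt       DRAGON6 script (SD input)
        2D_894_smi.drt       DRAGON6 script (SMILES input)
        SVMlearn             SVM learning program
        calc_FP_MACCS        MACCS fingerprint calculation
        calc_dragon.sh       descriptor calculation with DRAGON6
        calc_dragon7.sh      descriptor calculation with DRAGON7
        cgbvs                CGBVS program
        czeeks_license.dat   license file
        minfo
        protein.lst
      gpcr_sample.db         sample DB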
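2.2 License

1. Run minfo in CGBVS/exec. It prints a SHA1 hash.

    $ cd CGBVS/exec
    $ ./minfo
    a5866b20b7b4a1da0ac4406dcf7f40b963903c34

2. Use this hash to obtain the license file czeeks_license.dat.

3. Make the license file available in one of two ways:
   a. put czeeks_license.dat in the directory named by the environment variable CGBVS, or
   b. set the environment variable CGBVS_LICENSE.

2.3 Environment settings

Add the following to .bashrc:

    $ export CGBVS=/home/czeeks/CGBVS/exec
    $ export PATH=$PATH:$CGBVS
    $ export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
    $ export DRAGON6=/usr/local/bin    // DRAGON6 directory
    $ export DRAGON7=/usr/local/bin    // DRAGON7 directory

Set DRAGON6 and DRAGON7 to the directories that contain the DRAGON executables dragon6shell and dragon7shell. Put czeeks_license.dat in ${CGBVS}, or set CGBVS_LICENSE instead.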
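The settings above can be checked before running anything else. The following is a minimal sketch, not part of CzeekS: the function name check_czeeks_env is hypothetical, and it assumes only what this section states, namely that cgbvs lives in $CGBVS and that dragon6shell and dragon7shell live in $DRAGON6 and $DRAGON7. It does not verify the license.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CzeekS): report missing pieces of
# the environment described in section 2.3.
# Prints one line per problem and returns non-zero if anything is missing.
check_czeeks_env() {
    status=0
    # $CGBVS must point at the exec directory that holds the cgbvs binary.
    if [ -z "${CGBVS:-}" ] || [ ! -x "$CGBVS/cgbvs" ]; then
        echo "CGBVS is unset or \$CGBVS/cgbvs is not executable"
        status=1
    fi
    # DRAGON6 and DRAGON7 are only checked when they are set.
    if [ -n "${DRAGON6:-}" ] && [ ! -x "$DRAGON6/dragon6shell" ]; then
        echo "dragon6shell not found in \$DRAGON6"
        status=1
    fi
    if [ -n "${DRAGON7:-}" ] && [ ! -x "$DRAGON7/dragon7shell" ]; then
        echo "dragon7shell not found in \$DRAGON7"
        status=1
    fi
    return $status
}
```

Calling check_czeeks_env after sourcing .bashrc prints nothing when the paths are consistent.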
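2.4 Installing OpenBabel

The CzeekS MACCS fingerprint program (calc_FP_MACCS) and the conversion of SD files to SMILES both require OpenBabel.

1. Install cmake
   Building OpenBabel requires cmake. If cmake is not installed, install it with:

    yum install cmake

2. Install OpenBabel
   OpenBabel is distributed under the GPL v2. Download Open Babel 2.4.1 from the following URL:

    http://openbabel.org/wiki/get

   Extract the tar archive to create the directory openbabel-2.4.1, then run the following inside openbabel-2.4.1:

    $ mkdir build       // create a build directory
    $ cd build
    $ cmake ../         // run cmake
    $ make              // build OpenBabel
    $ su
    # make install      // install

CzeekS uses the installed OpenBabel.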
3 Prediction

3.1 CGBVS models

CzeekS keeps its data and models in a DB file (.db). The sample DB shipped with CzeekS was built from ChEMBL GPCR data. Chapter 4 describes how to build a CGBVS model of your own.

CGBVS uses an SVM trained on compound-protein pairs. CzeekS reports a CGBVS prediction in two forms: the SVM score, which is negative or positive, and a probability between 0 and 1.

Use cgbvs status to inspect the contents of a DB:

    $ cgbvs status gpcr_sample.db
    [compound]
    Dragon6 v.6.0.26
    # of data = 13838
    # of descriptors = 894
    [protein]
    PROFEAT 2011
    # of data = 859
    # of descriptors = 1080
    [fingerprint]
    MACCS
    # of data = 13838
    [interactions]
    # of positive interactions = 21747
    # of negative interactions = 0
    [details of models]
    # of sampled positive interactions = 21761
    id    nsv      dim    C        gamma    5-fold CV
    ------+---------+-------+---------+---------+-----------
    1     32865    444    3.0000   0.0030   89.3305
    2     32954    444    3.0000   0.0030   89.3708
    3     33016    444    3.0000   0.0030   89.4677
    4     32912    444    3.0000   0.0030   89.2075
    5     32884    444    3.0000   0.0030   89.4600

In the model table, id is the model id, nsv is the number of support vectors, C and gamma are the SVM parameters, and 5-fold CV is the cross-validation accuracy.

cgbvs status -p lists the proteins in the DB:

    $ cgbvs status -p gpcr_sample.db
    protein ID    positive  negative  accession  name
    5HT1A_HUMAN   407       0         P08908     5-hydroxytryptamine receptor 1A
    5HT1B_HUMAN   207       0         P28222     5-hydroxytryptamine receptor 1B
    5HT1D_HUMAN   203       0         P28221     5-hydroxytryptamine receptor 1D
    5HT1E_HUMAN   74        0         P28566     5-hydroxytryptamine receptor 1E
    5HT1F_HUMAN   103       0         P30939     5-hydroxytryptamine receptor 1F
    5HT2A_HUMAN   388       0         P28223     5-hydroxytryptamine receptor 2A
    5HT2B_HUMAN   287       0         P41595     5-hydroxytryptamine receptor 2B
    5HT2C_HUMAN   422       0         P28335     5-hydroxytryptamine receptor 2C
    5HT4R_HUMAN   109       0         Q13639     5-hydroxytryptamine receptor 4
    5HT5A_HUMAN   112       0         P47898     5-hydroxytryptamine receptor 5A
    5HT6R_HUMAN   252       0         P50406     5-hydroxytryptamine receptor 6
    5HT7R_HUMAN   227       0         P34969     5-hydroxytryptamine receptor 7
    A4_HUMAN      100       0         P05067     Amyloid beta A4 protein
    AA1R_HUMAN    117       0         P30542     Adenosine receptor A1
    AA2AR_HUMAN   123       0         P29274     Adenosine receptor A2a
    AA2BR_HUMAN   107       0         P29275     Adenosine receptor A2b
    AA3R_HUMAN    127       0         P33765     Adenosine receptor A3

protein ID is the ID used to refer to the protein in cgbvs commands. accession is the UniProt (http://www.uniprot.org/) ID. positive and negative are the numbers of positive and negative interactions stored in the DB for that protein.
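The protein list is plain text, so it can be filtered with standard tools. The sketch below assumes the column layout shown above (protein ID in column 1, positive count in column 2, whitespace-separated). The function name min_positive_proteins is hypothetical and not part of CzeekS.

```shell
#!/bin/sh
# Hypothetical helper: read `cgbvs status -p` output on stdin and print the
# IDs of proteins with at least N positive interactions.
# Usage: cgbvs status -p gpcr_sample.db | min_positive_proteins 200
min_positive_proteins() {
    # Lines whose second field is not a number (such as the header) are skipped.
    awk -v n="$1" '$2 ~ /^[0-9]+$/ && $2 + 0 >= n + 0 { print $1 }'
}
```

Such a filter is one way to restrict later predictions to proteins with enough training data. The threshold itself is a judgment call that this manual does not prescribe.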
3.2 Predicting interactions

To predict interactions for the compounds in an SD file, first compute the same kind of descriptors as those stored in the DB. CzeekS provides a script for running DRAGON6 in the exec directory. The script takes SMILES as input, so convert the SD file to SMILES with OpenBabel before running it.

    $ babel -isdf sample_mols.sdf -osmi sample_mols.smi    // convert to SMILES
    $ calc_dragon.sh sample_mols.smi > output.csv
    $ cat output.csv
    ZINC00074638,315.320,8.522,24.952,38.109,25.091,
    ZINC00075927,269.300,8.416,21.796,32.563,22.216,
    ZINC00492910,300.390,7.152,25.928,42.138,27.228,
    ZINC02759964,339.170,10.941,21.362,32.153,21.784,
    ZINC03518134,264.360,6.778,22.928,39.138,24.228,

The output is a CSV file of the form ID, descriptor 1, descriptor 2, and so on, with one compound per line and the ID in the first column.

Pass the descriptors computed by calc_dragon.sh to cgbvs predict. CzeekS ships the precomputed file sample_mols.csv. The following example predicts interactions with the β2 adrenergic receptor:

    $ cgbvs predict gpcr_sample.db ADRB2_HUMAN sample_mols.csv
    compound       ADRB2_HUMAN
    ZINC00074638   0.30964167
    ZINC00075927   0.08384572
    ZINC00492910   0.97130469
    ZINC02759964   0.11692792
    ZINC03518134   0.48137199
    ZINC03912658   0.16544143
    ZINC04143221   0.17974889

The second argument is the CGBVS DB, the third is the protein ID, and the fourth is the compound descriptor file. The protein IDs that can be given as the third argument are listed by cgbvs status -p. Several IDs can be given in the third argument, separated by commas:

    $ cgbvs predict gpcr_sample.db ADRB1_HUMAN,ADRB2_HUMAN sample_mols.csv
    compound       ADRB1_HUMAN   ADRB2_HUMAN
    ZINC00074638   0.02890300    0.30964167
    ZINC00075927   0.05518164    0.08384572
    ZINC00492910   0.94315208    0.97130469
    ZINC02759964   0.09213196    0.11692792
    ZINC03518134   0.24245863    0.48137199
    ZINC03912658   0.16195949    0.16544143
    ZINC04143221   0.14475844    0.17974889

% can be used as a wildcard:

    $ cgbvs predict gpcr_sample.db ADA%,ADR% sample_mols.csv
    compound       ADA1A_HUMAN  ADA1B_HUMAN  ADA1D_HUMAN  ADA2A_HUMAN  ADA2B_HUMAN  ADA2C_HUMAN  ADRB1_HUMAN  ADRB2_HUMAN  ADRB3_HUMAN
    ZINC00074638   0.00546713   0.00790653   0.01368746   0.04825282   0.01539659   0.01710232   0.02890300   0.30964167   0.02416605
    ZINC00075927   0.04435283   0.05292626   0.03401368   0.12980506   0.11023397   0.08800234   0.05518164   0.08384572   0.05904665
    ZINC00492910   0.82906031   0.66281280   0.57664539   0.28904697   0.36205274   0.15184621   0.94315208   0.97130469   0.95462775

By default cgbvs predict prints the CGBVS probability. With the -d option it prints the SVM score instead:

    $ cgbvs predict -d gpcr_sample.db ADR% sample_mols.csv
    compound       ADRB1_HUMAN   ADRB2_HUMAN   ADRB3_HUMAN
    ZINC00074638   -0.85256756   -0.22087194   -0.92119760
    ZINC00075927   -0.68532704   -0.59014300   -0.67134752
    ZINC00492910   0.68151956    0.86129199    0.72860470
    ZINC02759964   -0.55438422   -0.50374617   -0.55658830
    ZINC03518134   -0.33148519   -0.03063260   -0.31769752
    ZINC03912658   -0.40012509   -0.39415553   -0.41061824
    ZINC04143221   -0.44055998   -0.37664852   -0.73233241

With the -v option it prints one compound-protein pair per line, with both the probability and the score:

    $ cgbvs predict -v gpcr_sample.db ADR% sample_mols.csv
    compound       protein       probability   score
    ZINC00074638   ADRB1_HUMAN   0.02890300    -0.85256756
    ZINC00074638   ADRB2_HUMAN   0.30964167    -0.22087194
    ZINC00074638   ADRB3_HUMAN   0.02416605    -0.92119760
    ZINC00075927   ADRB1_HUMAN   0.05518164    -0.68532704
    ZINC00075927   ADRB2_HUMAN   0.08384572    -0.59014300
    ZINC00075927   ADRB3_HUMAN   0.05904665    -0.67134752
    ZINC00492910   ADRB1_HUMAN   0.94315208    0.68151956
    ZINC00492910   ADRB2_HUMAN   0.97130469    0.86129199
    ZINC00492910   ADRB3_HUMAN   0.95462775    0.72860470

3.3 Predicting against all proteins

CGBVS can also be used to search for the proteins a compound may interact with. Giving all as the protein ID makes cgbvs predict run against the proteins in the DB, that is, the proteins listed by cgbvs status -p (see also the -a option). The following example extracts the compound with ID ZINC10454282 from sample_mols.csv and predicts it against all proteins:

    $ grep ZINC10454282 sample_mols.csv > test.csv
    $ cgbvs predict -v gpcr_sample.db all test.csv
    compound       protein       probability   score
    ZINC10454282   5HT1A_HUMAN   0.10425755    -0.57338188
    ZINC10454282   5HT1B_HUMAN   0.05597695    -0.71609958
    ZINC10454282   5HT1D_HUMAN   0.07338144    -0.71064291
    ZINC10454282   5HT1E_HUMAN   0.68686311    0.24373705
    ZINC10454282   5HT1F_HUMAN   0.07601352    -0.65082618
    ZINC10454282   5HT2A_HUMAN   0.11784583    -0.60281690
    ZINC10454282   5HT2B_HUMAN   0.32152267    -0.26499771
    ZINC10454282   5HT2C_HUMAN   0.07943595    -0.65445856
    ZINC10454282   5HT4R_HUMAN   0.12747822    -0.51000689
    ZINC10454282   5HT5A_HUMAN   0.21434369    -0.38699666
    ZINC10454282   5HT6R_HUMAN   0.16279751    -0.44377240
    ZINC10454282   5HT7R_HUMAN   0.02697416    -0.88964447
    ZINC10454282   A4_HUMAN      0.24993452    -0.28091388
    ZINC10454282   AA1R_HUMAN    0.11095267    -0.53572504

Because -v prints one protein ID per line, the result can be sorted by probability:

    $ cgbvs predict -v gpcr_sample.db all test.csv > out
    $ sort -k3 -nr out | head
    ZINC10454282   MTR1A_HUMAN   0.92231982    0.62125636
    ZINC10454282   TSHR_HUMAN    0.90106276    0.61032948
    ZINC10454282   GRM2_HUMAN    0.81718024    0.35825295
    ZINC10454282   MTR1B_HUMAN   0.81103861    0.34695291
    ZINC10454282   HRH3_HUMAN    0.75912780    0.28435030
    ZINC10454282   5HT1E_HUMAN   0.68686311    0.24373705
    ZINC10454282   CCR6_HUMAN    0.66715804    0.16679803
    ZINC10454282   NPY2R_HUMAN   0.58703833    0.09239658
    ZINC10454282   GRM5_HUMAN    0.57349212    0.05438964
    ZINC10454282   ARBK1_HUMAN   0.55180198    0.03925507

Two of the top hits have the IDs MTR1A_HUMAN and MTR1B_HUMAN. Look them up with cgbvs status -p:

    $ cgbvs status -p gpcr_sample.db | grep -e "MTR1..*"
    MTR1A_HUMAN   102   0   P48039   Melatonin receptor type 1A
    MTR1B_HUMAN   101   0   P49286   Melatonin receptor type 1B

3.4 Tanimoto similarity

CzeekS can also compute Tanimoto similarity. With the -s option, cgbvs predict computes Tanimoto similarity against the DB instead of a CGBVS prediction. The input is a fingerprint file rather than a descriptor file:

    $ calc_fp_maccs sample_mols.sdf test.fp    // test.fp is the same as sample_mols.fp
    $ cgbvs predict -s gpcr_sample.db ADRB2_HUMAN test.fp
    compound       ADRB2_HUMAN
    ZINC00074638   0.55737705
    ZINC00075927   0.48571429
    ZINC00492910   0.71428571
    ZINC02759964   0.58108108
    ZINC03518134   0.56666667
    ZINC03912658   0.72000000
    ZINC04143221   0.72972973
    ZINC05766699   0.54385965
    ZINC10006603   0.71641791

The format of test.fp is as follows:

    $ head sample_mols.fp
    ZINC00074638,42 50 57 62 72 75 76 83 85 87 89 91 92 95
    ZINC00075927,41 42 52 65 75 78 80 87 92 94 95 97 98 107 110
    ZINC00492910,54 72 82 90 92 95 97 100 104 109 110 113 117 126
    ZINC02759964,24 46 49 52 56 63 65 70 71 75 79 80 83 87 92 93
    ZINC03518134,65 72 75 83 85 90 91 92 93 95 96 104 110 111 117

The first field is the compound ID. The second field lists the bits that are set, as numbers from 1 to n separated by spaces.
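For reference, the Tanimoto coefficient of two bit sets A and B is |A ∩ B| / |A ∪ B|. The sketch below computes it for two bit lists in the format of the second field of a .fp file. It illustrates the standard definition only: the function name tanimoto is hypothetical, and the manual does not state how cgbvs predict -s combines the similarities over the compounds in the DB.

```shell
#!/bin/sh
# Hypothetical helper: Tanimoto coefficient of two space-separated lists of
# set bits, e.g. the second field of two lines of a .fp file.
# Assumes each list contains no duplicate bit numbers.
tanimoto() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, " ")
        nb = split(b, y, " ")
        for (i = 1; i <= na; i++) seen[x[i]] = 1
        common = 0
        for (j = 1; j <= nb; j++) if (y[j] in seen) common++
        total = na + nb - common          # size of the union
        if (total == 0) { print "0.00000000"; exit }
        printf "%.8f\n", common / total
    }'
}
```

For example, tanimoto "1 2 3 4" "3 4 5 6" shares 2 bits out of 6 distinct ones and prints 0.33333333.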
4 Building CGBVS models

4.1 Preparing the data

Building a CGBVS model requires three kinds of data:

1. compound descriptors
2. protein descriptors
3. compound-protein interactions

All three are given as CSV files. The compound descriptors are in training_mols.csv:

    $ head training_mols.csv
    250,377.470,8.778,30.037,43.407,31.387,47.870,0.699,1.009,0.730,
    158482,637.850,7.087,55.690,89.934,58.768,101.427,0.619,0.999,0.653,
    163503,355.510,6.837,33.058,51.032,35.597,57.690,0.636,0.981,0.685,
    166739,416.560,7.439,37.354,55.243,39.637,62.145,0.667,0.986,0.708,
    159447,359.530,6.537,31.858,54.767,34.074,62.893,0.579,0.996,0.620,
    7139,400.930,8.019,33.853,49.691,35.938,55.302,0.677,0.994,0.719,
    158073,255.730,8.249,19.482,31.585,20.352,35.349,0.628,1.019,0.657,
    162130,560.720,8.761,43.782,64.774,45.995,71.368,0.684,1.012,0.719,
    159704,340.840,8.313,27.316,41.243,28.665,46.053,0.666,1.006,0.699,
    159533,359.530,6.537,31.858,54.767,34.074,62.893,0.579,0.996,0.620,

As in chapter 3, the first column is the ID and the second and later columns are the descriptors. They were computed from training_mols.smi with DRAGON6.

The protein descriptors are in gpcr.csv:

    $ head gpcr.csv
    5HT1A_HUMAN,9.71564,3.31754,3.79147,3.5545,4.02844,
    5HT1B_HUMAN,8.97436,2.82051,3.58974,3.33333,4.35897,
    5HT1D_HUMAN,9.81432,2.91777,2.65252,3.18302,4.50928,
    5HT1E_HUMAN,6.57534,3.28767,3.56164,3.28767,4.65753,
    5HT1F_HUMAN,6.28415,3.00546,4.09836,4.64481,4.37158,
    5HT2A_HUMAN,6.15711,3.18471,4.24628,3.82166,5.30786,
    5HT2B_HUMAN,6.02911,1.6632,2.9106,4.3659,5.40541,
    5HT2C_HUMAN,5.8952,2.62009,2.83843,4.80349,4.58515,
    5HT4R_HUMAN,6.95876,4.63917,3.86598,3.09278,5.6701,
    5HT5A_HUMAN,7.84314,2.80112,2.52101,3.92157,6.16247,
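The protein descriptors were computed with PROFEAT (http://bidd2.nus.edu.sg/cgi-bin/profeat2016/protein/profnew.cgi), which takes a FASTA sequence as input. CzeekS uses the UniProt ID of the form *_HUMAN as the protein ID, so set the ID of each PROFEAT result to the corresponding UniProt ID.

The interactions are in positive.csv:

    $ head positive.csv
    1000029,NPBW1_HUMAN
    1000123,ARBK1_HUMAN
    100014,CRFR1_HUMAN
    1000194,FAK2_HUMAN
    1000948,CCR6_HUMAN
    1000956,NTR1_HUMAN
    1001098,FAK2_HUMAN
    1001421,OX1R_HUMAN
    100163,PTAFR_HUMAN
    1001651,ADRB2_HUMAN

The first column is the compound ID and the second is the protein ID. These compound-protein pairs were taken from ChEMBL using a 30 µM activity threshold.

4.2 Creating a DB

Create a CGBVS DB from the three files training_mols.csv, gpcr.csv and positive.csv:

    $ cgbvs create training.db                                 // create the DB
    $ cgbvs import training.db training_mols.csv compound     // import training_mols.csv
    $ cgbvs import training.db gpcr.csv protein               // import gpcr.csv
    $ cgbvs import training.db positive.csv positive          // import positive.csv

cgbvs create makes a new DB file. At this point the DB contains no SVM model, so CGBVS prediction is not yet possible. Learning is described in 4.4.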
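Every pair in positive.csv presumably has to refer to a compound and a protein that have descriptors. That can be checked before import. The sketch below is not part of CzeekS: the function name missing_ids is hypothetical, and it assumes only the CSV layouts shown in 4.1 (ID in the first column of each descriptor file, compound ID and protein ID in the interaction file).

```shell
#!/bin/sh
# Hypothetical helper: print the interaction pairs whose compound ID or
# protein ID has no row in the descriptor files.
# Usage: missing_ids training_mols.csv gpcr.csv positive.csv
missing_ids() {
    awk -F ',' '
        FILENAME == ARGV[1] { comp[$1] = 1; next }   # compound descriptor IDs
        FILENAME == ARGV[2] { prot[$1] = 1; next }   # protein descriptor IDs
        {
            c = $1; p = $2
            gsub(/^[ \t]+|[ \t]+$/, "", c)           # tolerate stray blanks
            gsub(/^[ \t]+|[ \t]+$/, "", p)
            if (!(c in comp) || !(p in prot)) print c "," p
        }' "$1" "$2" "$3"
}
```

Empty output means every pair is covered. Whether cgbvs import itself rejects or silently ignores uncovered pairs is not stated in this manual.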
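As described in 3.4, CzeekS can compute Tanimoto similarity against a DB. To use this, import the fingerprints as well:

    $ cgbvs import training.db training_mols.fp fingerprint   // import training_mols.fp

See 3.4 for computing MACCS fingerprints.

4.3 Adding data

Data can be added to an existing DB. CGBVS needs all three kinds of data described in 4.1 for each interaction. Proteins that have 0 interactions can be found with cgbvs status -p -a (see 3.1). Use cgbvs add to add data. The following example adds 100 H3 compounds (H3_mols.sdf), using their descriptors in H3_mols.csv and their interactions in H3_positive.csv:

    $ cgbvs add training.db H3_mols.csv compound        // import H3_mols.csv
    $ cgbvs add training.db H3_positive.csv positive    // import H3_positive.csv

4.4 Learning

Use cgbvs learn to train SVM models from the data in the DB:

    $ cgbvs learn -C 99% -P 99% -n -f training.db 5
    output input_1
    output input_2
    output input_3
    output input_4
    output input_5
    $ SVMlearn -c 3 -g 0.003 input_1 model_1
    itr    nsv     vkkt     Objective
    1      965     42306    -4.237480732168150E+02
    2      1724    41252    -8.371539432307425E+02
    3      2457    42662    -1.576994435268771E+03

The final argument, 5, is the number of models to train (see 3.1). The -c and -g options set the SVM parameters: -c is the SVM parameter C, and -g is the parameter of the RBF (Radial Basis Function) kernel that CzeekS uses for its SVM. Here C=3 and γ=0.003 are used. For choosing the SVM parameters, see 4.5.

4.5 Running the learning step by hand

In 4.4, cgbvs learn trains 5 models. The 5 training runs can also be started by hand. With the -f option, cgbvs learn writes the SVM input files:

    $ cgbvs learn -f training.db 5
    output input_1
    output input_2
    output input_3
    output input_4
    output input_5

Train an SVM on each input file:

    $ SVMlearn -c 3 -g 0.003 input_1 model_1    // model 1
    $ SVMlearn -c 3 -g 0.003 input_2 model_2    // model 2
    $ SVMlearn -c 3 -g 0.003 input_3 model_3    // model 3
    $ SVMlearn -c 3 -g 0.003 input_4 model_4    // model 4
    $ SVMlearn -c 3 -g 0.003 input_5 model_5    // model 5

This produces model_1 to model_5. Register the 5 models in the DB:

    $ cgbvs add_model training.db model_1 1    // model 1, id=1
    $ cgbvs add_model training.db model_2 2    // model 2, id=2
    $ cgbvs add_model training.db model_3 3    // model 3, id=3
    $ cgbvs add_model training.db model_4 4    // model 4, id=4
    $ cgbvs add_model training.db model_5 5    // model 5, id=5

Check the result with cgbvs status.

The SVM parameters can be searched with a script such as the following, which uses input_1:

    #!/bin/sh
    for c in 1 3 10 30 100; do
        for g in 0.001 0.003 0.01 0.03 0.1; do
            printf '%s\t%s\t' "$c" "$g"
            SVMlearn -c $c -g $g input_1 model_1 | grep cross-validation | awk '{ print $6 }'
        done
    done

The script runs the SVM for 5 values of C (1, 3, 10, 30, 100) and 5 values of γ (0.001, 0.003, 0.01, 0.03, 0.1) and prints the cross-validation result for each combination. After choosing C and γ, train model_1 to model_5 with the chosen values and register them in the DB.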
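The grid search prints one line per parameter combination, so the best combination can be picked out mechanically. The sketch below assumes the three tab-separated columns printed by the script in 4.5 (C, γ, cross-validation value) and that a higher value is better. The function name best_params is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read lines of "C<TAB>gamma<TAB>accuracy" on stdin
# and print the line with the highest accuracy (first one wins on ties).
best_params() {
    awk -F '\t' '
        NF == 3 && (line == "" || $3 + 0 > best + 0) { best = $3; line = $0 }
        END { if (line != "") print line }'
}
```

If the grid-search script is saved under a name such as grid.sh (a name chosen here for illustration), sh grid.sh | best_params prints the winning line. Lines without exactly three fields, for example when SVMlearn printed no cross-validation value, are ignored.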
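5 cgbvs command reference

Usage:

    cgbvs <command> [<options>] <arguments>

Commands: add, add_model, comment, create, delete, del_model, import, learn, predict, status, shrink

add: adds data to a db

    cgbvs add <db> <file> <type>

    <db>      db to add to
    <file>    CSV file to add
    <type>    compound, protein, positive, negative or fingerprint

add_model: adds an SVM model

    cgbvs add_model <db> <model file> <ID>

    Registers the SVM model under the given ID in the db. The model file is one written by SVMlearn. Use -l for a model written by libsvm svm-train.

    -l : libsvm model

comment:

    cgbvs comment <db> <type> <comment>

    <db>      db
    <type>    compound, protein, positive, negative or fingerprint

create: creates a db

    cgbvs create [options] <db>

    Creates the db. The options import data into the db at creation time:

    -c <arg>
    -p <arg>
    -i <arg>
    -n <arg>
    -f <arg>

    <arg> is a CSV file.

delete:

    cgbvs delete <db> <type>

    <db>      db
    <type>    compound, protein, positive, negative or fingerprint

del_model: deletes an SVM model from a db

    cgbvs del_model <db> <model ID>

    <db>          db
    <model ID>    ID of the SVM model, as shown by cgbvs status. Give all as the <model ID> to delete all SVM models.

import: imports data into a db

    cgbvs import <db> <file> <type>

    <db>      db
    <file>    CSV file
    <type>    compound, protein, positive, negative or fingerprint

    To add data to what is already in the db, use add instead of import.

    -m <arg>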
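learn: trains SVM models

    cgbvs learn [options] <db> <number of models>

    Trains SVM models from the data in the db and stores them in the db.

    -c <arg>    SVM parameter C (10)
    -g <arg>    RBF kernel parameter (0.01)
    -v <arg>    cross-validation (5)
    -s <arg>    (1)
    -C <arg>
    -P <arg>    for -C and -P, <arg> can be given in %
    -n
    -r
    -f          write the SVM input files for SVMlearn
    -l          LIBSVM

predict: runs a prediction

    cgbvs predict [options] <db> <protein ID> <file>

    <db>            CGBVS db
    <protein ID>    protein ID. % can be used as a wildcard, and all selects the proteins in the db. The IDs are listed by status -p.
    <file>          compound descriptor file (fingerprint file with -s)

    -a
    -s          Tanimoto similarity
    -d          print the SVM score
    -v          print one compound-protein pair per line
    -n <arg>

status: shows the contents of a db

    cgbvs status [options] <db>

    <db>    db

    -c      list compound IDs
    -p      list protein IDs
    -a      list all IDs. -p alone lists the proteins that have at least 1 interaction. -a also applies to the IDs used by predict.

shrink: shrinks a db

    cgbvs shrink <db>

    <db>    db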