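2002 Advanced Seminar, Intelligent Systems Design Lab.

PBS
2002/11/15

1 Introduction
This report introduces PBS, a job management system for PC clusters. Fig. 1 illustrates the role of a job management system (Job Management System: JMS).

Fig. 1 Job management system

2 Job management systems
About 20 JMSs exist. Representative ones are:
1. DQS (Distributed Queuing System): Florida State University. The commercial JMS Codine from GENIAS GmbH is derived from DQS. (Debian)
2. LSF (Load Sharing Facility): Platform Computing Corporation. (20,000)
3. Condor: University of Wisconsin. (I/O)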
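4. NQS / Generic NQS / The Connect:Queue: NQS is one of the earliest JMSs, and other JMSs build on it. It was developed by Sterling Software in the 1980s. The Connect:Queue is its commercial successor, and Generic NQS comes from the University of Sheffield.

3 PBS
PBS (Portable Batch System) is a JMS for UNIX from Veridian Systems. It was originally developed for NASA. Fig. 2 gives an overview.

Fig. 2 PBS

4
PBS, PC, HIT HP, PBS, PBS ( ), 10. PBS, PBS, LSF, JMS, CPU (CPU, memory). PBS, HIT HP, CPU. PBS, JMS, LSF.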
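5 Structure of PBS
PBS consists of four components: the commands, the Job Server, the Job Executor and the Job Scheduler.

Commands
PBS provides a GUI as well as command-line tools. The commands fall into three groups:
1. User commands: qsub, qstat, qdel, qreturn
2. Operator commands: qenable, qdisable, qrun, qstart, qstop, qterm
3. Administrator commands: qmgr, pbsnodes

Job Server
The Job Server (pbs_server) is the central daemon of PBS. It communicates with the other components over IP, accepts jobs into queues, and hands them to the Job Executor. Queued jobs are handled FIFO (First In First Out). Its data is kept under PBS_HOME/server_priv.

Job Executor
The Job Executor (pbs_mom), usually called Mom, runs on every execution host. Mom starts the jobs it receives from the Job Server and returns their output.

Job Scheduler
The Job Scheduler (pbs_sched) decides which job runs next. PBS ships with a FIFO scheduler, fifo, which is configured in PBS_HOME/sched_priv/sched_config. pbs_sched obtains resource information from Mom through an API.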
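Fig. 3 Interaction of the PBS components (Server with event queues, policy Scheduler, Mom with kernel and running jobs on Hosts A, B and C, and a Client)
1. An event tells the Server to start a scheduling cycle.
2. The Server sends a scheduling command to the Scheduler.
3. The Scheduler requests resource information from Mom.
4. Mom returns the requested information.
5. The Scheduler requests job information from the Server.
6. The Server sends job status information to the Scheduler. The Scheduler then decides, according to its policy, which job to run.
7. The Scheduler sends a run request to the Server.
8. The Server sends the job to Mom to run.

6 Installation
6.1 Obtaining PBS
PBS can be downloaded from http://pbs.mrj.com/download.html.

6.2 Building PBS
Unpack the archive, then configure, build and install:
  tar xvfz OpenPBS_X_X_XX.tar.gz
  cd OpenPBS_X_X_XX
  ./configure [option]
  make
  make install
Table 1 lists the main ./configure options.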
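The build steps of 6.2 can be collected into one function so that a failed step stops the build. This is a sketch under assumptions. The function name build_openpbs and the DRY_RUN switch are hypothetical additions. The version string (for example 2_3_16, as in OpenPBS_2_3_16.tar.gz) is passed as the first argument. Only the --prefix option from Table 1 is set.

```shell
#!/bin/sh
# Hypothetical wrapper around the 6.2 build steps.
# $1: version part of the archive name, e.g. 2_3_16
# $2: installation prefix (./configure default: /usr/local)
# With DRY_RUN=1 the commands are only printed, not executed.
# The body runs in a subshell ( ... ), so "cd" does not affect the caller.
build_openpbs() (
    version=$1
    prefix=${2:-/usr/local}
    src="OpenPBS_${version}"

    run() {
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$*"          # dry run: show the command only
        else
            "$@"               # real run: execute it
        fi
    }

    run tar xvfz "${src}.tar.gz"        || exit 1
    run cd "$src"                       || exit 1
    run ./configure --prefix="$prefix"  || exit 1
    run make                            || exit 1
    run make install                    || exit 1   # usually needs root
)
```

For example, DRY_RUN=1 build_openpbs 2_3_16 prints the five commands. Running build_openpbs 2_3_16 in the directory that holds the tarball performs the build.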
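Table 1 Main ./configure options
  --prefix=         installation directory of PBS (default: /usr/local)
  --enable-docs     build the PBS documentation (default: disabled)
  --enable-server   build the PBS server (default: enabled) (1)

6.3 Setup
6.3.1 Server host (Queen)
List the cluster nodes, Queen and Slave, in {PREFIX}/server_priv/nodes. Give each node the :ts (time-shared) property:
  Queen:ts
  Slave:ts
Then start pbs_server, pbs_sched and pbs_mom. The first time, pbs_server must be started with -t create (2):
  # pbs_server -t create
  # pbs_sched
  # pbs_mom
Write the hosts allowed to contact Mom into {PREFIX}/mom_priv/config:
  $clienthost Queen
  $clienthost Slave

6.3.2 Compute node (Slave)
Start pbs_mom:
  # pbs_mom
Then write the same two lines into {PREFIX}/mom_priv/config:
  $clienthost Queen
  $clienthost Slave

(1) Use --disable-server on hosts that do not run the server.
(2) {PREFIX}/bin must be in PATH.

6.4 Configuring queues with qmgr
The server, its queues and its nodes are configured with qmgr. Its directives take the form:
  command server [names] [attr OP value[,attr OP value,...]]
  command queue [names] [attr OP value[,attr OP value,...]]
  command node [names] [attr OP value[,attr OP value,...]]
where command is one of active,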
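create, delete, set, unset, list or print, and OP is =, += or -=. The main queue attributes are:
  queue_type          Execution or Routing
  enabled             True or False (whether the queue accepts jobs)
  started             True (whether queued jobs may be run)
  route_destinations  destinations of a routing queue
                      (e.g. Qmgr: set queue route_destinations=para,overthere@another.com)
  resources_max       maximum resources per job
                      (e.g. Qmgr: set queue resources_max.cput=2:00:00 limits a job to 2 hours of CPU time)
                        cput: CPU time per job
                        pcput: CPU time per process
                        mem: memory per job
                        pmem: memory per process
                        vmem: virtual memory per job
                        pvmem: virtual memory per process
                        nodect: number of nodes
                        walltime: wall-clock time
  resources_min       minimum resources per job
  resources_default   default resources for jobs that do not request them (4)

The following session on Queen creates the queue para, makes it the default queue and enables scheduling:
  Queen:/root# qmgr
  Qmgr: create queue para
  Qmgr: set queue para queue_type = Execution
  Qmgr: set queue para enabled = True
  Qmgr: set queue para started = True
  Qmgr: set server scheduling = True
  Qmgr: set server default_queue = para
  Qmgr: set server acl_hosts = *.isl.doshisha.ac.jp
  Qmgr: set server acl_host_enable = True
  Qmgr: set queue para resources_min.nodect=1
  Qmgr: set queue para resources_max.nodect=2
  Qmgr: quit

(4) Qmgr: is the qmgr prompt.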
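The configuration files of 6.3 and the qmgr session of 6.4 can be scripted so that the setup is repeatable. This is a sketch under assumptions. The function names write_pbs_config and queue_directives are hypothetical. {PREFIX} is passed as the first argument, so the files can first be written into a scratch directory and checked. The same mom_priv/config still has to be placed on Slave. qmgr reads directives from standard input, so the generated lines can be piped into it.

```shell
#!/bin/sh
# Hypothetical helpers for the 6.3 / 6.4 setup of the Queen/Slave cluster.

# 6.3: server_priv/nodes lists every node with the ":ts" (time-shared)
# property; mom_priv/config lists the hosts allowed to contact pbs_mom.
# $1: {PREFIX}; remaining arguments: node names (server first).
write_pbs_config() {
    [ $# -ge 2 ] || return 1
    prefix=$1; shift
    mkdir -p "$prefix/server_priv" "$prefix/mom_priv" || return 1
    : > "$prefix/server_priv/nodes"          # truncate old contents
    : > "$prefix/mom_priv/config"
    for node in "$@"; do
        echo "${node}:ts" >> "$prefix/server_priv/nodes"
        echo "\$clienthost ${node}" >> "$prefix/mom_priv/config"
    done
}

# 6.4: the qmgr directives of the example session, one per line.
# $1: queue name (default para); $2: acl_hosts pattern.
queue_directives() {
    queue=${1:-para}
    hosts=${2:-*.isl.doshisha.ac.jp}
    cat <<EOF
create queue $queue
set queue $queue queue_type = Execution
set queue $queue enabled = True
set queue $queue started = True
set queue $queue resources_min.nodect = 1
set queue $queue resources_max.nodect = 2
set server scheduling = True
set server default_queue = $queue
set server acl_hosts = $hosts
set server acl_host_enable = True
EOF
}

# Usage on Queen, as root ({PREFIX} stands for the install prefix):
#   write_pbs_config {PREFIX} Queen Slave
#   queue_directives | qmgr
```

Generating the directives as text also makes it easy to review them, or to keep them in a file and replay them with qmgr < file.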
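7 Using PBS
The basic PBS commands are:
  qsub: submit a job
  qstat: show the status of jobs and queues
  qdel: delete a job

7.1 Submitting jobs with qsub
7.1.1 Interactive submission
Resources are requested with the -l option (Fig. 4).
  kim@queen:~/pbs$ qsub -l ncpus=1,nodes=1 -q para -N test [Enter]   (1)
  cd $PBS_O_WORKDIR [Enter]                                           (2)
  ./a.out [Enter]                                                     (3)
  [Ctrl]+[d]                                                          (4)
  3.Queen.work.isl.doshisha.ac.jp                                     (5)
  kim@queen:~/pbs$
Fig. 4 Interactive submission with qsub
1. Specify the number of CPUs and the other options.
2. Change to the directory from which the job was submitted ($PBS_O_WORKDIR).
3. Enter the command to run.
4. End the input with [Ctrl]+[d].
5. The Job ID is displayed.

7.1.2 Submitting a job script
The same job can be written as a script and passed to qsub (Fig. 5).
  kim@queen:~$ less pai.sh
  #!/bin/sh
  #PBS -l ncpus=1
  #PBS -l nodes=1
  #PBS -q para
  #PBS -N test
  cd $PBS_O_WORKDIR
  ./a.out
  kim@queen:~$ qsub pai.sh
  3.Queen.work.isl.doshisha.ac.jp
  kim@queen:~$
Fig. 5 Submitting a job script with qsub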
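When many similar jobs are submitted, scripts like pai.sh in Fig. 5 can be generated rather than written by hand. This is a minimal sketch. The helper make_job_script and its argument order (job name, queue, CPUs, nodes, command) are hypothetical choices for illustration.

```shell
#!/bin/sh
# Hypothetical generator for a job script in the style of pai.sh (Fig. 5).
# Arguments: job name, queue, number of CPUs, number of nodes, command...
make_job_script() {
    [ $# -ge 5 ] || return 1
    name=$1; queue=$2; ncpus=$3; nodes=$4; shift 4
    printf '#!/bin/sh\n'
    printf '#PBS -l ncpus=%s\n' "$ncpus"
    printf '#PBS -l nodes=%s\n' "$nodes"
    printf '#PBS -q %s\n' "$queue"
    printf '#PBS -N %s\n' "$name"
    # single quotes keep $PBS_O_WORKDIR literal; PBS sets it when the job runs
    printf 'cd $PBS_O_WORKDIR\n'
    printf '%s\n' "$*"                   # the command line to execute
}
```

For example, make_job_script test para 1 1 ./a.out > pai.sh recreates the script of Fig. 5, which is then submitted with qsub pai.sh.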
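7.1.3 qsub options
The main qsub options are:
  -N jobname     job name
  -q queuename   destination queue
  -o outfile     file for standard output
  -e errorfile   file for standard error
  -j oe          merge standard error into standard output
  -l             resource requests:
                   ncpus      number of CPUs per job
                   mem        memory per job
                   walltime   wall-clock time per job
                   nodes      number of nodes per job
  -m a|b|e|abe   send mail when the job is aborted (a), begins (b) or ends (e); abe selects all three

Fig. 6 runs an MPI program on two CPUs. Because of -j oe, standard error and standard output are both written to test.o17 (job name test, job number 17).
  kim@queen:~$ ls
  OpenPBS_2_3_16.tar.gz  a.out  ccc.sh.e5  cpi.c  pai.sh
  a  ccc.sh  ccc.sh.o5  cpi.o  upt.sh
  kim@queen:~$ less pai.sh
  #!/bin/sh
  #PBS -A nakao
  #PBS -N test
  #PBS -j oe
  #PBS -l ncpus=2
  mpirun -np 2 a.out
  exit 0
  kim@queen:~$ qsub pai.sh
  17.Queen.work.isl.doshisha.ac.jp
  kim@queen:~$ ls
  OpenPBS_2_3_16.tar.gz  a.out  ccc.sh.e5  cpi.c  pai.sh  upt.sh
  a  ccc.sh  ccc.sh.o5  cpi.o  test.o17
  kim@queen:~$ less test.o17
  Process 0 on Queen.work.isl.doshisha.ac.jp
  Process 1 on Slave.work.isl.doshisha.ac.jp
  pi is approximately 3.1416009869231241, Error is 0.0000083333333309
  wall clock time = 0.000915
Fig. 6 Running an MPI program with PBS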
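The file names in Fig. 6 follow the PBS defaults. Standard output goes to <job name>.o<sequence number> and standard error to <job name>.e<sequence number>. When -N is not given, the script name serves as the job name (ccc.sh.o5, ccc.sh.e5). With -j oe only the .o file is written (test.o17). The following sketch predicts these names from the job name and the Job ID that qsub prints. The function pbs_output_files and its third argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the output file name(s) PBS will create.
# $1: job name (value of -N, or the script file name by default)
# $2: Job ID as printed by qsub, e.g. 17.Queen.work.isl.doshisha.ac.jp
# $3: "oe" if the job was submitted with -j oe (optional)
pbs_output_files() {
    name=$1
    seq=${2%%.*}              # keep only the sequence number: "17"
    joined=${3:-no}
    echo "${name}.o${seq}"    # standard output file
    # a separate standard error file exists only if streams are not merged
    [ "$joined" = oe ] || echo "${name}.e${seq}"
}
```

For example, pbs_output_files test 17.Queen.work.isl.doshisha.ac.jp oe prints test.o17. This is the file read with less in Fig. 6.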
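7.2 Checking status with qstat
qstat shows the status of jobs and queues. Its main forms are:
  qstat       status of jobs
  qstat -Q    status of queues
  qstat -q    queues and their limits
  qstat -a    all jobs, in the alternative (detailed) format
  qstat -s    jobs together with their comments
  qstat -r    running jobs

7.2.1 qstat -Q
  kim@queen:~$ qstat -Q
  Queue            Max Tot Ena Str Que Run Hld Wat Trn Ext Type
  ---------------- --- --- --- --- --- --- --- --- --- --- ----------
  para               0   0 yes yes   0   0   0   0   0   0 Execution
Fig. 7 qstat -Q
  Max    maximum number of jobs that may run in the queue at once
  Tot    total number of jobs in the queue
  Ena    whether the queue is enabled (yes or no)
  Str    whether the queue is started (yes or no)
  Que    number of queued jobs
  Run    number of running jobs
  Hld    number of held jobs
  Wat    number of waiting jobs
  Trn    number of jobs in transit
  Ext    number of exiting jobs
  Type   queue type (Execution or Route)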
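Before jobs are submitted, a queue must be both enabled (Ena) and started (Str). The following sketch reads qstat -Q output in the layout of Fig. 7 from standard input and flags any queue that is not ready. The function name check_queues is hypothetical. The column positions are taken from Fig. 7.

```shell
#!/bin/sh
# Hypothetical checker for "qstat -Q" output (layout as in Fig. 7).
# Fields: $1 Queue, $4 Ena, $5 Str, $6 Que, $7 Run.
check_queues() {
    awk '
        NR <= 2 || NF == 0 { next }   # skip the header line and the dashed rule
        {
            ready = (($4 == "yes" && $5 == "yes") ? "ok" : "NOT READY")
            printf "%s: enabled=%s started=%s queued=%s running=%s -> %s\n", $1, $4, $5, $6, $7, ready
        }'
}
```

On the server it is used as qstat -Q | check_queues. Because it reads standard input, it can also be tried on saved output.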
7.2.2 qstat -q
  kim@queen:~$ qstat -q

  server: Queen

  Queue            Memory CPU Time Walltime Node  Run Que Lm  State
  ---------------- ------ -------- -------- ----  --- --- --  -----
  para               --      --       --     --    0   0 --   E R
Fig. 8 qstat -q
  Memory     maximum memory per job
  CPU Time   maximum CPU time per job
  Walltime   maximum wall-clock time per job
  Node       maximum number of nodes per job
  Run        number of running jobs
  Que        number of queued jobs
  Lm         limit on the number of running jobs
  State      first letter: E or D (Enabled or Disabled); second letter: R or S (Running or Stopped)

7.2.3 qstat -a
  [sgiadm@bshead PBS]$ qstat -a

  work.isl.doshisha.ac.jp:
                                                              Req'd  Req'd   Elap
  Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
  --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
  3.Queen.work.is kim      para     pbstest005   9917   5   5   20mb 744:0 R   --
  4.Queen.work.is kim      para     pbstest005   9832   5   5   20mb 744:0 R   --
Fig. 9 qstat -a
  Job ID         job ID
  Username       owner of the job
  Queue          queue the job is in
  Jobname        job name set by the user
  SessID         session ID
  NDS            number of nodes
  TSK            number of CPUs or tasks
  Req'd Memory   requested memory
  Req'd Time     requested CPU or wall-clock time
  S              state (R: running, Q: queued)
  Elap Time      elapsed CPU or wall-clock time

7.3 Deleting a job with qdel
  1. Look up the Job ID with qstat.
  2. Delete the job by passing its Job ID to qdel.
  3. Confirm with qstat that the job is gone.
  kim@queen:~/pbs$ qsub test.sh                  (1)
  3.Queen.work.isl.doshisha.ac.jp
  kim@queen:~/pbs$ qstat -a

  work.isl.doshisha.ac.jp:
                                                              Req'd  Req'd   Elap
  Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
  --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
  3.Queen.work.is kim      para     pbstest005   9917   5   5   20mb 744:0 R   --
  kim@queen:~/pbs$ qdel 3                        (2)
  kim@queen:~/pbs$ qstat -a                      (3)
  kim@queen:~/pbs$
Fig. 10 Deleting a job with qdel
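Steps 1 and 2 of 7.3 can be combined when several jobs of one user must be removed. qstat -a truncates the Job ID (3.Queen.work.is), but qdel accepts the bare sequence number, as qdel 3 in Fig. 10 shows. The following sketch extracts those numbers from qstat -a output in the layout of Fig. 9. The function name jobs_of_user and the idea of deleting in bulk are assumptions, not part of the original procedure.

```shell
#!/bin/sh
# Hypothetical filter: print the sequence numbers of a user's jobs,
# reading "qstat -a" output (layout as in Fig. 9) from standard input.
# $1: user name as shown in the Username column
jobs_of_user() {
    awk -v u="$1" '
        /^---/ { body = 1; next }     # job rows follow the dashed rule
        body && $2 == u {
            split($1, id, ".")        # "3.Queen.work.is" -> id[1] = "3"
            print id[1]
        }'
}

# Usage (deletes every job of kim; check the list with qstat first):
#   for id in $(qstat -a | jobs_of_user kim); do qdel "$id"; done
```

Afterwards, running qstat -a again, as in step 3, confirms that the jobs are gone.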