Advantages of GAMESS on Large Shared-Memory Systems

Technical white paper

GAMESS (*1) is an ab initio quantum chemistry package developed by the Gordon group. This paper evaluates GAMESS on an 8-socket, 80-core Intel Xeon E7 system (2013).

Table 1. GAMESS capabilities by SCF type (p = parallel)

                     RHF      ROHF     UHF      GVB      MCSCF
SCF energy           CDFpEP   CDFpEP   CDFpEP   CD-pEP   CDFpEP
SCF gradient         CDFpEP   CDFpEP   CDFpEP   CD-pEP   CDFpEP
SCF Hessian          CD-p--   CD-p--   ------   CD-p--   -D-p--
MP2 energy           CDFpEP   CDFpEP   CD-pEP   ------   CD-pEP
MP2 gradient         CDFpEP   -D-pEP   CD-pEP   ------   ------
CI energy            CDFp--   CD-p--   ------   CD-p--   CD-p--
CI gradient          CD----   ------   ------   ------   ------
CC energy            CDFpE-   CDF-E-   ------   ------   ------
EOMCC excitations    CD--E-   CD--E-   ------   ------   ------
DFT energy           CDFpEP   CD-pEP   CD-pEP
DFT gradient         CDFpEP   CD-pEP   CD-pEP
TD-DFT energy        CDFpEP   ------   CD-p--
TD-DFT gradient      CDFpEP   ------   ------
Semi-empirical:
MOPAC energy         y        Y        y        y
MOPAC gradient       y        Y        y        n

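GAMESS memory model

GAMESS uses two kinds of memory:
- Replicated data is held in full by every process.
- Distributed data is spread across all processes.

Both are set in the $SYSTEM group, by MWORDS and MEMDDI respectively.

[Figure 1. Replicated data and distributed data across processes p=0 to p=3 on Node#0 and Node#1.]

The memory needed per core is

$$\text{Memory per core} = \text{MWORDS} + \frac{\text{MEMDDI}}{N_{\text{proc}}}$$

where N_proc is the number of processes. MWORDS and MEMDDI are given in megawords. One word is 8 bytes, so one megaword is 8 × 10^6 bytes.

With DDI, GAMESS starts two kinds of processes: compute processes (CP) and data servers (DS). A 16-core node therefore runs 8 CPs and 8 DSs.

Example 1: anharmonic vibrational frequencies of H2O

The workflow has four steps:
1. Optimize the geometry (RUNTYP=OPTIMIZE).
2. Compute the Hessian (RUNTYP=HESSIAN).
3. Pass the force constant matrix (FCM) on through the $VEC and $HES groups.
4. Compute anharmonic frequencies with VSCF (RUNTYP=VSCF) using the cc-pVDZ basis at the MP2, CCSD and CCSD(T) levels.

The CCSD(T) input is:

 $CONTRL SCFTYP=RHF RUNTYP=VSCF CCTYP=CCSD(T) ISPHER=1 $END
 $SYSTEM TIMLIM=100000 MWORDS=100 $END
 $BASIS GBASIS=CCD $END
 $VSCF NGRID=16 PETYP=DIRECT $END
 $GUESS GUESS=HUCKEL $END
 $DATA
H2O CCSD(T)/cc-pVDZ Anharmonic Frequency
CNV 2

O 8.0  0.0000000000 0.0000000000  0.1403674299
H 1.0 -0.7493341373 0.0000000000 -0.4675002149
 $END

Table 2. H2O vibrational frequencies (cm^-1) and elapsed time

                                     Exp.   MP2    CCSD   CCSD(T)
Mode 1 (bend)                        1595   1610   1629   1622
Mode 2 (symmetric stretch)           3657   3670   3658   3628
Mode 3 (antisymmetric stretch)       3756   3757   3728   3699
Elapsed time (SATA disk)             -      143    163    131
Elapsed time (RAM disk: /dev/shm)    -      77     83     66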
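The per-core memory formula for MWORDS and MEMDDI can be turned into a quick sizing helper. This is a minimal sketch under two assumptions: MEMDDI is divided evenly among the processes, and each process holds MWORDS in full. The function names and the MEMDDI=800 on 8 processes example are hypothetical, not values from this paper.

```python
BYTES_PER_WORD = 8            # one GAMESS word is 8 bytes
WORDS_PER_MEGAWORD = 1_000_000


def mwords_to_bytes(megawords: float) -> float:
    """Convert megawords (the unit of MWORDS and MEMDDI) to bytes."""
    return megawords * WORDS_PER_MEGAWORD * BYTES_PER_WORD


def memory_per_process_bytes(mwords: float, memddi: float, n_proc: int) -> float:
    """Estimate MWORDS + MEMDDI / n_proc, converted to bytes.

    Assumption: MEMDDI (aggregate distributed memory) is split evenly over
    n_proc processes, while MWORDS is replicated in each of them.
    """
    if n_proc <= 0:
        raise ValueError("n_proc must be a positive number of processes")
    return mwords_to_bytes(mwords + memddi / n_proc)


if __name__ == "__main__":
    # MWORDS=100 matches the sample inputs; MEMDDI=800 on 8 processes is hypothetical.
    per_proc = memory_per_process_bytes(mwords=100, memddi=800, n_proc=8)
    print(f"{per_proc / 1e9:.1f} GB per process")
```

With these inputs each process needs 200 megawords, i.e. 1.6 GB.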

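MP2 and the coupled-cluster steps are I/O intensive. Placing the scratch files on a RAM disk (/dev/shm) instead of a SATA HDD cut the elapsed time roughly in half (Table 2).

Example 2: C540 (fullerene)

The C540 input is PC GAMESS test 2 (*2). It runs a MOPAC PM3 gradient with SCFTYP=RHF. Table 3 compares compiler options (*3):
- -xhost generates code for the highest instruction set the host CPU supports (SSE, AVX).
- -ipo enables interprocedural optimization.
- "turbo" means Intel Turbo Boost, introduced with the Nehalem generation.

 $CONTRL SCFTYP=RHF RUNTYP=gradient nprint=-5 ISPHER=1 $END
 $SYSTEM TIMLIM=3000 MWORDS=100 $END
 $BASIS GBASIS=PM3 $END
 $GUESS GUESS=HUCKEL $END
 $scf soscf=.t. $end
 $DATA

Table 3. PM3 gradient of C540 by build option

Case                   Elapsed Time (sec)   R-PM3 ENERGY      SCF iter.
1 (default)            175                  2349.3718701794   15
2 (optimize)           178                  2349.3718701796   16
3 (optimize + turbo)   141                  2349.3718701796   16

Default:  -i8 -O2
Optimize: -i8 -xhost -ipo -O3 -no-prec-div -unroll2 -static -scalar-rep-

The optimized build alone gave no gain (178 s vs. 175 s). Enabling Turbo Boost reduced the time to 141 s.

Example 3: C20H16N4 (porphyrin)

 $CONTRL RUNTYP=GRADIENT INTTYP=HONDO ICUT=10 CITYP=CIS NPRINT=-5 $END
 $SYSTEM TIMLIM=6000 MWORDS=100 $END
 $GUESS GUESS=HUCKEL $END
 $SCF DIRSCF=.T. $END
 $CIS NSTATE=1 CHFSLV=DIIS $END
 $DATA

Software environment (GAMESS built with MPI as the DDI communication layer):
GAMESS:       VERSION = 1 MAY 2012 (R2)
MPI:          Intel MPI 4.1.0
Math library: Intel MKL 13.0
Fortran:      Intel Fortran 13.0 [composer_xe_2013.2.146]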
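Tables 2 and 3 report elapsed times. The speedups discussed here are simply baseline time divided by improved time. A small sketch using the numbers from those tables (the dictionary and function names are illustrative):

```python
def speedup(baseline: float, improved: float) -> float:
    """Return how many times faster the improved run is than the baseline."""
    return baseline / improved


# Table 2: elapsed time with scratch on a SATA disk vs. a RAM disk (/dev/shm)
sata = {"MP2": 143, "CCSD": 163, "CCSD(T)": 131}
ramdisk = {"MP2": 77, "CCSD": 83, "CCSD(T)": 66}

# Table 3: C540 PM3 gradient, default build vs. optimized build with Turbo Boost
pm3_default, pm3_opt_turbo = 175, 141

if __name__ == "__main__":
    for method in sata:
        print(f"{method}: RAM disk is {speedup(sata[method], ramdisk[method]):.2f}x faster")
    print(f"PM3 optimize + turbo: {speedup(pm3_default, pm3_opt_turbo):.2f}x faster")
```

It prints about 1.86x, 1.96x and 1.98x for the RAM disk, and about 1.24x for optimize + turbo.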

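The multi-node runs use PC GAMESS test 6 (*2). With DDI, each 16-core node runs 8 CPs and 8 DSs. Table 4 lists runs from 1 to 1024 cores.

Table 4. Elapsed time and final RHF energy versus number of cores

Cores   Elapsed Time (sec)   FINAL RHF ENERGY    SCF iter. (1)
1       85381                -983.5780915418     24
16      10730                -983.5780915569     16
32      5618                 -983.5780915613     23
64      2927                 -983.5780915656     25
128     1555                 -983.5780915563     21
256     919                  -983.5780915667     28
512     662                  -983.5780915603     21
1024    444                  -983.5780915592     20

(1) RHF SCF convergence criterion: DENSITY MATRIX < 2.00E-05.
(2) With MPI, the GAMESS SCF runs with MAXIT=30. A run that does not converge within 30 iterations reports UNCONVERGED and ends with an MPI error.

[Figure 2. Elapsed time [s] (log scale) and scale (speedup) versus number of cores, 0 to 1000.]

InfiniBand FDR vs. Gigabit Ethernet. With Intel MPI's mpiexec, the fabric is selected with I_MPI_DEVICE=shm, rdssm or ssm:
- shm uses shared memory only.
- ssm combines sockets with shared memory; it was used for Gigabit Ethernet.
- rdssm combines RDMA with shared memory; it was used for InfiniBand.

Table 5. InfiniBand (rdssm) vs. Gigabit Ethernet (ssm)

Cores   SCF iterations   SCF iterations   Elapsed time ratio
        (InfiniBand)     (Gigabit)        InfiniBand(rdssm)/Gigabit(ssm)
64      25               21               0.99
128     21               25               0.95
256     28               16               0.97
512     21               19               0.88
1024    20               23               0.67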
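Speedup and parallel efficiency follow directly from the Table 4 timings. This sketch takes the smallest core count (the 1-core run) as the baseline. Efficiency is the achieved speedup divided by the ideal one; the names are illustrative.

```python
# Table 4: elapsed seconds keyed by core count (InfiniBand runs).
TABLE4 = {1: 85381, 16: 10730, 32: 5618, 64: 2927,
          128: 1555, 256: 919, 512: 662, 1024: 444}


def scaling(times: dict) -> dict:
    """Return {cores: (speedup, efficiency)} relative to the smallest core count."""
    base_cores = min(times)
    base_time = times[base_cores]
    result = {}
    for cores, elapsed in sorted(times.items()):
        speedup = base_time / elapsed
        # Efficiency: achieved speedup divided by the ideal speedup cores / base_cores.
        result[cores] = (speedup, speedup * base_cores / cores)
    return result


if __name__ == "__main__":
    for cores, (s, e) in scaling(TABLE4).items():
        print(f"{cores:5d} cores: speedup {s:6.1f}, efficiency {e:6.1%}")
```

The results are:
- At 16 cores the speedup is about 8.0, i.e. about 50% efficiency. This is at least consistent with half of the cores running data servers.
- At 1024 cores the speedup is about 192, i.e. about 19% efficiency.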

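Table 6. Test systems

                 SL230 Gen8                  DL980 G7
Processor        Intel E5-2670 2.7GHz        Intel E7-4870 2.4GHz
Sockets, cores   2 sockets, 16 cores         8 sockets, 80 cores
Memory           8GB DIMM x16 (128GB)        32GB DIMM x128 (4TB)
OS               Red Hat 6.3                 Red Hat 6.4
Interconnect     InfiniBand FDR              -

Running without data servers on an SMP

On an SMP system, the distributed data (MEMDDI) can be placed in shared memory. GAMESS can then run without DS processes, so every core runs a CP. With DS, half of the cores run data servers instead. For Intel MPI, set I_MPI_WAIT_MODE (csh):

setenv I_MPI_WAIT_MODE enable

MEMDDI is held in System V shared memory. Check the kernel limits with ipcs -l and raise them with sysctl -w (for example, sysctl -w kernel.shmmax=xxx). The settings used here were:

kernel.shmmax = 68719476736      (64 GB; largest single segment, in bytes)
kernel.shmall = 4294967296       (total shared memory, in pages)
kernel.shmmni = 4096             (maximum number of segments)
kernel.shm_rmid_forced = 0
vm.hugetlb_shm_group = 0

Tables 7 and 8 show the DL980 G7 (Table 6) with and without DS.

Table 7. DL980 G7 with DS

Cores   CP   Elapsed Time   RHF ENERGY         ITERATIONS
8       4    23373          -983.5780915653    15
16      8    11792          -983.5780915632    15
32      16   6202           -983.5780915587    22
64      32   3672           -983.5780915634    15
80      40   3032           -983.5780915579    18

Table 8. DL980 G7 without DS

Cores   CP   Elapsed Time   RHF ENERGY         ITERATIONS
8       8    12064          -983.5780915633    15
16      16   6384           -983.5780915652    15
32      32   3590           -983.5780915622    15
64      64   2427           -983.5780915633    15
80      80   2348           -983.5780915652    15

Running without DS is about 2x faster at 8 cores and about 1.3x faster at 80 cores.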
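Figure 3 plots throughput as jobs per day. This sketch makes two assumptions:
- Jobs per day means 86,400 s divided by the elapsed time of one job run back to back.
- Tables 7 and 8 report seconds, as Table 4 does.

It also computes the with-DS / without-DS time ratio; the names are illustrative.

```python
SECONDS_PER_DAY = 86_400

# DL980 G7 elapsed times keyed by core count (Table 7: with DS, Table 8: without DS).
WITH_DS = {8: 23373, 16: 11792, 32: 6202, 64: 3672, 80: 3032}
WITHOUT_DS = {8: 12064, 16: 6384, 32: 3590, 64: 2427, 80: 2348}


def jobs_per_day(elapsed_s: float) -> float:
    """Jobs completed per day if the same job runs back to back (assumption)."""
    return SECONDS_PER_DAY / elapsed_s


if __name__ == "__main__":
    for cores in WITH_DS:
        ratio = WITH_DS[cores] / WITHOUT_DS[cores]
        print(f"{cores:3d} cores: {jobs_per_day(WITH_DS[cores]):5.1f} jobs/day with DS, "
              f"{jobs_per_day(WITHOUT_DS[cores]):5.1f} without DS "
              f"(DS off is {ratio:.2f}x faster)")
```

The results are:
- At 80 cores the throughput is about 28.5 jobs per day with DS and 36.8 without.
- The with/without ratio falls from about 1.94 at 8 cores to about 1.29 at 80 cores.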

[Figure 3. Jobs per day versus number of cores (0 to 80) for GAMESS with DS on SL230, GAMESS with DS on DL980, and GAMESS without DS on DL980.]

MP2 and CCSD

Correlated methods beyond Hartree-Fock (HF), such as MP2 and CCSD, are far more costly than SCF (*4). The GAMESS CCSD and CCSD(T) code uses three kinds of storage:
- process-replicated memory (MWORDS);
- node-replicated memory, held in System V shared memory whose segment size is limited by shmmax;
- distributed storage (MEMDDI), accessed through DDI (TCP/IP).

The memory needed can be estimated beforehand with EXETYP=CHECK (Table 9). The number of CC iterations is controlled by MAXCC (MAXCC=1 or MAXCC=30). CCSD also requires transforming the integrals from the AO basis to the MO basis, and the MO integrals are kept in distributed storage.

[Figure 4. CCSD/CCSD(T) memory layout:
- Compute processes CP-0 to CP-79 each hold the t_i^a amplitudes in MWORDS (process-replicated).
- The t_ij^ab amplitudes are node-replicated in a SysV shared memory segment.
- The [VV|OO], [VO|VO], [VO|OO] and [OO|OO] integrals are fully distributed in MEMDDI, itself a SysV shared memory segment, served by data servers DS-80 to DS-159.]
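Both MEMDDI and the node-replicated CCSD arrays live in System V shared memory, so the kernel limits set for this system must be large enough. The sketch below rests on three points:
- kernel.shmmax is in bytes and kernel.shmall is in pages. These are standard Linux semantics.
- A 4 KiB page is assumed; check it with `getconf PAGE_SIZE`.
- The whole request is assumed to need a single segment. This is a simplification, and the helper names are hypothetical.

```python
PAGE_SIZE = 4096               # assumption: 4 KiB pages
BYTES_PER_MWORD = 8_000_000    # one megaword = 8 x 10^6 bytes


def shm_limits(shmmax: int, shmall: int, page_size: int = PAGE_SIZE) -> tuple:
    """Return (largest segment, total shared memory) in bytes.

    kernel.shmmax is given in bytes; kernel.shmall is counted in pages.
    """
    return shmmax, shmall * page_size


def fits_in_shm(mwords_needed: float, shmmax: int, shmall: int,
                page_size: int = PAGE_SIZE) -> bool:
    """Hypothetical check: can one segment of `mwords_needed` megawords be created?"""
    need = mwords_needed * BYTES_PER_MWORD
    max_segment, total = shm_limits(shmmax, shmall, page_size)
    return need <= max_segment and need <= total


if __name__ == "__main__":
    shmmax, shmall = 68_719_476_736, 4_294_967_296   # settings used on the DL980 G7
    seg, total = shm_limits(shmmax, shmall)
    print(f"largest segment {seg / 2**30:.0f} GiB, total {total / 2**40:.0f} TiB")
    print(fits_in_shm(8_000, shmmax, shmall))    # 64e9 bytes: fits
    print(fits_in_shm(10_000, shmmax, shmall))   # 80e9 bytes: exceeds shmmax
```

With those settings the largest segment is 64 GiB and the total is 16 TiB. A request above 8,589 megawords in one segment would therefore exceed shmmax.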

Table 9. Memory requirements estimated with EXETYP=CHECK

           Replicated memory (MB)   Distributed memory (MB)
HF         36                       0.3
MP2        89                       4171
CCSD       240                      332,152
CCSD(T)    1.2                      75,752

[Figure 5. Jobs per day (log scale) versus number of cores (0 to 200) for HF, MP2, DFT (B3LYP), CCSD and CCSD(T); CCSD with MAXCC=30.]

Summary
1. Data server: on a large SMP, GAMESS ran faster without DS (Tables 7 and 8).
2. CPU: compute processes (CP), 80 to 128.
3. MP2 and CCSD.
4. CCSD: GPGPU acceleration (libcchem).

*1) http://www.msg.ameslab.gov/gamess
*2) http://classic.chem.msu.su/gran/gamess/index.html
*3) http://www.spec.org/cpu2000/
*4) http://www.chem.waseda.ac.jp/nakai/research_dc.html

Test system: HP ProLiant DL980 G7, Intel Xeon E7-4870 2.4GHz, 8 sockets / 80 cores, 32GB DIMM x128, OS Red Hat 6.4
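The EXETYP=CHECK figures in Table 9 can be combined into a rough total for a given process count. This sketch makes three assumptions, and the names are illustrative:
- The "replicated" value is needed once per compute process.
- The "distributed" value is needed once in aggregate.
- 1 GB is taken as 1000 MB when comparing against the DL980 G7's 32GB x 128 DIMMs.

```python
# Table 9 rows used here: method -> (replicated MB, distributed MB)
TABLE9_MB = {
    "HF": (36, 0.3),
    "MP2": (89, 4171),
    "CCSD": (240, 332_152),
}


def total_memory_mb(replicated_mb: float, distributed_mb: float, n_cp: int) -> float:
    """Replicated memory for every compute process plus the distributed total."""
    return replicated_mb * n_cp + distributed_mb


if __name__ == "__main__":
    dl980_mb = 32 * 128 * 1000   # 32GB DIMMs x 128, rough decimal MB (assumption)
    for method, (rep, dist) in TABLE9_MB.items():
        need = total_memory_mb(rep, dist, n_cp=80)
        print(f"{method}: {need:,.0f} MB with 80 CPs; fits in DL980 G7: {need <= dl980_mb}")
```

For CCSD with 80 compute processes this gives 351,352 MB, well within the DL980 G7's memory.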

For more information: HP ProLiant, http://www.hp.com/jp/proliant
Phone: 03-5749-8328 (9:00 to 19:00; 10:00 to 17:00)
Intel, Intel Inside, Xeon and Xeon Inside are trademarks of Intel Corporation.
© Copyright 2013 Hewlett-Packard Development Company, L.P.