VASP 2703 2006 3
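This report examines how the numerical libraries (LAPACK, BLAS, FFT), the compiler options, and the number of CPUs affect the speed of VASP.

1. The speed of each LAPACK and BLAS combination was measured with a C program that solves simultaneous linear equations.
2. VASP was built with each combination and timed on the VASP benchmark bench.hg.
3. VASP was run in parallel on several CPUs.

On the Northwood CPU, VASP built with LAPACK lmkl_lapack64 and BLAS lmkl_p4 was 51% faster than with LAPACK liblapack and BLAS libblas. On the Prescott CPU, LAPACK lapack_double with BLAS libgoto was 40% faster than LAPACK lapack_double with BLAS lmkl_em64t. Changing the compiler option from -O0 to -O1 made VASP about 60% faster. Running VASP in parallel on the Northwood CPU with LAPACK lmkl_lapack64 and BLAS lmkl made it 36% faster on 2 CPUs, 95% faster on 3 CPUs, and 2.49 times as fast on 4 CPUs (249% of the single-CPU speed).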
Contents

1
2
  2.1 CPU
3 BLAS, LAPACK
  3.1
    3.1.1 Northwood
    3.1.2 Prescott
  3.2
  3.3
4 CPU VASP
  4.1
    4.1.1 Northwood
    4.1.2 Prescott
  4.2
    4.2.1 Northwood
    4.2.2 Prescott
5 VASP
  5.1 VASP
6
A
B
C VASP
D MPICH
1 VASP 100 PC 3,4 VASP VASP VASP FFT. (LAPACK,BLAS,FFT), CPU VASP.
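2

1. Measure the speed of each LAPACK and BLAS combination with a C program that solves simultaneous linear equations.
2. Build VASP with each combination and time it on the VASP benchmark bench.hg.
3. Run VASP in parallel on several CPUs.

2.1 CPU

1. CPU: Intel Pentium4 Northwood (3.2GHz, FSB=800MHz)
   OS: SuSE Linux 9.3
   Motherboard: ASUS P4C800
   : 2
   L2 cache: 512KB
   Memory: 1GB (Transcend PC3200 512MB ECC DIMM × 2)
   Hard disk: Seagate ST380817AS (SerialATA 80GB) × 1

2. CPU: Intel Pentium4 Prescott 650 (3.4GHz, FSB=800MHz)
   OS: SuSE Linux 9.3
   Motherboard: SuperMicro PDSGE
   : 2
   L2 cache: 2MB
   Memory: 2GB (PC2-4200 ECC 1GB DIMM × 2)
   Hard disk: Seagate ST3800817AS (SerialATA 80GB) × 1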
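3 BLAS, LAPACK

3.1 Simultaneous linear equations

The speed of LAPACK and BLAS was measured with a C program that solves simultaneous linear equations (Appendix A).

LAPACK (Linear Algebra PACKage) is a linear algebra library distributed on netlib and written in FORTRAN 77. CLAPACK is LAPACK translated into C. BLAS (Basic Linear Algebra Subprograms) is the set of basic vector and matrix routines that LAPACK calls, and versions of BLAS optimized for a particular CPU are available. LAPACK and BLAS can therefore be chosen separately.

According to the LAPACK guide [1], solving n × n simultaneous linear equations with LAPACK takes about $0.67 N^3$ floating-point operations.

3.1.1 Northwood

On the Intel Pentium4 Northwood 3.2GHz CPU, the C program that solves simultaneous linear equations was run with each LAPACK and BLAS combination. Table 3.1 shows the results.

With LAPACK liblapack and BLAS libblas, the 1000 × 1000 problem took 0.82 seconds, which is 817 Mflops. Mflops means millions of floating-point operations per second: flops stands for Floating point number Operations Per Second, and M (mega) is one million ($10^6$). The 2000 × 2000 problem took 8.24 seconds, which is 325 Mflops.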
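Table 3.1: Time to solve N × N simultaneous linear equations (Northwood).

LAPACK          BLAS       N=1000               N=2000
                           time (s)   Mflops    time (s)   Mflops
liblapack       libblas    0.82       817       8.24       325
liblapack       lmkl_p4    0.23       2913      1.53       1752
liblapack       libgoto    0.42       1595      3.72       720
lmkl_lapack64   libblas    0.22       3045      1.33       2015
lmkl_lapack64   lmkl_p4    0.20       3350      1.32       2030
lmkl_lapack64   libgoto    0.20       3350      1.31       2045

With LAPACK liblapack and BLAS libgoto, the 1000 × 1000 problem took 0.42 seconds (1595 Mflops) and the 2000 × 2000 problem took 3.72 seconds (720 Mflops). With the same LAPACK, libgoto is faster than libblas.

With LAPACK liblapack and BLAS lmkl_p4, the 1000 × 1000 problem took 0.23 seconds (2913 Mflops) and the 2000 × 2000 problem took 1.53 seconds (1752 Mflops). With LAPACK liblapack, lmkl_p4 is the fastest BLAS, followed by libgoto and then libblas.

With LAPACK lmkl_lapack64 and BLAS libblas, lmkl_p4, and libgoto, the 1000 × 1000 problem took 0.22, 0.20, and 0.20 seconds (3045, 3350, and 3350 Mflops), and the 2000 × 2000 problem took 1.33, 1.32, and 1.31 seconds (2015, 2030, and 2045 Mflops). LAPACK lmkl_lapack64 is faster than liblapack, and with lmkl_lapack64 the choice of BLAS makes little difference.

The LAPACK and BLAS libraries in Table 3.1 are:

liblapack: the LAPACK supplied with SuSE Linux.
lmkl_lapack64: the LAPACK of the Intel Math Kernel Library. lmkl stands for lib Math Kernel Library, and lapack64 is its LAPACK.
libblas: the BLAS supplied with SuSE Linux.
lmkl_p4: the Intel Math Kernel Library for p4, that is, Pentium4 (BLAS, FFT).
libgoto: a BLAS optimized for the Intel Pentium4 Northwood, available from http://www.tacc.utexas.edu/resources/software/

3.1.2 Prescott

On the Intel Pentium4 Prescott 3.4GHz CPU, the C program that solves simultaneous linear equations was run with each LAPACK and BLAS combination. Table 3.2 shows the results.

Table 3.2: Time to solve N × N simultaneous linear equations (Prescott).

LAPACK          BLAS         N=1000               N=2000
                             time (s)   Mflops    time (s)   Mflops
liblapack       libblas      0.86       779       6.60       406
lmkl_lapack64   lmkl_em64t   0.45       1489      4.27       628
lmkl_lapack64   libgoto      0.16       4188      1.07       2505

With LAPACK liblapack and BLAS libblas, the 1000 × 1000 problem took 0.86 seconds (779 Mflops) and the 2000 × 2000 problem took 6.60 seconds (406 Mflops).

With LAPACK lmkl_lapack64 and BLAS lmkl_em64t, the 1000 × 1000 problem took 0.45 seconds (1489 Mflops) and the 2000 × 2000 problem took 4.27 seconds (628 Mflops).

With LAPACK lmkl_lapack64 and BLAS libgoto, the 1000 × 1000 problem took 0.16 seconds (4188 Mflops) and the 2000 × 2000 problem took 1.07 seconds (2505 Mflops).

On the Prescott CPU, libgoto is faster than the BLAS of the Intel Math Kernel Library when both are combined with the Intel Math Kernel Library LAPACK.

The LAPACK and BLAS libraries in Table 3.2 are:

liblapack: the LAPACK of CLAPACK. CLAPACK is the Fortran LAPACK translated into C.
lmkl_lapack64: the LAPACK of the Intel Math Kernel Library EM64T.
libblas: the BLAS of CLAPACK.
lmkl_em64t: the BLAS of the Intel Math Kernel Library EM64T.
libgoto: a BLAS optimized for the Intel Pentium4 Prescott.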
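3.2 Eigenvalue problem

On the Intel Pentium4 Northwood 3.2GHz CPU, a C program that solves an eigenvalue problem (Appendix B) was run with each LAPACK and BLAS combination. According to the LAPACK guide [1], the n × n eigenvalue problem takes about $1.33 N^3$ floating-point operations with LAPACK.

Table 3.3: Time to solve the N × N eigenvalue problem.

LAPACK          BLAS       N=1000               N=2000
                           time (s)   Mflops    time (s)   Mflops
liblapack       libblas    10.41      128       79.39      67
liblapack       lmkl_p4    9.90       134       74.87      71
lmkl_lapack64   lmkl_p4    9.62       138       72.47      73

Table 3.3 shows the results. With LAPACK liblapack and BLAS libblas, the 1000 × 1000 problem took 10.41 seconds (128 Mflops) and the 2000 × 2000 problem took 79.39 seconds (67 Mflops). Changing BLAS to lmkl_p4 gave 9.90 seconds (134 Mflops) and 74.87 seconds (71 Mflops). With LAPACK lmkl_lapack64 and BLAS lmkl_p4, the times were 9.62 seconds (138 Mflops) and 72.47 seconds (73 Mflops).

3.3 Summary

The fastest BLAS depends on the CPU. libgoto is a BLAS tuned for each CPU. On Northwood the BLAS of the Intel Math Kernel Library was the fastest, and on Prescott libgoto was the fastest, so the BLAS should be chosen to match the CPU.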
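4 CPU and VASP

4.1 Libraries

4.1.1 Northwood

On the Intel Pentium4 Northwood 3.2GHz CPU, VASP was built (Appendix C) with each BLAS and LAPACK combination using the Intel compiler, and timed on the VASP benchmark bench.hg. Table 4.1 shows the VASP execution times.

Table 4.1: VASP execution time (Northwood).

BLAS      LAPACK          time (s)
lmkl_p4   lapack_double   203.5
lmkl_p4   lmkl_lapack64   201.2
lmkl_p4   liblapack       202.9
libgoto   lapack_double   294.4
libblas   liblapack       306.5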
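4.1.2 Prescott

On the Intel Pentium4 Prescott 3.4GHz CPU, VASP was built with each BLAS and LAPACK combination using the Intel compiler, and timed on the VASP benchmark bench.hg. Table 4.2 shows the VASP execution times.

Table 4.2: VASP execution time (Prescott).

BLAS         LAPACK          time (s)
lmkl_em64t   lapack_double   192.7
lmkl_em64t   lmkl_lapack64   192.1
libgoto      lapack_double   137.4

4.2 Compiler options

The Intel compiler has the optimization levels O0, O1, O2, and O3. O0 turns optimization off. O1 and O2 enable optimization, and on IA-32 Linux O1 and O2 are equivalent. O3 adds more aggressive optimization on top of that level.

The option -x{K W N B P} makes the compiler generate code specialized for one Intel processor: K is Pentium III (Katmai), W is Pentium 4 (Willamette), N is Northwood, B is Pentium M (Banias), and P is Prescott. -xP uses the Intel SSE3 instructions, while -xN and -xW use SSE2. Northwood does not support SSE3, so a program built with -xP stops on Northwood with the message "This program was not built to run on the processor in your system." The option -ax{K W N B P} generates the processor-specific code together with a generic code path.

4.2.1 Northwood

On the Intel Pentium4 Northwood 3.2GHz CPU, VASP was built with different compiler options and timed on the VASP benchmark bench.hg.

Table 4.3: VASP execution time by compiler option (Northwood).

OPTION                           BLAS      LAPACK          time (s)
-O0                              lmkl_p4   lapack_double   323.8
-O1                              lmkl_p4   lapack_double   204.6
-O3 -xW -tpp7                    lmkl_p4   lapack_double   203.5
-O3 -axN -xN -tpp7 -ip -mp1      lmkl_p4   lapack_double   203.0
-O0                              lmkl_p4   lmkl_lapack64   323.9
-O1                              lmkl_p4   lmkl_lapack64   206.8
-O3 -xW -tpp7                    lmkl_p4   lmkl_lapack64   201.2
-O3 -axN -xN -tpp7 -ip -mp1      lmkl_p4   lmkl_lapack64   200.2
-O0                              libgoto   lapack_double   435.5
-O1                              libgoto   lapack_double   316.7
-O3 -xW -tpp7                    libgoto   lapack_double   309.0
-O3 -axN -xN -tpp7 -ip -mp1      libgoto   lapack_double   308.7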
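4.2.2 Prescott

On the Intel Pentium4 Prescott 3.4GHz CPU, VASP was built with different compiler options and timed on the VASP benchmark bench.hg.

Table 4.4: VASP execution time by compiler option (Prescott).

OPTION                           BLAS         LAPACK          time (s)
-O0                              libgoto      lapack_double   254.0
-O1                              libgoto      lapack_double   140.8
-O3 -xW -tpp7                    libgoto      lapack_double   136.4
-O3 -axP -xP -tpp7 -ip -mp1      libgoto      lapack_double   133.8
-O0                              lmkl_em64t   lapack_double   309.0
-O1                              lmkl_em64t   lapack_double   195.3
-O3 -xW -tpp7                    lmkl_em64t   lapack_double   191.0
-O3 -axP -xP -tpp7 -ip -mp1      lmkl_em64t   lapack_double   188.7
-O0                              lmkl_em64t   lmkl_lapack64   308.1
-O1                              lmkl_em64t   lmkl_lapack64   194.7
-O3 -xW -tpp7                    lmkl_em64t   lmkl_lapack64   190.2
-O3 -axP -xP -tpp7 -ip -mp1      lmkl_em64t   lmkl_lapack64   187.9

Tables 4.3 and 4.4 show that -O0 is much slower than the other settings, while the times from -O1 to -O3 differ only slightly. The step from -O0 to -O1 is what changes the VASP execution time.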
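5 Parallel VASP

On the Intel Pentium4 Prescott 3.4GHz CPU, VASP was run in parallel on 2 CPUs. Running one calculation across several CPUs requires the CPUs to exchange data, which is done with MPI (Message Passing Interface). MPI can be used from C and Fortran. MPICH is an implementation of MPI. MPICH was installed (Appendix D), VASP was built with MPI, and VASP was started with the MPICH command mpirun. Table 5.1 shows the results.

With 1 CPU, BLAS lmkl_em64t, and LAPACK lmkl_lapack64, VASP took 192.7 seconds. With 2 CPUs it took 121.9 seconds, so VASP was 58% faster on 2 CPUs.

With 1 CPU, BLAS lmkl_em64t, and LAPACK lapack_double, VASP took 195.3 seconds. With 2 CPUs it took 123.3 seconds, so VASP was 58% faster on 2 CPUs.

With 1 CPU, BLAS libgoto, and LAPACK lapack_double, VASP took 137.4 seconds. With 2 CPUs it took 93.5 seconds, so VASP was 47% faster on 2 CPUs.

In every case VASP ran faster on 2 CPUs.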
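Table 5.1: VASP execution time by number of CPUs (Prescott).

NODE   BLAS         LAPACK          time (s)
1      lmkl_em64t   lmkl_lapack64   192.7
2      lmkl_em64t   lmkl_lapack64   121.9
1      lmkl_em64t   lapack_double   195.3
2      lmkl_em64t   lapack_double   123.3
1      libgoto      lapack_double   137.4
2      libgoto      lapack_double   93.5

5.1 Parallel VASP on Northwood

On the Intel Pentium4 Northwood 3.2GHz CPU, VASP was run on 1 CPU and on 2 or more CPUs to see whether the execution time falls in proportion to 1/CPU, the reciprocal of the number of CPUs. VASP was run in parallel with MPICH.

Table 5.2 lists the times for 2 or more CPUs with BLAS lmkl or libgoto, LAPACK -lmkl_lapack64 or lapack_double, and compiler options from -O0 to -O3 -mp1 -tpp7. Figure 5.1 plots the times from 2 CPUs upward. From 2 to 4 CPUs the time falls roughly as 1/CPU, but with 8 CPUs it is longer than with 4.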
Table 5.2: VASP execution time by number of CPUs (Northwood).

BLAS      LAPACK          OPTION            NODE   time (s)
lmkl      lmkl_lapack64   -O0 -mp1          1      341.7
                                            2      220.9
                                            3      155.3
                                            4      116.6
                                            8      119.2
lmkl      lmkl_lapack64   -O1 -mp1          1      213.0
                                            2      158.2
                                            3      110.1
                                            4      85.8
                                            8      102.8
lmkl      lmkl_lapack64   -O3 -mp1 -tpp7    1      210.3
                                            2      154.7
                                            3      107.6
                                            4      84.5
                                            8      102.1
libgoto   lapack_double   -O3 -mp1 -tpp7    1      244.0
                                            2      175.9
                                            3      123.0
                                            4      95.6
                                            8      108.3
Figure 5.1: VASP execution time and number of CPUs.
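6 Conclusions

1. The fastest BLAS and LAPACK depend on the CPU.

2. The libraries and the CPU change the speed of VASP. On the Northwood CPU, LAPACK lmkl_lapack64 with BLAS lmkl_p4 was 51% faster than LAPACK liblapack with BLAS libblas. On the Prescott CPU, LAPACK lapack_double with BLAS libgoto was 40% faster than LAPACK lapack_double with BLAS lmkl_em64t. Changing the compiler option from -O0 to -O1 made VASP about 60% faster, and higher levels changed the time only slightly.

3. Running VASP in parallel on the Northwood CPU with LAPACK lmkl_lapack64 and BLAS lmkl made it 36% faster on 2 CPUs, 95% faster on 3 CPUs, and 2.49 times as fast on 4 CPUs (249% of the single-CPU speed).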
A Program for solving simultaneous linear equations

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <time.h>
    //#include <veclib/veclib.h>
    #include "/usr/local/include/f2c.h"
    #include "/usr/local/include/clapack.h"

    void printmatrix(double *a, double *b, long n);

    int main(void){
      long n, nrhs=1, lda, ldb, info;
      // double A[LDA*LDA], B[LDA*NRHS];
      clock_t start, end;
      long i,j;
      double *a, *b;
      long *ipiv;

      /* read the matrix size N and echo it */
      scanf("%ld",&n);
      printf("%ld\n",n);
      lda=ldb=n;

      a=(double *)malloc(n*n*sizeof(double));
      b=(double *)malloc(n*sizeof(double));
      ipiv=(long *)malloc(n*sizeof(long));

      /* fill A and b with random numbers in [-1,1] */
      for(i=0;i<n;i++){
        for(j=0;j<n;j++){
          a[i*n+j]=2*(double)random() / RAND_MAX -1.0;
        }
      }
      for(i=0;i<n;i++){
        b[i]=2*(double)random() / RAND_MAX -1.0;
      }

      // printmatrix(a,b,n);
      /* time only the LAPACK call that solves A x = b */
      start=clock();
      dgesv_(&n,&nrhs, a, &lda, ipiv, b, &ldb, &info);
      // MatrixInverse(a,b,n);
      // printmatrix(a,b,n);
      end=clock();
      printf("%10.4f\n",(double)(end-start)/CLOCKS_PER_SEC);

      free(a);
      free(b);
      free(ipiv);
      return 0;
    }

    /* print the matrix a with the vector b as an extra column */
    void printmatrix(double *a, double *b, long n){
      long i,j;
      for(i=0;i<n;i++){
        for(j=0;j<n;j++){
          printf("%10.5f",a[i*n+j]);
        }
        printf("%10.5f",b[i]);
        printf("\n");
      }
      printf("\n");
      return;
    }
B Program for solving the eigenvalue problem

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <time.h>
    //#include <veclib/veclib.h>
    #include "/usr/local/include/f2c.h"
    #include "/usr/local/include/clapack.h"

    void printmatrix(double *a, double *b, long n);

    int main(void){
      long n, lda, lwork, info;
      // double A[LDA*LDA], B[LDA*NRHS];
      char jobs='V', uplo='U';  /* compute eigenvectors, use the upper triangle */
      clock_t start, end;
      long i,j;
      double *a, *w, *work;

      /* read the matrix size N and echo it */
      scanf("%ld",&n);
      printf("%ld\n",n);
      lda=n;
      lwork=n*3;

      a=(double *)malloc(n*n*sizeof(double));
      w=(double *)malloc(n*sizeof(double));
      /* the workspace must hold lwork doubles, not n */
      work=(double *)malloc(lwork*sizeof(double));

      /* fill A with random numbers in [-1,1] */
      for(i=0;i<n;i++){
        for(j=0;j<n;j++){
          a[i*n+j]=2*(double)random() / RAND_MAX -1.0;
        }
      }

      // printmatrix(a,w,n);
      /* time only the LAPACK call that computes the eigenvalues */
      start=clock();
      //dgesv_(&n,&nrhs, a, &lda, ipiv, b, &ldb, &info);
      dsyev_( &jobs, &uplo, &n, a, &lda, w, work, &lwork, &info);
      // MatrixInverse(a,b,n);
      // printmatrix(a,b,n);
      end=clock();
      printf("%10.4f\n",(double)(end-start)/CLOCKS_PER_SEC);

      free(a);
      free(w);
      free(work);
      return 0;
    }

    /* print the matrix a with the vector b as an extra column */
    void printmatrix(double *a, double *b, long n){
      long i,j;
      for(i=0;i<n;i++){
        for(j=0;j<n;j++){
          printf("%10.5f",a[i*n+j]);
        }
        printf("%10.5f",b[i]);
        printf("\n");
      }
      printf("\n");
      return;
    }
C Installing VASP

1. Install Intel Fortran Compiler Version 9.
   Move the license file (*.lic) to /opt/intel/licenses/.

     $ mv ~/Desktop/commercial_for_l_*.lic /opt/intel/licenses/
     $ cd /media/l_fc_p_9_0

   Run the installer of Intel Fortran Compiler Version 9.

     $ ./install.sh
     Please type a selection: 1
     Please type a selection: 2

   Give /opt/intel/licenses/commercial_for_l_*.lic as the license file, type accept, and finish with x.exit.

2. Install Intel C++ Compiler Version 9.
   As with Fortran, put the license file in /opt/intel/licenses/.

     $ cd /media/l_cc_p_9_0
     $ ./install.sh

   The rest is the same as for Fortran. The Linux Application Debugger has already been installed with Fortran.

3. Add the Fortran and C++ paths to .cshrc.

     $ emacs ~/.cshrc

     set path=($path /opt/intel/fc/9.0/bin /opt/intel/cc/9.0/bin)
     setenv LD_LIBRARY_PATH /opt/intel/mkl72/lib/32:/opt/intel/fc/9.0/lib

4. With YaST2, install gcc, glibc, fftw3 (fftw, fftw3, fftw3-debuginfo, fftw3-devel, fftw3-threads), lapack, and blas.

5. Unpack vasp.4.6.tar and vasp.4.lib.tar.

     $ tar -xvf vasp.4.6.tar
     $ tar -xvf vasp.4.lib.tar

   This creates vasp.4.6/ and vasp.4.lib/.

6. Build vasp.4.lib.

     $ cd vasp.4.lib/
     $ cp makefile.linux_ifc_P4 makefile

   vasp.4.lib/ contains makefiles for several systems. The one for Linux, the Intel Fortran compiler (ifc), and P4 is copied to makefile.

     $ emacs makefile

   Change FC=ifc to FC=ifort, because the command name of the Intel Fortran compiler changed from ifc to ifort.

     $ make

7. Build vasp.4.6.

     $ cd vasp.4.6
     $ cp makefile.linux_ifc_P4 makefile
     $ emacs makefile

   7.1. Change FC=ifc to FC=ifort.

   7.2. BLAS
        In the makefile, BLAS is set to /opt/libs/libgoto_p4_512-r0.6.so. Change it as follows.

        northwood
          BLAS=-L/opt/intel/mkl72/lib/32 -lmkl_p4 -lsvml
        prescott
          BLAS=-L/opt/intel/mkl72/lib/em64t -lmkl_em64t -lpthread -lsvml

        To use libgoto as BLAS:

          $ cd /opt/libs/
          $ mkdir libgoto

        Put libgoto in the directory libgoto and set BLAS in the makefile. For the Intel northwood (prescott) CPU:

          BLAS=-L/opt/libs/libgoto/libgoto_northwood(prescott)32p-r1.00.so -lpthread -lsvml
          LINK = -lirc -lguide -lsvml -lcprts -lunwind -lcxa -lifport -Wl,-rpath=/opt/libs/libgoto

   7.3. LAPACK
        The LAPACK supplied with VASP:

          LAPACK=../vasp.4.lib/lapack_double.o

        The LAPACK of the Intel Math Kernel Library:

        northwood
          LAPACK=-L/opt/intel/mkl72/lib/32 -lmkl_lapack64
        prescott
          LAPACK= -L/opt/intel/mkl72/lib/em64t -lmkl_lapack64 -lguide

   7.4. FFT3D
        northwood
          FFT3D = fftw3d.o fft3dlib.o /usr/lib/libfftw3.a
        prescott
          FFT3D= fft3dfurth.o fft3dlib.o /usr/lib64/libfftw3.a

   7.5. MPI
        For parallel runs, install MPICH first (Appendix D) and set:

          FC=ifort -I/usr/lib/mpich-1.2.5.2/
          FCL=/usr/lib/mpich-1.2.5.2/bin/mpif90

   7.6. Build.

          $ make

        If the message "cannot open shared object file" appears, check the library PATH. When the build finishes, the executable ./vasp is created in vasp.4.6.

8. Run VASP. Unpack Hg.tar and run VASP in the directory Hg.

     $ tar -xvf Hg.tar
     $ cd Hg
     $ directory where VASP resides/vasp
D Installing MPICH

1. Register the IP addresses and host names in /etc/hosts.

     192.168.3.4 bob1
     192.168.3.5 bob2

   Write the host names in /etc/hosts.equiv and ./hosts.

     bob1
     bob2

2. Download mpich-1.2.5.2.tar.gz from the MPICH site, http://www-unix.mcs.anl.gov/mpi/mpich/, and install it.

     $ tar -xvf mpich-1.2.5.2.tar.gz
     $ cd mpich-1.2.5.2
     $ ./configure --prefix=/usr/lib/mpich-1.2.5.2

   prefix= gives the installation directory. With the Intel Fortran compiler:

     $ ./configure --with-arch=linux --with-device=ch_p4 -fc=ifort -f90=ifort -prefix=/usr/local/bin mpich-1.2.5.2
     $ make
     $ make install

3. Test on one CPU.

     $ cd /usr/lib/mpich-1.2.5.2/examples
     $ make cpi
     $ ./mpirun -np 1 cpi
     Process 0 on takeda1
     pi is approximately 3.141600989231254, Error is 0.000000833333333323
     wall clock time =0.000000

4. Set up parallel execution.

   4.1. List the machines in machines.linux.

          $ cd /usr/local/mpich-1.2.5.2/share/

        machines.linux:

          takeda1
          takeda2

        A line that starts with # is a comment.

   4.2. Add /usr/local/mpich-1.2.5.2/bin to PATH.

        bash:
          $ export PATH=$PATH:/usr/local/mpich-1.2.5.2/bin
        csh (.cshrc):
          set path=($path /usr/local/mpich-1.2.5.2/bin)

   4.3. Test on two CPUs.

          $ ./mpirun -np 2 cpi
          Process 0 on takeda1
          Process 1 on takeda2
          pi is approximately 3.141600989231254, Error is 0.000000833333333323
          wall clock time =0.000000

   4.4. Run VASP. A serial run is started with

          $ directory where VASP resides/vasp

        For a parallel run on 2 CPUs, put ./mpirun -np 2 in front of it:

          $ ./mpirun -np 2 directory where VASP resides/vasp
[1] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen, LAPACK (1995).
[2] P. MPI (2001).
[3] VASP, http://cms.mpi.univie.ac.at/vasp/