PC 2000 2 18 2 HPC
Agenda PC
Linux OS UNIX OS Linux Linux OS
HPC 1 1CPU CPU
Beowulf
  PC cluster (PC) with PC CPUs (Pentium)
  Beowulf: NASA, Thomas Sterling and Donald Becker
PC!!
Linux Cluster (1) Level 1: OS Level 2:
Linux Cluster (2) Level 3: ( )
SofTek PC Cluster 1350-324 324 (24 nodes)
Spec
  CPU: Pentium III 500MHz x 24
  RAM: 12GB
  Network: Fast Ethernet (100BaseTX)
  Peak Performance: 13.6GFlops
  OS: LASER5 Linux 6.0 (kernel 2.2.5)
  Compiler: PGI CDK
  Programming Model: C/C++, F77, F90, HPF
  Parallel Programming: MPI, PVM, HPF
Beowulf type 1. 2.
Bottleneck: Processor <-> Memory
[Figure: memory hierarchy. Processor, on-chip cache (16KB-32KB), cache (128KB-4MB), memory.]
Pentium II 300MHz
Pentium II 300MHz
[Figure: numerical library stack. Labels: 1CPU, 1CPU/SMP.]
  Linpack, LAPACK, ScaLAPACK
  BLAS (Basic Linear Algebra Subprograms): ATLAS, ASCI-Red Opt. BLAS
  PBLAS (Parallel BLAS)
  BLACS (Basic Linear Algebra Communication Subprograms)
  PVM/MPI
BLAS Level
  Level 1 BLAS: Vector-Vector Operations   V = V + S * V
  Level 2 BLAS: Matrix-Vector Operations   V = M * V
  Level 3 BLAS: Matrix-Matrix Operations   M = M + M * M
BLAS Operation
  Level 1 BLAS: y = y + s * x
  Level 2 BLAS: y = y + A * x
  Level 3 BLAS: C = C + A * B
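The three operations above can be sketched in plain Python to show why the levels differ in how well they can reuse cache. This is an illustrative sketch, not the BLAS API: the function names `axpy`, `gemv`, and `gemm` follow the usual BLAS naming, and `flops_per_word` is a hypothetical helper that uses the commonly quoted operation and memory reference counts for square operands of order n (2n flops over 3n words, 2n^2 over about n^2 + 3n, 2n^3 over 4n^2).

```python
def axpy(s, x, y):
    """Level 1 BLAS style: y = y + s * x (vector-vector)."""
    return [yi + s * xi for xi, yi in zip(x, y)]


def gemv(A, x, y):
    """Level 2 BLAS style: y = y + A * x (matrix-vector)."""
    return [yi + sum(a * xj for a, xj in zip(row, x)) for row, yi in zip(A, y)]


def gemm(A, B, C):
    """Level 3 BLAS style: C = C + A * B (matrix-matrix)."""
    inner, cols = len(B), len(B[0])
    return [
        [C[i][j] + sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(A))
    ]


def flops_per_word(level, n):
    """Floating point operations per word of data touched, for order n.

    Hypothetical helper using the textbook counts; real libraries differ
    in detail, but the trend across levels is the point.
    """
    flops = {1: 2 * n, 2: 2 * n * n, 3: 2 * n ** 3}[level]
    words = {1: 3 * n, 2: n * n + 3 * n, 3: 4 * n * n}[level]
    return flops / words
```

For order 100, `flops_per_word` gives roughly 0.67 for Level 1, just under 2 for Level 2, and 50 for Level 3. Only Level 3 does enough arithmetic per word loaded to keep data in cache busy.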
[Figure: BLAS performance by level. Y axis: MFLOPS (50 to 250). X axis: Order of Vector/Matrix (100 to 500). Curves: Level 3 BLAS, Level 2 BLAS, Level 1 BLAS.]
[Figure: LU (Linpack & LAPACK). Series: LAPACK (BLAS 3) with ATLAS BLAS, ASCI-Red BLAS, and Normal BLAS (PGI compiler); Linpack (BLAS 1).]
Linpack and LAPACK
  Linpack: Level 1 BLAS
  LAPACK: Level 3 BLAS, block algorithm
  Coding style, cache
  Cache-tuned BLAS: ATLAS (Automatically Tuned Linear Algebra Software), ASCI-Red BLAS
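The block algorithm mentioned above can be illustrated with a blocked matrix multiply. This is a minimal sketch of the general technique, not the LAPACK, ATLAS, or ASCI-Red BLAS implementation. The function name `matmul_blocked` and the default block size are assumptions chosen for illustration. The idea is that each small tile of A, B, and C is reused many times while it is still in cache.

```python
def matmul_blocked(A, B, block=64):
    """Compute C = A * B by iterating over block x block tiles.

    A is n x m, B is m x p, both as lists of rows. The block size is a
    tuning parameter; a real library picks it to fit the cache.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):              # tile rows of C
        for kk in range(0, m, block):          # tile of the inner dimension
            for jj in range(0, p, block):      # tile columns of C
                for i in range(ii, min(ii + block, n)):
                    row_c, row_a = C[i], A[i]
                    for k in range(kk, min(kk + block, m)):
                        a, row_b = row_a[k], B[k]
                        for j in range(jj, min(jj + block, p)):
                            row_c[j] += a * row_b[j]
    return C
```

The result is the same as the plain triple loop; only the order in which elements are visited changes. In pure Python the interpreter overhead hides the cache effect, so the sketch shows the loop structure rather than the speedup.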
TCP/IP (1) Socket I/F TCP/IP window ack check sum CPU TCP/IP mbuf TCP CPU
TCP/IP (2) MTU(Ethernet:1500byte) large packet OS interrupt
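To make the MTU point concrete, the following sketch estimates how many Ethernet frames a large message is split into. The 1500-byte MTU is from the slide. The 20-byte IP and 20-byte TCP header sizes are assumptions (no header options), and `frames_needed` is a hypothetical helper name. If the NIC raises one interrupt per frame, which is also an assumption, this count approximates the number of interrupts the OS handles on the receiving side.

```python
import math

ETHERNET_MTU = 1500  # bytes, from the slide
IP_HEADER = 20       # bytes, assumption: IPv4 header without options
TCP_HEADER = 20      # bytes, assumption: TCP header without options


def frames_needed(message_bytes, mtu=ETHERNET_MTU):
    """Estimate the number of Ethernet frames for one TCP message."""
    payload = mtu - IP_HEADER - TCP_HEADER  # user data per frame
    return max(1, math.ceil(message_bytes / payload))
```

Under these assumptions a 1,000,000-byte message needs `frames_needed(1_000_000)` = 685 frames, so per-frame costs in the kernel add up quickly for large transfers.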
IP USER Space Kernel Space NIC TCP/UDP TCP/IP
(NIC) API M-VIA (Linux VI Architecture GAMMA (Linux Active Message M-VIA, GAMMA MPI
M-VIA (A High Performance Modular VIA for Linux) [3]
  VI Architecture API
  NIC: DEC Tulip (DC21*4*, 21143) chip; Intel i8255x (for x = 7, 8 or 9) chip; Packet Engines GNIC-I, GNIC-II Gigabit Ethernet
MVICH [4] VI Architecture MPICH 1.1.2 MPI (0.0.3 bsend, pack/unpack M-VIA
Environment: Pentium III 500MHz x 2, Memory 384MB, Intel EtherExpress Pro/100 NIC, 100Base Switching Hub, Linux 2.2.13
128byte MPICH MVICH 1.9
MPICH socket(tcp) MVICH(M-VIA) M-VIA 4Kbyte MPICH MVICH 34% 139%(32byte)
GAMMA (Genoa Active Message Machine) [5]
  Communication handlers, Active Messages [7] API
  NIC: DEC Tulip (DC21*4*, 21143) chipsets; Intel i8255x (for x = 7, 8 or 9) chipsets
MPI/GAMMA [6] GAMMA MPI MPICH 1.1.2 Fast Ethernet MPI
Environment: Pentium III 500MHz x 2, Memory 384MB, DEC DC21143 NIC, 100Base Switching Hub, Linux 2.2.13
128byte MPICH MPI/GAMMA 3.1
MPICH socket(tcp) MPI/GAMMA GAMMA 8Kbyte MPICH MPI/GAMMA 49% 404% (32byte)
IP
MPICH socket(tcp) MVICH(M-VIA) M-VIA MPI/GAMMA GAMMA IP
( ) Fast Ethernet API MPI GAMMA MPI/GAMMA Fast Ethernet Gigabit Fast Ethernet
ScaLAPACK
  ScaLAPACK (Scalable Linear Algebra PACKage): LAPACK for distributed-memory parallel machines
  PGI CDK ScaLAPACK, LU benchmark: xdlutime [8]
  BLAS for Pentium III: ATLAS BLAS, ASCI-Red BLAS
  MPI for ScaLAPACK: MPICH (p4), MPI/GAMMA
  BLAS, MPI
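Environment: Pentium III 500MHz x 4, Memory 256MB, DEC DC21143 NIC, 100Base Switching Hub, Linux 2.2.13
Data distribution: CPU grid 2 x 2, block size 64 x 64, matrix N x N
[Figure: N x N matrix divided into 64 x 64 blocks assigned to cpu1, cpu2 (top row) and cpu3, cpu4 (bottom row).]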
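The layout above, 64 x 64 blocks on a 2 x 2 CPU grid, matches the two-dimensional block-cyclic distribution that ScaLAPACK normally uses. The sketch below maps a global matrix element to its owning CPU under that assumption. The function names are hypothetical, indices are 0-based, and numbering the grid row by row as cpu1 to cpu4 is an assumption taken from the order of the labels in the figure.

```python
def owner(i, j, block=64, prow=2, pcol=2):
    """Return (process_row, process_col) owning global element (i, j)."""
    return ((i // block) % prow, (j // block) % pcol)


def cpu_name(i, j, block=64, prow=2, pcol=2):
    """Name the owning CPU, numbering the grid row by row from cpu1."""
    r, c = owner(i, j, block, prow, pcol)
    return f"cpu{r * pcol + c + 1}"


def local_rows(n, block, prow, r):
    """Count the rows of an n-row matrix stored on process row r."""
    return sum(
        min(block, n - start)              # the last block may be partial
        for start in range(0, n, block)
        if (start // block) % prow == r
    )
```

For a matrix of size 2000, process row 0 holds `local_rows(2000, 64, 2, 0)` = 1024 rows and process row 1 holds 976. Because blocks are dealt out cyclically, the work stays roughly balanced as the LU factorization shrinks the active part of the matrix.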
ScaLAPACK Matrix size 2000 ASCI-red, MPI/GAMMA
ScaLAPACK ASCI-MPICH ASCI-MPI/GAMMA 4% 79% (size 100)
1 VAMPIR VAMPIRtrace MPICH MPI/GAMMA
2 MPICH(p4)
3 MPI/GAMMA
4 850Mflop/s
[1] http://www.netlib.org/atlas/
[2] http://www.cs.utk.edu/~ghenry/distrib/archive.htm
[3] http://www.nersc.gov/research/ftg/via/
[4] http://www.nersc.gov/research/ftg/mvich/index.html
[5] http://www.disi.unige.it/project/gamma/
[6] http://www.disi.unige.it/project/gamma/mpigamma/
[7] http://now.cs.berkeley.edu/am/active_messages.html
[8] http://ie.korea.ac.kr/~supercom/software/