OS 2007/4/27

Uni-processor system revisited
- Memory, disk controller, frame buffer, network interface, and various other devices, all attached to a single bus

Uni-processor system today
- Intel i850 chipset block diagram (source: Intel web site)

Processing model of uni-processor systems
- A single instruction stream: input -> output

Making systems faster: parallelism
- Bit-level parallelism: 8 bit, 16 bit, 32 bit, 64 bit, ...
- Instruction-level parallelism: pipelining, superscalar, ...
- Process-level / thread-level parallelism

Processing model of parallel systems (1)
- SIMD (Single Instruction, Multiple Data)
- SIMD today: AltiVec (PowerPC G5), SSE2 (x86) — example below
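
The slide references an SSE2 example that the transcription omits. As an illustrative stand-in (not the slide's original code), a minimal sketch that adds two arrays with SSE2 intrinsics, processing two doubles per instruction:

    #include <emmintrin.h>  /* SSE2 intrinsics */
    #include <stdio.h>

    int main(void)
    {
        double a[4] = {1.0, 2.0, 3.0, 4.0};
        double b[4] = {10.0, 20.0, 30.0, 40.0};
        double c[4];

        /* One SIMD add handles two elements at a time. */
        for (int i = 0; i < 4; i += 2) {
            __m128d va = _mm_loadu_pd(&a[i]);
            __m128d vb = _mm_loadu_pd(&b[i]);
            _mm_storeu_pd(&c[i], _mm_add_pd(va, vb));
        }

        for (int i = 0; i < 4; i++)
            printf("%g\n", c[i]);
        return 0;
    }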

Processing model of parallel systems (2)
- MIMD (Multiple Instruction, Multiple Data)

MIMD systems: part I
- Centralized memory: Symmetric Multiprocessor (SMP), Uniform Memory Access
- Distributed memory: Non-Uniform Memory Access (NUMA); today: cc-NUMA

Dual-processor systems
- PowerMac G5, Opteron

Dual-core systems
- UltraSPARC IV

Textbook SMP systems
- Several processors, each with its own cache, sharing one memory and one I/O system

SMP today
- Sun V880, rx

Programming models
- Multiprogramming
- Shared memory
- Message passing
- Data parallel

Programming models and OS
- Multiprocessor operating systems: multiprogramming, shared memory
- Uniprocessor operating systems + library: message passing, data parallel

(1) Programming model in shared memory systems: multiprogramming (process-level parallelism)
- Applicable to both NUMA and SMP
- C: fork, exec, waitpid, etc.
- Example (parameter-search problems; task-parallel workloads):

    #!/bin/bash
    for ((i = 0; i < 20; ++i)); do
        ./param-search $i
    done

(2) Programming model in shared memory systems: threads (thread-level parallelism)
- Prerequisite: shared memory (SMP, cc-NUMA)
- POSIX threads example (see the sketch below)
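
The slide cites a POSIX threads example that is missing from the transcription. A minimal sketch of such an example, assuming a simple create/join pattern (the worker body is a placeholder, not the slide's code):

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    /* Each thread prints its ID; a real workload would instead
       operate on a shared array visible to every thread. */
    static void *worker(void *arg)
    {
        long id = (long)arg;
        printf("hello from thread %ld\n", id);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];

        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }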

Thread-level parallelism by example: OpenMP
- Fork-join model: a master thread FORKs a team of threads at each parallel region and JOINs them at its end
- Example:

    #include <omp.h>
    #define N 1000

    int main(void)
    {
        int i;
        float a[N], b[N], c[N];
        /* ... initialize a and b ... */
        #pragma omp parallel shared(a, b, c) private(i)
        {
            #pragma omp for schedule(dynamic, 100)
            for (i = 0; i < N; i++)
                c[i] = a[i] + b[i];
        }
        return 0;
    }

(3) Programming model in shared memory systems: IPC
- IPC: inter-process communication via semaphores, shared memory, message passing, ...
- C: sem_init, sem_wait, sem_destroy; shmget, shmat, shmdt; send, recv (see the sketch below)
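
As an illustration of the semaphore calls listed above (an assumed example, not slide code): two processes on Linux synchronizing through an unnamed POSIX semaphore placed in shared memory:

    #include <semaphore.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Place the semaphore in memory shared by parent and child. */
        sem_t *sem = mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        sem_init(sem, /* pshared = */ 1, /* value = */ 0);

        if (fork() == 0) {            /* child */
            printf("child: doing work\n");
            sem_post(sem);            /* signal the parent */
            _exit(0);
        }
        sem_wait(sem);                /* parent blocks until the child posts */
        printf("parent: child finished\n");
        wait(NULL);
        sem_destroy(sem);
        return 0;
    }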

Abstracting multiprocessor systems
- Multiprogramming and shared memory keep the uniprocessor programming model: the same program scales up to multiprocessor systems
- Layer diagram: user programs on top of threads and semaphores, processes and shared memory; the abstractions are multiprocessor agnostic
- Pitfalls?

Operating system role in shared-memory systems
- IPC primitives: semaphores, shared memory, message passing
- Instruction-level parallelism: optimizing compilers
- Thread-level parallelism: OpenMP

MIMD systems: part II
- Shared memory systems
  - Centralized memory: Symmetric Multiprocessor (SMP), Uniform Memory Access
  - Distributed memory: Non-Uniform Memory Access (NUMA); today: cc-NUMA
  - Thread-level parallelism with shared state; process-level parallelism with shared state
- Multicomputers, clusters (shared-nothing*)
  - Research MPPs (dead); computing clusters; web server clusters, etc.
  - Process-level parallelism without shared state*

Computing cluster
- Computing nodes, each running its own OS, connected by a cluster interconnect

Hidden cost of computing clusters: SMP / cc-NUMA vs. clusters of PCs
- Diminishing returns (IPP p.1)

    Pros (SMP / cc-NUMA)        Cons (cluster of PCs)
    Reproducible results        Reproducibility issues
    Reusable software           Explicit parallelism
    Bigger memory               Smaller memory
    Better compiler             Average compiler
    Larger MTBF                 Smaller MTBF
    Reliable support            Unreliable support

Programming models and OS revisited
- Multiprocessor operating systems: multiprogramming, shared memory
- Uniprocessor operating systems + library: message passing, data parallel

Message passing programming model
- MPI (Message Passing Interface)
  - Send / receive
  - Communication within groups
  - Scatter / gather
  - Reduce
  - Barrier synchronization

Send/Receive in MPI
- MPI_Send and MPI_Recv move data between processes
- e.g., distributing a very large sparse matrix across nodes (see the sketch below)
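
A minimal point-to-point sketch of MPI_Send/MPI_Recv (illustrative; a single integer stands in for the slide's large-sparse-matrix payload):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* Blocking send of one int to rank 1, tag 0. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }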

Blocking and non-blocking send/receive
- Blocking: MPI_Send, MPI_Recv
- Non-blocking: MPI_Isend, MPI_Irecv (see the sketch below)

Communication within groups
- MPI_Group_* operations partition processes into groups
- e.g., running the same algorithm on two data sets in parallel
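
A sketch of the non-blocking variants (illustrative; the "other work" overlapped with the transfer is left as a comment):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, value = 0;
        MPI_Request req;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        } else if (rank == 1) {
            MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        }
        /* ... do other work here while the transfer is in flight ... */
        if (rank <= 1) {
            MPI_Wait(&req, MPI_STATUS_IGNORE);  /* completion point */
            if (rank == 1)
                printf("rank 1 received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }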

Collective communications (1): Broadcast
- MPI_Bcast sends the same data from one root to all processes
- e.g., an initial parameter

Collective communications (2): Scatter / gather
- MPI_Scatter distributes distinct pieces of the data across processes; MPI_Gather collects them back
- e.g., distribute a vector across nodes; matrix multiplication example (uniprocessor vs. MPI) — see the sketch below
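
A minimal scatter/gather sketch (illustrative; doubling each element stands in for real per-rank work such as the matrix example above):

    #include <mpi.h>
    #include <stdio.h>

    #define PER_RANK 2

    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int send[64], recv[PER_RANK], gathered[64];  /* assumes size <= 32 */
        if (rank == 0)                   /* root prepares the full vector */
            for (int i = 0; i < size * PER_RANK; i++)
                send[i] = i;

        /* Each rank receives its own PER_RANK-element slice ... */
        MPI_Scatter(send, PER_RANK, MPI_INT, recv, PER_RANK, MPI_INT,
                    0, MPI_COMM_WORLD);
        for (int i = 0; i < PER_RANK; i++)
            recv[i] *= 2;                /* ... works on it locally ... */

        /* ... and the root collects the results back in rank order. */
        MPI_Gather(recv, PER_RANK, MPI_INT, gathered, PER_RANK, MPI_INT,
                   0, MPI_COMM_WORLD);
        if (rank == 0)
            for (int i = 0; i < size * PER_RANK; i++)
                printf("%d ", gathered[i]);
        MPI_Finalize();
        return 0;
    }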

Global reduction operations: Reduce
- MPI_Reduce combines one value from each process (applied similarly to all columns of an array) into a single result at the root
- Operations: MPI_SUM, MPI_PROD, MPI_MAX, MPI_MIN, ...
- e.g., a vector-matrix product — see the sketch below

Barrier synchronization
- MPI_Barrier blocks until all group members have called MPI_Barrier
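
A minimal sketch combining MPI_Reduce with MPI_Barrier (illustrative; summing ranks stands in for the vector-matrix product):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, sum;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Every process contributes its rank; rank 0 receives the sum. */
        MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum of ranks = %d\n", sum);

        /* No process proceeds past this point until all have reached it. */
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }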

Taxonomy of parallelism in multi-processor systems
- Modern systems: parallelism everywhere
  - System level: MIMD
  - Instruction level: SIMD
- Parallelism-friendly programming models: libraries, compilers
