21 20 20413525 22 2 4
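1 .................................... 1
2 .................................... 4
2.1 .................................. 4
2.1.1 LinuxOS ........................ 7
2.1.2 ................................ 10
2.2 .................................. 15
3 .................................... 17
3.1 .................................. 17
3.2 .................................. 20
3.2.1 ................................ 22
3.2.2 ................................ 24
3.2.3 ................................ 25
3.2.4 ................................ 25
4 .................................... 27
4.1 .................................. 27
4.2 .................................. 28
4.2.1 ................................ 30
4.2.2 ................................ 32
4.2.3 ................................ 32
5 .................................... 34
6 .................................... 36
References ........................... 37
A .................................... 39
A.1 cgroup ........................... 39
A.2 .................................. 40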
1 1 ( PC) 64 Intel AMD 64 48 256TB PC GB GB ( OS) ( HDD) HDD
2 1 MPI[1](Message Passing Interface) [2][3] 8 16 PC ( ) [4][5][6][7][8] OS 1 Common Universal
3 NAND NAND PC SSD(Solid State Drive) [9] ( ) ( ) ( ) 1 PC SSD HDD SSD HDD LinuxOS 2 3 4 OS 5
4 2 OS Linux OS PC 2.1 2 1 (MMU) MMU CPU MMU TLB(Translation Look-aside Buffer)
5 2 1 MMU MMU
6 ( ) 2 2
7 PC 2 2 ( ) ( ) 2.1.1 LinuxOS Linux (4KB: ) ( kswapd ) LRU Linux ( 2 3)
8 2 3 OS ( 2 4) HDD 2 5 page-cluster 2 page cluster 1
9 2 4 OS Linux 1 512 2048 HDD CD-ROM (FDD)
2 5 OS (page-cluster=2)
2.1.2 LinuxOS
N x N O(n^3)
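N x N matrix multiplication:

    #define N 100

    int i, j, k;
    double tmp;
    double A[N][N], B[N][N], C[N][N];

    /* N x N matrix product: C = A * B */
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            tmp = 0.0;
            for (k = 0; k < N; k++) {
                tmp += A[i][k] * B[k][j];
            }
            C[i][j] = tmp;
        }
    }

The k loop iterates O(n) times and its body, tmp += A[i][k] * B[k][j], is O(1), so the k loop costs O(n x 1) = O(n).
The j loop iterates O(n) times and one iteration costs O(1) + O(n) = O(n), so the j loop costs O(n x n) = O(n^2).
The i loop iterates O(n) times and one iteration costs O(n^2), so the i loop costs O(n x n^2) = O(n^3).
For problem size n the total cost is therefore O(n^3).

Within the k loop, A is accessed in the x direction and B in the y direction (see Figures 2.6 and 2.7).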
12 2 6 x y 2 8 1 4
Figure 2.7 (xy, N)

In terms of Swap, Datasize, Pagesize, and Size of Structure:

\[ \mathrm{Swap} = \frac{\mathrm{Datasize}}{\mathrm{Pagesize} / \mathrm{sizeof}(\mathrm{Structure})} \]
14 double 1 8 1 512 512 y OS y 2 8
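The arithmetic above (an 8-byte double, 512 elements per page) can be checked with a short sketch. It assumes a 4 KB page, a row-major matrix that starts on a page boundary, and that Datasize in the Swap formula is an element count; the function names are hypothetical and do not come from the thesis.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements of elem_size bytes that fit in one page. */
size_t elems_per_page(size_t page_size, size_t elem_size)
{
    assert(elem_size > 0 && page_size >= elem_size);
    return page_size / elem_size;
}

/* Swap = Datasize / (Pagesize / sizeof(Structure)), rounded up to whole
 * pages. Datasize is taken to be an element count (an assumption). */
size_t pages_for_data(size_t n_elems, size_t page_size, size_t elem_size)
{
    size_t per_page = elems_per_page(page_size, elem_size);
    return (n_elems + per_page - 1) / per_page;
}

/* Distinct pages touched when walking the first column (y direction) of a
 * row-major n x n matrix that starts on a page boundary (an assumption). */
size_t pages_per_column(size_t n, size_t page_size, size_t elem_size)
{
    size_t stride = n * elem_size;  /* bytes between vertical neighbours */

    if (n == 0)
        return 0;
    if (stride >= page_size)
        return n;                   /* every element lies in its own page */
    return ((n - 1) * stride) / page_size + 1;
}
```

Under these assumptions, one row of a 512 x 512 matrix of double fits in a single page (pages_for_data(512, 4096, sizeof(double)) gives 1), whereas one column touches 512 different pages (pages_per_column(512, 4096, sizeof(double)) gives 512).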
15 2.2 ( ) 2 9 OS ( ) OS Teramem[4] OS Myrinet[10] InfiniBand[11] Linux Nswap[5] Nswap
Figure 2.9
17 3 3.1 3 1 1 2 3 2
Figure 3.1
Figure 3.2
20 3.2 ( 3 3) 3 3
(Figure 3.4) (Figure 3.5) Table 3.1 v

Table 3.1
  No.   vflag   page frame
  0     1       -
  1     0       f1
  2     0       f2
  ...   ...     ...

Table 3.2: No., vflag, pageframe
22 3 4 3.2.1 3 3
23 3 5 3 6 vflag vflag = 0 vflag=1
24 3.2.2
25 3.2.3 3.2.4 FIFO(FirstInFirstOut)
26 3 6
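Section 3.2.4 names FIFO (First In First Out) as the replacement policy. The following is a minimal sketch of FIFO page replacement over a fixed number of page frames, not the implementation used in this work. The structure names, the table sizes, and the `resident` field are hypothetical; in particular, `resident` is not claimed to match the meaning of vflag in Table 3.1.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES  64   /* hypothetical page-table size */
#define MAX_FRAMES 16   /* hypothetical upper bound on page frames */

struct pte {
    int resident;       /* 1 if the page currently occupies a frame */
    int frame;          /* frame index, or -1 when not resident */
};

struct fifo_pager {
    struct pte table[MAX_PAGES];
    int queue[MAX_FRAMES];  /* resident page numbers in a circular buffer */
    int head;               /* position of the oldest resident page */
    int count;              /* number of resident pages */
    int nframes;            /* frames available */
    long faults;            /* page faults observed so far */
};

void pager_init(struct fifo_pager *p, int nframes)
{
    assert(nframes > 0 && nframes <= MAX_FRAMES);
    for (int i = 0; i < MAX_PAGES; i++) {
        p->table[i].resident = 0;
        p->table[i].frame = -1;
    }
    p->head = 0;
    p->count = 0;
    p->nframes = nframes;
    p->faults = 0;
}

/* Access one page; returns 1 on a page fault and 0 on a hit. */
int pager_access(struct fifo_pager *p, int page)
{
    int frame;

    assert(page >= 0 && page < MAX_PAGES);
    if (p->table[page].resident)
        return 0;

    p->faults++;
    if (p->count < p->nframes) {
        /* A free frame exists: append the page at the tail of the queue. */
        frame = p->count;
        p->queue[(p->head + p->count) % p->nframes] = page;
        p->count++;
    } else {
        /* No free frame: evict the oldest page and reuse its frame. */
        int victim = p->queue[p->head];
        frame = p->table[victim].frame;
        p->table[victim].resident = 0;
        p->table[victim].frame = -1;
        p->queue[p->head] = page;
        p->head = (p->head + 1) % p->nframes;
    }
    p->table[page].resident = 1;
    p->table[page].frame = frame;
    return 1;
}

/* Count the page faults produced by a reference string. */
long count_faults(const int *refs, size_t n, int nframes)
{
    struct fifo_pager p;

    pager_init(&p, nframes);
    for (size_t i = 0; i < n; i++)
        pager_access(&p, refs[i]);
    return p.faults;
}
```

For the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5, this sketch reports 9 faults with 3 frames and 10 faults with 4 frames, the well-known case in which FIFO does not improve when frames are added.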
27 4 LinuxOS 4.1 4 1 LinuxOS HDD 4GB HDD Linux bonnie++ / LinuxOS Linux kernel cgroup A.1 cgroup 8MB 4KB (flush ) A.2
Table 4.1
  CPU           AMD Athlon 64 X2 Dual Core Processor 4600+ (2 1GHz, L1 128KB, L2 512KB)
  Memory        DDR2-SDRAM 2GB
  OS            Ubuntu 8.10 (Linux Kernel 2.6.27)
                8GB
  HDD           7200rpm, 320GB, SATA 3.0Gb/s, 16MB, 72 MB/s, 52 MB/s
  File system   ext3

4.2 LinuxOS
double N x N 10 10 8MB A B N 4 1
29 4 1 LinuxOS cgroup 8MB double C gettimeofday()
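The measurement program is written in C, uses double matrices, and is timed with gettimeofday(). A minimal sketch of such a timing harness follows. The function names, the flat row-major layout, and the fill values are assumptions made for illustration; the cgroup memory limit is applied outside the program and is not shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/time.h>

/* Wall-clock time in seconds, taken from gettimeofday(). */
double now_sec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* c = a * b for n x n matrices stored row-major in flat arrays. */
void matmul(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double tmp = 0.0;
            for (size_t k = 0; k < n; k++)
                tmp += a[i * n + k] * b[k * n + j];
            c[i * n + j] = tmp;
        }
    }
}

/* Time one n x n multiplication; returns elapsed seconds,
 * or -1.0 if memory allocation fails. */
double time_matmul(size_t n)
{
    size_t count = n * n;
    double *a, *b, *c;
    double start, elapsed;

    assert(n > 0);
    a = malloc(count * sizeof *a);
    b = malloc(count * sizeof *b);
    c = malloc(count * sizeof *c);
    if (a == NULL || b == NULL || c == NULL) {
        free(a);
        free(b);
        free(c);
        return -1.0;
    }
    for (size_t i = 0; i < count; i++) {
        a[i] = 1.0;     /* arbitrary fill values */
        b[i] = 2.0;
    }

    start = now_sec();
    matmul(n, a, b, c);
    elapsed = now_sec() - start;

    /* Every entry of c is the sum of n products 1.0 * 2.0. */
    assert(c[0] == 2.0 * (double)n);

    free(a);
    free(b);
    free(c);
    return elapsed;
}
```

Calling time_matmul(832), for example, would time one 832 x 832 multiplication. Because gettimeofday() reports wall-clock time, the result includes time spent waiting on page faults, which is the quantity of interest when memory is limited.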
4 2 4 2 4 2 4 2 4 2 [ ] [s] LinuxOS LinuxOS 832 x 832 4 2

Table 4.2
  Matrix size   LinuxOS [s]   [s]      [s]
  512*512       7.71          13.02    1.79
  768*768       26.01         46.77    3.64
  832*832       784.03        112.26   4.19
  896*896       950.92        142.10   4.97
  960*960       990.08        178.46   5.16
  1024*1024     1129.47       236.20   6.06

4.2.1
?? 768 x 768 OS 832 x 832 LinuxOS LinuxOS
31 4 2 LinuxOS 768 x 768 832 x 832
4.2.2
4 2 4 2 1
4.2.3
4 2 4 2 LinuxOS ( ) ( + ) 832 x 832 LinuxOS Linux OS 2 2.1.1 LRU 2 LinuxOS /proc/sys/vm/page-cluster 3 2^3 = 8 8
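The relation between the page-cluster value and the number of pages read per swap-in (2^3 = 8 for the value 3) can be sketched as follows. The function names are hypothetical; the only fact relied on is that /proc/sys/vm/page-cluster holds the base-2 logarithm of the number of consecutive pages read from swap in one attempt.

```c
#include <assert.h>
#include <stdio.h>

/* Pages read from swap in one attempt for a given page-cluster value:
 * 2 to the power of page_cluster. */
long pages_per_swapin(int page_cluster)
{
    assert(page_cluster >= 0 && page_cluster < 31);
    return 1L << page_cluster;
}

/* Read an integer setting such as /proc/sys/vm/page-cluster.
 * Returns -1 if the file cannot be opened or parsed. */
int read_page_cluster(const char *path)
{
    FILE *fp = fopen(path, "r");
    int value = -1;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}
```

With a value of 3, pages_per_swapin() gives 8 pages; with the page-cluster=2 setting shown in Section 2.1.1 it gives 4. On a Linux system the current setting can be obtained with read_page_cluster("/proc/sys/vm/page-cluster").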
33 LinuxOS 2 LinuxOS
34 5 PC LinuxOS 1 LinuxOS
35 SSD SSD
36 6
References

[1] MPI Documents, http://www.mpi-forum.org/docs/.
[2] SACSIS, 2004.
[3] SACSIS, 2003.
[4] Teramem, SACSIS, 07 2009.
[5] Tia Newhall, Nswap: a network swapping module for Linux clusters, Euro-Par Parallel Processing, 2003.
[6] 10GbEthernet RDMA, CPSY Vol.106 No.287, 200.
[7] S. Liang, R. Noronha, D. K. Panda, Swapping to Remote Memory over InfiniBand: An Approach using a High Performance Network Block Device, IEEE Cluster Computing, 2005.
[8] Pavel Machek, Linux Network Block Device, http://nbd.sourceforge.net/, 1997.
[9] SSD, 09 2008.
[10] Myri-10G Overview, http://www.myri.com/myri-10g/overview/.
[11] InfiniBand Trade Association, http://www.infinibandta.org/.
[12] DLM 10Gb Ethernet, 12 2008.
[13] Linux Kernel Documentation: cgroup.txt, http://www.mjmwired.net/kernel/documentation/cgroups.txt, 10 2008.
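A

A.1 cgroup

In LinuxOS, Linux Kernel 2.6.27 provides cgroup [13]. The commands below use a directory named test.

    $ su root
    $ mount -t cgroup none test -o memory
    $ cd test
    $ mkdir group
    $ cd group
    $ echo 4M > memory.limit_in_bytes
    $ ./program
    $ echo [PID] > tasks

mount is run as root and mounts the cgroup file system on test with the memory subsystem of cgroup enabled; no device is involved (none). Creating a directory under test (here group) creates a cgroup. The memory limit is set by writing it to memory.limit_in_bytes with echo, and a process is placed in the group by writing its process ID (PID) to tasks.

A.2

In LinuxOS, cached data, including inode caches, can be released through /proc/sys/vm/drop_caches. Writing 3 to /proc/sys/vm/drop_caches releases the page cache together with dentries and inodes. sync is executed twice, before and after the write:

    $ sync
    $ echo "3" > /proc/sys/vm/drop_caches
    $ sync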