Evaluation of Dynamic Voltage and Frequency Scaling Adaptation based on Memory Power

Masahiro Miwa, Kohta Nakashima, Akira Hirai, Satoshi Kazama, Yasushi Hara and Akira Naruse
Fujitsu Laboratories Ltd.

This paper presents a novel approach to DVFS adaptation based on memory power. We find that memory power strongly correlates with CPU frequency dependency, that is, the degree to which application performance is affected by the CPU frequency. CPU-bound applications consume low memory power, and memory-bound applications consume high memory power. We therefore scale the CPU frequency according to memory power. As a result, our method does not need the Performance Monitoring Counters (PMCs) that most prior work uses to decide CPU frequency scaling. Our method uses data acquired by a sensor and could be implemented on the BMC (Baseboard Management Controller); in that case, CPU frequency control imposes no overhead on the target host. Experiments using NPB show that, with an average performance loss under 3%, we save 5% of CPU energy on average, and for the IS benchmark in particular, 15% energy saving was achieved with under 5% performance loss.

1. Introduction

Reducing the power consumption of PC servers is an important issue. DVFS (Dynamic Voltage and Frequency Scaling) reduces CPU power consumption by lowering the CPU's supply voltage and operating frequency [10]. A typical implementation of DVFS is the Linux Ondemand Governor [8], which selects the CPU frequency according to CPU utilization.
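However, CPU utilization does not indicate how strongly a program's performance depends on the CPU frequency. Programs can be classified as CPU-bound or memory-bound, the latter spending much of their execution time on memory accesses. Because memory access time does not shrink at a higher CPU frequency, lowering the frequency for a memory-bound program costs little performance, and many studies exploit this by lowering the CPU frequency for memory-bound programs [1-6]. Most of them determine whether a program is memory-bound using Performance Monitoring Counters (PMCs) [1-5]. PMCs, however, are also used by other software such as the nmi watchdog and VTune, so they are not always available for CPU frequency control, and reading them consumes CPU time on the host.

In this paper we focus on memory power instead. A memory-bound program accesses memory frequently and consumes high memory power, whereas a CPU-bound program consumes low memory power, so memory power indicates how memory-bound a program is. We therefore propose controlling the CPU frequency according to memory power, without using PMCs. For SPEC CPU2006, the correlation coefficient between memory power and CPU frequency dependency was -0.93. In an evaluation with eight NPB benchmarks and an allowable performance loss of 5%, the method saved 5% of CPU energy on average with an average performance loss under 3%; for IS, it saved 15% of CPU energy with less than 5% performance loss. We also compare the method with a PMC-based method that uses Last Level Cache misses.

The rest of this paper is organized as follows. Section 2 introduces CPU frequency dependency and derives the CPU frequency from it. Section 3 shows the relationship between memory power and CPU frequency dependency. Section 4 evaluates CPU frequency control based on memory power. Section 5 reviews related work, and Section 6 concludes.

2. CPU Frequency and Performance

2.1 CPU Utilization

On PC servers, CPU utilization is computed from the time the CPU spends in the idle state: the CPU counts as idle only when it has nothing to execute, and a CPU that never becomes idle is reported at 100% utilization (Figure 1).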
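The idle-time accounting just described can be illustrated with a minimal sketch. The `CpuTimes` record and the `utilization` helper are hypothetical names introduced here; they only assume cumulative busy and idle counters, similar in spirit to the per-CPU counters Linux exposes in /proc/stat. The point is that a stall on memory is accounted as busy time, not idle time.

```python
from dataclasses import dataclass


@dataclass
class CpuTimes:
    """Cumulative CPU time counters (hypothetical record, e.g. in clock ticks)."""
    busy: int  # time spent executing, including time stalled on memory
    idle: int  # time spent in the idle state


def utilization(prev: CpuTimes, curr: CpuTimes) -> float:
    """Fraction of the window between two samples in which the CPU was not idle."""
    busy = curr.busy - prev.busy
    idle = curr.idle - prev.idle
    total = busy + idle
    return busy / total if total > 0 else 0.0


# Memory-bound loop: it stalls on DRAM but never idles, so all 100 ticks count as busy.
print(utilization(CpuTimes(0, 0), CpuTimes(100, 0)))
# A CPU that was idle for half of the window.
print(utilization(CpuTimes(0, 0), CpuTimes(50, 50)))
```

Both a compute loop and a memory-stalled loop report 1.0 here, which is why a utilization-based policy cannot tell them apart.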
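Time spent waiting for memory is not idle time. A program that keeps the CPU busy is therefore reported at 100% CPU utilization even if most of its time is spent stalled on memory accesses. A governor that decides the CPU frequency from utilization alone runs CPU-bound and memory-bound programs alike at a high frequency, although memory-bound programs would lose little performance at a lower one. To reduce CPU power on PC servers without excessive performance loss, the CPU frequency should instead be chosen according to how much the program's performance depends on it.

2.2 CPU Frequency Dependency

2.2.1 Measuring CPU Frequency Dependency

To quantify how strongly performance depends on the CPU frequency, we ran two test programs, one CPU-bound and one memory-bound, at 2.93 GHz and 1.60 GHz on the system in Table 3. The execution time of the CPU-bound program is determined mainly by computation on the CPU, whereas the memory-bound program spends most of its time on memory accesses, whose latency does not scale with the CPU frequency.

Table 3: Experimental environment
CPU | Intel Xeon X5570 @ 2.93 GHz
Memory | DDR3 1333 MHz, 12 GB (4 GB x 3)
OS | CentOS 5.7

Table 1 shows the results. Lowering the frequency from 2.93 GHz to 1.60 GHz lengthens the CPU clock cycle by 83.35%. The execution time of the CPU-bound program grows by almost the same ratio (83.42%), whereas that of the memory-bound program grows by only 10.87%.

Table 1: CPU clock cycle and execution time at 2.93 GHz and 1.60 GHz
CPU frequency | Clock cycle (ns) | CPU-bound (sec) | Memory-bound (sec)
2.93 GHz | 0.3417 | 33.36 | 85.67
1.60 GHz | 0.6265 | 61.19 | 94.98
Increase (%) | 83.35 | 83.42 | 10.87

We define the CPU frequency dependency as the ratio of the increase in execution time to the increase in the CPU clock cycle:

$$\text{(CPU frequency dependency)} = \frac{\text{(increase in execution time)}}{\text{(increase in CPU clock cycle)}} \qquad (1)$$

By Eq. (1), the CPU frequency dependency is 100% for the CPU-bound program and 13% for the memory-bound program (Tables 1 and 2).

Table 2: CPU frequency dependency
Program | CPU-bound | Memory-bound
CPU frequency dependency (%) | 100 | 13

2.2.2 Determining the CPU Frequency from CPU Frequency Dependency

Lowering the CPU frequency saves power but lengthens execution time, and by how much depends on the program: a memory-bound program tolerates a large frequency reduction, while a CPU-bound program does not. To choose the CPU frequency for a given allowable performance loss, we model execution time as a function of the CPU frequency. Following [1], we divide the execution time $T_{f_{max}}$ at the maximum CPU frequency $f_{max}$ into the time that depends on the CPU frequency, $T^{cpu}_{f_{max}}$, and the memory access time that does not, $T^{mem}_{f_{max}}$ (Eq. (2)).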
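$$T_{f_{max}} = T^{cpu}_{f_{max}} + T^{mem}_{f_{max}} \qquad (2)$$

When the CPU frequency is lowered from $f_{max}$ to $f$, only the CPU-dependent part grows, in inverse proportion to the frequency:

$$T_f = \frac{f_{max}}{f}\, T^{cpu}_{f_{max}} + T^{mem}_{f_{max}} \qquad (3)$$

Subtracting Eq. (2) from Eq. (3), the increase in execution time caused by lowering the frequency from $f_{max}$ to $f$ is

$$T_f - T_{f_{max}} = \frac{f_{max}}{f}\, T^{cpu}_{f_{max}} - T^{cpu}_{f_{max}} \qquad (4)$$

$$= \left(\frac{f_{max}}{f} - 1\right) T^{cpu}_{f_{max}} \qquad (5)$$

The fraction of $T_{f_{max}}$ accounted for by $T^{cpu}_{f_{max}}$ is the CPU frequency dependency $d = T^{cpu}_{f_{max}} / T_{f_{max}}$, so Eq. (5) becomes

$$T_f - T_{f_{max}} = \left(\frac{f_{max}}{f} - 1\right) T_{f_{max}}\, d \qquad (6)$$

This $d$ coincides with the ratio in Eq. (1), because the relative increase in the clock cycle is $f_{max}/f - 1$. Defining the performance loss $PF_{Loss}$ at frequency $f$ as the relative increase in execution time,

$$PF_{Loss} = \frac{T_f - T_{f_{max}}}{T_{f_{max}}} \qquad (7)$$

Eqs. (6) and (7) give the CPU frequency $f$:

$$f = \frac{f_{max}}{PF_{Loss}/d + 1} \qquad (8)$$

Thus, given an allowable performance loss $PF_{Loss}$, the CPU frequency $f$ can be computed from the maximum frequency $f_{max}$ and the CPU frequency dependency $d$.

2.3 Estimating CPU Frequency Dependency with PMCs

As shown in Section 2.2, the CPU frequency can be determined once the CPU frequency dependency is known. Prior work estimates it using PMCs [1-5], most commonly the Last Level Cache (LLC) miss count [2, 3, 5]: a program with many LLC misses is memory-bound, and one with few is CPU-bound. PMCs, however, have drawbacks. They are a limited resource that other software, such as the nmi watchdog, also uses, so they may not be available for frequency control; and reading them runs on the host CPU, which adds overhead.

3. Memory Power and CPU Frequency Dependency

Memory-bound programs access memory frequently and therefore consume high memory power, whereas CPU-bound programs access memory rarely and consume low memory power. Memory power should therefore reflect whether a program is memory-bound or CPU-bound, that is, its CPU frequency dependency. To verify this, we measured memory power and CPU frequency dependency for SPEC CPU2006 [11], covering both the integer (INT) and floating-point (FP) programs, compiled with Intel Compiler 12 and run with the train input. Figure 3 plots the results: memory power and CPU frequency dependency are strongly correlated, with a correlation coefficient of -0.93. The CPU frequency dependency can therefore be estimated from memory power (Figure 3). Because this estimation does not use PMCs, it does not conflict with other PMC users such as the nmi watchdog.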
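Estimating the dependency from memory power requires a calibration curve. The text reports only the correlation coefficient, so the linear least-squares mapping below is an assumption made for illustration, as are the function names `pearson`, `fit_line`, and `estimate_dependency`. The calibration points are synthetic, not measurements from this paper.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_dependency(power_w: float, slope: float, intercept: float) -> float:
    """Hypothetical linear estimate of d from memory power, clamped to [0, 1]."""
    return min(1.0, max(0.0, slope * power_w + intercept))


# Synthetic calibration points (NOT data from the paper): memory power in W, d in [0, 1].
power = [2.0, 4.0, 6.0, 8.0]
dep = [0.9, 0.7, 0.5, 0.3]
slope, intercept = fit_line(power, dep)
print(round(pearson(power, dep), 3), round(slope, 3), round(intercept, 3))
print(round(estimate_dependency(5.0, slope, intercept), 3))
```

A negative slope reflects the negative correlation reported above: higher memory power maps to lower CPU frequency dependency. Clamping keeps the estimate in the valid range of $d$.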
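Figure 3: Memory power (p) and CPU frequency dependency (d)
Figure 4: LLC misses (c) and CPU frequency dependency (d)

Furthermore, because memory power can be acquired by a sensor, the method could be implemented on the BMC (Baseboard Management Controller); CPU frequency control would then impose no overhead on the host CPU.

4. Evaluation

We evaluated CPU frequency control based on memory power using NPB.

4.1 CPU Frequency Control Method

4.1.1 Determining the CPU Frequency

Using the relationship in Figure 3, the CPU frequency dependency is estimated from the measured memory power, and the CPU frequency is then computed with Eq. (8).

4.1.2 Control Interval and Frequency Steps

The CPU frequency is updated periodically, every 1 sec or every 100 msec. The CPU supports 11 frequency levels from 1.60 GHz to 2.93 GHz in 133 MHz steps, so the frequency f obtained from Eq. (8) is mapped to one of these levels before it is set.

4.2 Experimental Setup

4.2.1 Benchmarks

The allowable performance loss was set to 5%. We used the NAS Parallel Benchmarks (NPB), version 3.2, OpenMP version [12]: eight benchmarks (bt, cg, ep, is, lu-hp, lu, sp, ua), problem class C, 4 threads, compiled with Intel Compiler 12.

4.2.2 PMC-Based Method for Comparison

For comparison, we also evaluated CPU frequency control based on PMCs. Following [2, 3, 5], we use the LLC miss count; on Nehalem it is obtained with the uncore PMC event UNC_L3_MISS.ANY (Event Num 09H, Umask Value 03H) [9]. Figure 4 plots LLC misses against CPU frequency dependency for SPEC CPU2006 with the train input; the correlation coefficient is -0.95. The CPU frequency dependency is estimated from the LLC miss count (Figure 4), and the CPU frequency is then determined in the same way as for memory power.

4.3 Power Measurement

Memory power was measured on the VTT supply of the three DIMMs (Figure 9).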
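Figure 5: CPU frequency control based on memory power (1 sec)
Figure 6: CPU frequency control based on memory power (100 msec)
Figure 7: CPU frequency control based on PMC (1 sec)
Figure 8: CPU frequency control based on PMC (100 msec)

Power was sampled every 10 msec with an NI USB-6210 data acquisition device (National Instruments) using nidaqmxbase-3.4.5. CPU power was also measured, and the CPU energy reported below is computed from it.

4.4 Results

Figures 5 to 8 show the performance loss and CPU energy of each benchmark relative to execution at the maximum frequency (2.93 GHz). For both the 1 sec and 100 msec control intervals, memory-power-based control with a 5% allowable performance loss saved 5% of CPU energy on average, and for IS it saved 15% of CPU energy with less than 5% performance loss. For comparison, Figures 7 and 8 show the results of PMC-based control using LLC misses.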
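The control procedure of Sections 4.1 and 4.3 can be summarized in a sketch: average the 10 msec memory power samples over one control interval, estimate the dependency, apply Eq. (8), and map the result to an available level. Several details are assumptions made for illustration. The `power_to_d` mapping is a hypothetical caller-supplied function. Rounding up to the lowest level at or above f is an assumed policy, since the text states only that frequencies come in 133 MHz steps. All function names are hypothetical. The sketch only computes a level; it does not set the hardware frequency.

```python
from typing import Callable, List, Sequence

F_MAX_MHZ = 2930
# 11 levels from 1.60 GHz to 2.93 GHz in 133 MHz steps (Section 4.1.2).
LEVELS_MHZ: List[int] = [1600 + 133 * k for k in range(11)]


def dependency(time_increase: float, cycle_increase: float) -> float:
    """Eq. (1): execution-time increase divided by clock-cycle increase."""
    return time_increase / cycle_increase


def target_frequency(f_max: float, pf_loss: float, d: float) -> float:
    """Eq. (8): f = f_max / (PF_Loss / d + 1)."""
    return f_max / (pf_loss / d + 1.0)


def predicted_loss(f_max: float, f: float, d: float) -> float:
    """Eqs. (6)-(7): PF_Loss = (f_max / f - 1) * d."""
    return (f_max / f - 1.0) * d


def quantize_up(f_mhz: float, levels: Sequence[int] = LEVELS_MHZ) -> int:
    """Assumed policy: lowest available level >= f, so the predicted loss stays
    within the target; frequencies below the lowest level map to that level."""
    for level in levels:
        if level >= f_mhz:
            return level
    return levels[-1]


def control_step(samples_w: Sequence[float],
                 power_to_d: Callable[[float], float],
                 pf_loss: float = 0.05) -> int:
    """One control interval: average the 10 msec memory power samples
    (100 per 1 sec interval, 10 per 100 msec), estimate d, and pick a level."""
    mean_power = sum(samples_w) / len(samples_w)
    d = min(1.0, max(1e-6, power_to_d(mean_power)))  # keep d > 0 for Eq. (8)
    return quantize_up(target_frequency(F_MAX_MHZ, pf_loss, d))


# Eq. (1) applied to Table 1: about 1.00 (CPU-bound) and 0.13 (memory-bound).
print(round(dependency(83.42, 83.35), 2), round(dependency(10.87, 83.35), 2))
# Eq. (8) with a 5% allowance, then rounded up to an available level.
print(quantize_up(target_frequency(F_MAX_MHZ, 0.05, 1.0)))
print(quantize_up(target_frequency(F_MAX_MHZ, 0.05, 0.13)))
```

With the Table 1 dependencies and a 5% allowance, the CPU-bound program stays near the top level (2797 MHz under this rounding policy), while the memory-bound program drops to 2132 MHz. Rounding up means the loss predicted by Eqs. (6) and (7) for the frequency actually set never exceeds the target. The floor at 1.60 GHz can leave a strongly memory-bound program with a predicted loss well below 5%.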
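Table 4: Predicted and measured performance loss (CPU frequency control based on memory power)
Benchmark | 1 sec: predicted (%) | 1 sec: measured (%) | 1 sec: predicted - measured | 100 msec: predicted (%) | 100 msec: measured (%) | 100 msec: predicted - measured
bt.c | 3.8 | 3.5 | 0.3 | 4.4 | 5.1 | -0.6
cg.c | 3.3 | 1.6 | 1.7 | 3.3 | 1.4 | 1.9
ep.c | 4.8 | 4.9 | -0.1 | 4.7 | 4.9 | -0.2
is.c | 2.0 | 4.9 | -2.9 | 1.9 | 4.8 | -2.9
lu-hp.c | 4.3 | 1.6 | 2.7 | 3.8 | 3.0 | 0.7
lu.c | 3.9 | 1.4 | 2.5 | 4.0 | 2.4 | 1.6
sp.c | 4.1 | 3.1 | 0.9 | 3.9 | 5.4 | -1.5
ua.c | 2.7 | 2.5 | 0.2 | 3.7 | 3.1 | 0.6

Table 5: Predicted and measured performance loss (CPU frequency control based on PMC)
Benchmark | 1 sec: predicted (%) | 1 sec: measured (%) | 1 sec: predicted - measured | 100 msec: predicted (%) | 100 msec: measured (%) | 100 msec: predicted - measured
bt.c | 3.8 | 3.5 | 0.3 | 4.3 | 4.2 | 0.1
cg.c | 4.8 | 1.4 | 3.4 | 4.9 | 1.2 | 3.7
ep.c | 4.7 | 4.9 | -0.2 | 4.5 | 6.7 | -2.2
is.c | 4.7 | 1.6 | 3.2 | 4.6 | 1.6 | 3.0
lu-hp.c | 2.5 | 1.8 | 0.8 | 4.1 | 2.8 | 1.3
lu.c | 4.1 | 1.7 | 2.4 | 4.3 | 1.5 | 2.8
sp.c | 3.7 | 2.7 | 1.0 | 4.0 | 3.3 | 0.6
ua.c | 2.8 | 2.7 | 0.0 | 4.1 | 4.1 | 0.0

4.5 Discussion

The performance loss does not match the 5% target exactly, for two reasons.

(1) Granularity of the CPU frequency. The frequency can be set only in 133 MHz steps, so the frequency computed by Eq. (8) generally cannot be set as is, and the frequency actually set differs from it. The performance loss predicted by Eq. (8) for the frequency actually set therefore differs from 5%.

(2) Error in the estimated CPU frequency dependency. The dependency is estimated from the relationships in Figures 3 and 4, which are not exact. Whether a program has high dependency (CPU-bound) or low dependency (memory-bound), an error in the estimated dependency carries over into the predicted performance loss.

Tables 4 and 5 compare the predicted and measured performance loss for each benchmark. For both the memory-power-based and the PMC-based control, the predicted loss deviates from the measured loss by a few percentage points in either direction.

5. Related Work

The Ondemand Governor [8] is a widely used DVFS policy that sets the CPU frequency according to CPU utilization; as discussed in Section 2.1, utilization does not capture how much performance depends on the CPU frequency. [2] chooses the CPU frequency using the LLC miss count obtained from PMCs, and [1, 4] also rely on PMCs. PMCs, however, may be in use by other software such as the nmi watchdog or VTune, and reading them consumes host CPU time.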
A hardware architecture for dynamic performance and energy adaptation has also been proposed [6]. [3] uses PMCs and focuses on the MSHR (Miss Status Holding Register), and CPU frequency control is also studied in [7]. Unlike these approaches, our method does not require PMCs; it needs only the memory power acquired by a sensor.

6. Conclusion

In this paper, we focused on memory power and showed that it correlates with CPU frequency dependency: CPU-bound programs consume low memory power, and memory-bound programs consume high memory power. Based on this finding, we proposed controlling the CPU frequency according to memory power instead of PMCs. In an evaluation with eight NPB benchmarks and an allowable performance loss of 5%, the method saved 5% of CPU energy on average with an average performance loss under 3%, and for IS it saved 15% of CPU energy with less than 5% performance loss. We also compared the method with a PMC-based method using Last Level Cache misses.

References

1) S. Huang and W. Feng, Energy-Efficient Cluster Computing via Accurate Workload Characterization, in Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID '09), Washington, DC, USA: IEEE Computer Society, 2009, pp. 68-75.
2) R. Schöne and D. Hackenberg, On-line analysis of hardware performance events for workload characterization and processor frequency scaling decisions, in Proceedings of the Second Joint WOSP/SIPEW International Conference on Performance Engineering (ICPE '11), ACM, 2011, pp. 481-486.
3) Vol. 45, No. SIG 6 (ACS 6), pp. 1-11, May 2004.
4) R. Kotla, S. Ghiasi, T. Keller and F. Rawson, Scheduling Processor Voltage and Frequency in Server and Cluster Systems, in Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, April 2005, pp. 234-241.
5) H. Amur, K. Schwan and M. Prvulovic, Towards Optimal Power Management: Estimation of Performance Degradation due to DVFS on Modern Processors, Tech. Report GIT-CERCS-10-02.
6) P. Stanley-Marbell, M. Hsiao and U. Kremer, A Hardware Architecture for Dynamic Performance and Energy Adaptation, Power-Aware Computer Systems, Lecture Notes in Computer Science 2325, Springer-Verlag, 2002.
7) Vol. 47, No. SIG 18 (ACS 16), pp. 80-91, Nov. 2006.
8) V. Pallipadi and A. Starikovskiy, The Ondemand Governor: Past, Present, and Future, in Proceedings of the Linux Symposium, 2006.
9) Intel, Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2, 2010.
10) Intel White Paper, Enhanced Intel SpeedStep Technology for the Intel Pentium M Processor, 2004.
11) SPEC CPU2006, http://www.spec.org/cpu2006/.
12) NAS Parallel Benchmarks, http://www.nas.nasa.gov/.