A Parallelizing Compiler Cooperative Multicore Architecture Simulator with Changeover Mechanism of Simulation Modes

GAKUHO TAGUCHI 1, YOUICHI ABE 1, KEIJI KIMURA 1, HIRONORI KASAHARA 1
1 WASEDA UNIVERSITY

A parallelizing compiler cooperative multicore architecture simulation framework is proposed. It reduces simulation time through a flexible simulation-mode changeover mechanism. The multicore architecture simulator in this framework has two modes: a fast functional mode and a slow cycle-accurate mode. In cooperation with a parallelizing compiler, the framework generates, for each parallelized application, appropriate sampling points for the cycle-accurate mode and a runtime that switches the simulator between the modes. The proposed framework is evaluated with EQUAKE from SPEC CPU 2000. The evaluation shows that speedups of 50 to 500 times can be achieved within 1.6% error.

1. [text not recovered; surviving fragments: the values 5000 and 10000, SimFlex 1), SimPoint 2), and references 3), 4)]

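2. [text not recovered]

2.1 [text not recovered; surviving fragments: references 3), 4); the values P%, 2.5 and 5%; equations (1) and (2), whose terms are lost]

2.2 [text not recovered; surviving fragments: the symbol l and the values 100 and 130]

2.3 [text not recovered; surviving fragments: Figure 1; FE, MP, BE; OSCAR 5)]

Figure 1 A compilation flow of the proposed sampling-based architecture simulation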

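3. [text not recovered]

3.1 [text not recovered; surviving fragment: Figure 2]

Figure 2 An image of simulation-mode changeover

3.2 [text not recovered; surviving fragments: Figure 3; OSCAR 6); a sim_count array with the values 12, 238 and 3; sim_change calls issued on each PE]

    int sim_count[] = {12, 238, 3 ...
    MAIN_PE0 {                          /* PE0 */
        for (...) {
            sim_change(0, sim_count);
            ...
    MAIN_PE1 {                          /* PE1 */
        for (...) {
            sim_change(1, sim_count);
            ...

Figure 3 An image of code for changeover of simulation modes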
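The changeover call in Figure 3 can be sketched in C as follows. This is a minimal sketch under assumptions that the extracted text does not confirm: sim_count is taken to hold alternating run lengths (iterations in cycle-accurate mode, then iterations in functional mode, and so on), a negative entry serves as a hypothetical end marker, and the enum sim_mode, the per-PE state table, MAX_PE and the return value are illustrative additions. Only the names sim_change and sim_count and the call form sim_change(PE number, sim_count) come from the figure.

```c
#include <stddef.h>

/* Hypothetical mode identifiers; the paper names the two modes only in prose. */
enum sim_mode { SIM_CYCLE_ACCURATE = 0, SIM_FUNCTIONAL = 1 };

#define MAX_PE 8 /* assumption: at most 8 PEs, as in the evaluation */

/* Per-PE bookkeeping (illustrative; not taken from the paper). */
struct pe_state {
    int started;        /* 0 until the first call on this PE */
    size_t phase;       /* index of the current entry in sim_count */
    int remaining;      /* iterations left in the current phase */
    enum sim_mode mode; /* mode of the current phase */
};

static struct pe_state pe_states[MAX_PE];

/*
 * Called once per loop iteration on each PE. Returns the mode in which
 * that iteration should be simulated. sim_count must end with a negative
 * entry (hypothetical end marker). After the last phase is used up, the
 * PE stays in the mode of that last phase.
 */
enum sim_mode sim_change(int pe, const int sim_count[])
{
    struct pe_state *s;

    if (pe < 0 || pe >= MAX_PE) {
        return SIM_FUNCTIONAL; /* out-of-range PE: fall back to fast mode */
    }
    s = &pe_states[pe];

    if (!s->started) {
        s->started = 1;
        s->phase = 0;
        s->remaining = sim_count[0] > 0 ? sim_count[0] : 0;
        s->mode = SIM_CYCLE_ACCURATE; /* assumption: first phase is detailed */
    }

    /* Move to the next phase (and toggle the mode) when this one is done. */
    while (s->remaining == 0 && sim_count[s->phase + 1] >= 0) {
        s->phase++;
        s->remaining = sim_count[s->phase];
        s->mode = (s->mode == SIM_CYCLE_ACCURATE) ? SIM_FUNCTIONAL
                                                  : SIM_CYCLE_ACCURATE;
    }

    if (s->remaining > 0) {
        s->remaining--;
    }
    return s->mode;
}
```

Under these assumptions, a PE that calls sim_change(0, sim_count) at the top of every iteration with sim_count = {12, 238, 3, -1} would run 12 iterations in cycle-accurate mode, 238 in functional mode, and then return to cycle-accurate mode. Keeping the state per PE lets each PE switch at its own iteration boundaries without synchronizing with the others.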

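4. [text not recovered]

4.1 [text not recovered]

4.1.1 [text not recovered]

4.1.2 [text not recovered; surviving fragments: Tables 1 to 3, Figures 4 and 5]

Table 1 [caption not recovered; L1 cache evaluation]
  L1 cache size: 32kB, 16kB
  L2 cache size: 512kB

Table 2 [caption not recovered; L2 cache evaluation]
  L1 cache size: 32kB
  L2 cache size: 64kB, 256kB, 512kB

Table 3 [caption not recovered]
  [label not recovered]: SPARC V9
  [label not recovered]: 8
  L1 cache latency: 1
  L2 cache latency: 4
  memory latency: 60

Figure 4 L1 Cache-miss rate with and without runtime overhead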

Figure 5 L2 Cache-miss rate with and without runtime overhead

4.2 [text not recovered; surviving fragments: Figure 7; the values 250 and 125; equation (1); Table 5]

Table 4 [caption not recovered] Intel Xeon E5506
  CPU: Xeon
  CPU [label incomplete]: 8
  CPU Clock: 2.83GHz
  L1 Cache (I/D): 32KB/32KB
  L2 Cache: 6.0MB
  Main Memory: 7.8GB

Figure 6 Program structure of EQUAKE

Figure 7 Execution cost of each iteration in a main loop of EQUAKE on a real server

Table 5 [caption and column labels not recovered] EQUAKE
  Before iteration 250: 250, 1.72E+07, 3.49E+08, 4
  After iteration 250: 3605, 7.22E+05, 3.19E+08, 1

[error rate] = [terms not recovered] × 100 (3)

Figure 8 The number of presumed execution cycles and error rate of a portion before 250 iterations
[plot not recovered; sample sizes 4, 15, 25, 45 and all, each for 1PE, 2PE, 4PE and 8PE]

Table 6 [caption not recovered]
  [label not recovered]: SPARC V9
  [label not recovered]: 1, 2, 4, 8
  L1 cache size: 32kB
  L1 cache latency: 1
  L2 cache size: 512kB
  L2 cache latency: 4
  memory latency: 60

Figure 9 The number of presumed execution cycles and error rate of a portion after 250 iterations
[plot not recovered; sample sizes 1, 5, 30, 50 and all, each for 1PE, 2PE, 4PE and 8PE]

[text not recovered; surviving value pairs cited with Figure 10: 4 and 54, 15 and 16, 25 and 10, 45 and 5; cited with Figure 11: 1 and 558, 5 and 345, 30 and 102, 50 and 65]
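How a sampled run could yield the cycle counts, error rates and speedups reported in Figures 8 to 11 can be sketched as follows. This is a minimal sketch, assuming that the total is estimated by scaling the mean cost of the sampled iterations to the full iteration count, that the error rate is the absolute difference from a full cycle-accurate run expressed in percent, and that speedup is the ratio of simulation times. The extraction does not preserve the paper's own definitions, so all three function names and formulas are illustrative.

```c
#include <stddef.h>

/*
 * Estimate the cycles of a loop with total_iters iterations from the
 * cycle counts of n_samples iterations measured in cycle-accurate mode.
 * Assumption: estimate = mean sampled cost * total iteration count.
 */
double estimate_total_cycles(const double *sample_cycles, size_t n_samples,
                             size_t total_iters)
{
    double sum = 0.0;
    size_t i;

    if (n_samples == 0) {
        return 0.0; /* nothing sampled, nothing to extrapolate */
    }
    for (i = 0; i < n_samples; i++) {
        sum += sample_cycles[i];
    }
    return sum / (double)n_samples * (double)total_iters;
}

/*
 * Error rate in percent of an estimate against a reference value, e.g.
 * the cycle count of a run simulated entirely in cycle-accurate mode.
 * Assumption: |estimated - reference| / reference * 100.
 * The reference must be nonzero.
 */
double error_rate_percent(double estimated, double reference)
{
    double diff = estimated - reference;

    if (diff < 0.0) {
        diff = -diff;
    }
    return diff / reference * 100.0;
}

/*
 * Speedup of a sampled simulation over a full cycle-accurate one,
 * taken here as the ratio of the two simulation times.
 * The sampled time must be nonzero.
 */
double speedup(double full_sim_time, double sampled_sim_time)
{
    return full_sim_time / sampled_sim_time;
}
```

With these definitions, a small sample keeps the estimate cheap because only n_samples iterations run in the slow mode, while the error depends on how much the per-iteration cost varies within the sampled portion.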

Figure 10 The speedup rate of a portion before 250 iterations
[plot not recovered; sample sizes 4, 15, 25, 45 and all]

Figure 11 The speedup rate of a portion after 250 iterations
[plot not recovered; sample sizes 1, 5, 30, 50 and all]

5. [text not recovered; surviving fragments: the value 250 and the identifier B23700064]

1) Thomas F. Wenisch, Roland E. Wunderlich, Michael Ferdman, Anastassia Ailamaki, Babak Falsafi, and James C. Hoe, SimFlex: Statistical Sampling of Computer System Simulation, IEEE Micro, Volume 26, Issue 4, pp.32-42, July-Aug, 2006
2) Erez Perelman, Greg Hamerly, Michael Van Biesbrouck, Timothy Sherwood, Brad Calder, Using SimPoint for Accurate and Efficient Simulation, SIGMETRICS '03, San Diego, California, USA, ACM 1-58113-664-1/03/0006, June 10-14, 2003
3) [authors and title not recovered], 2011-ARC-196(14), 1-11, 2011-07-20
4) [authors and title not recovered], 191, Vol. 2012-ARC-199, No.3, 2011-07-20
5) Hironori Kasahara, Motoki Obata, Kazuhisa Ishizaka, Automatic Coarse Grain Task Parallel Processing on SMP using OpenMP, Proc. of 13th International Workshop on Languages and Compilers for Parallel Computing (LCPC '00), Aug., 2000
6) Keiji Kimura, Masayoshi Mase, Hiroki Mikami, Takamichi Miyamoto, Jun Shirako and Hironori Kasahara, OSCAR API for Real-time Low-Power Multicores and Its Performance on Multicores and SMP Servers, Lecture Notes in Computer Science, Springer, Vol.5898, pp.188-202, 2010