IPSJ SIG Technical Report Vol.2013-ARC-203 No /2/1


SMYLE OpenCL Implementation and Evaluations on 128 Cores

Takuji Hieda 1, Noriko Etani 1, Naoki Nishiyama 1, Ittetsu Taniguchi 1, Hiroyuki Tomiyama 1, Nguyen Truong Son 2, Masahiko Kondo 2, Takeshi Soga 3, Tomoya Hirao 3, Koji Inoue 3

1 College of Science and Engineering, Ritsumeikan University
2 Graduate School of Information Systems, The University of Electro-Communications
3 Department of Advanced Information Technology, Kyushu University

Abstract: Many-core architectures achieve highly parallel processing performance by running tens or hundreds of low-performance, small-area, low-power cores in parallel. This advantage makes many-core a practical choice for embedded systems as well as for general-purpose computing systems. In the Extremely Low-power Circuits and Systems program (Green IT Project) sponsored by the New Energy and Industrial Technology Development Organization (NEDO), an FPGA-based environment for evaluating the SMYLEref many-core processor architecture was developed as a result of a research project on low-energy many-core architecture and its compiler technology. This paper describes SMYLE OpenCL, an OpenCL implementation for the SMYLEref many-core architecture, on that evaluation environment. In the experiments, a number of benchmark programs are executed on a 128-core SMYLEref configuration in the FPGA evaluation environment to verify the effectiveness of SMYLE OpenCL.

1. GPU (Graphics Processing Unit) GPGPU (General-purpose computing on GPUs)

(NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 2 128 2 3 SMYLE 4 5

2. GPU GPGPU [1] CUDA GPGPU [2] GPU OpenCL Khronos [3] OpenCL OpenCL OpenCL (PE) HPC (High Performance Computing) CUDA NVIDIA GPU OpenCL CUDA OpenCL Khronos OpenCL OpenCL NVIDIA ATI GPU OpenCL Intel Core OpenCL GPU PE PE 1 PE [4] Intel Core OpenCL Core OpenCL SMYLE OpenCL Linux [5] 1 1 OpenCL OpenCL OpenCL C PE SMYLE OpenCL [6]

3. SMYLE OpenCL SMYLEref FPGA 128 SMYLE OpenCL SMYLEref SMYLE OpenCL FPGA 128

3.1 SMYLE OpenCL 1 1 OS OpenCL API PE
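In the OpenCL model referred to above, the work-items of a kernel are spread across processing elements (PEs). As a minimal sketch of one way such a split can be computed, the C fragment below divides `n_items` work-items into near-equal contiguous ranges over `n_pes` PEs. The names `item_range` and `pe_range` are hypothetical, and block partitioning is an illustrative assumption, not the scheduling policy of SMYLE OpenCL.

```c
#include <stddef.h>

/* Half-open range [begin, end) of work-item indices owned by one PE. */
typedef struct {
    size_t begin;
    size_t end;
} item_range;

/*
 * Split n_items work-items over n_pes PEs as evenly as possible.
 * The first (n_items % n_pes) PEs receive one extra item.
 * Returns an empty range {0, 0} for invalid arguments.
 */
item_range pe_range(size_t n_items, size_t n_pes, size_t pe)
{
    item_range r = {0, 0};
    if (n_pes == 0 || pe >= n_pes) {
        return r;
    }
    size_t base = n_items / n_pes;   /* items every PE gets */
    size_t extra = n_items % n_pes;  /* leftover items, one per leading PE */
    r.begin = pe * base + (pe < extra ? pe : extra);
    r.end = r.begin + base + (pe < extra ? 1 : 0);
    return r;
}
```

With 127 device PEs, calling `pe_range(n, 127, i)` for `i` from 0 to 126 covers every index in `[0, n)` exactly once.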

Fig. 1 Many-core Architecture Model

1 PE PE OpenCL OpenCL

3.2 SMYLEref SMYLEref NoC (Network on Chip) 2 1 SMYLEref 2 2 1 8 MIPS R3000 geyser [7], [8] geyser 8KB L1 L1 16 TLB L2 [8]

Fig. 2 Overview of the SMYLEref Architecture

Fig. 3 Implementation of SMYLE OpenCL

OpenCL API PE SMYLE OpenCL PE OpenCL OpenCL

3.3 SMYLE OpenCL SMYLE OpenCL 1 OpenCL SMYLEref OpenCL SMYLE OpenCL 3 SMYLE OpenCL geyser Linux Linux OpenCL [9] SMYLE OpenCL

3.4 FPGA 128 128 FPGA 1 128 Xilinx FPGA Virtex-6 ML605 16 1 geyser Linux OS 128 1 geyser
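The figures that survive in this description (128 cores, 16 ML605 boards, 8 geyser cores per Virtex-6, one core running Linux) suggest a layout of 16 boards with 8 cores each, where one core acts as the OpenCL host and the remaining 127 act as devices. The C sketch below encodes that reading. The global core numbering, the choice of core 0 as host, and the names `core_location`, `locate_core` and `device_core_count` are assumptions made for illustration only.

```c
/* Assumed layout: 16 FPGA boards with 8 geyser cores each (128 in total). */
enum {
    CORES_PER_BOARD = 8,
    NUM_BOARDS = 16,
    NUM_CORES = CORES_PER_BOARD * NUM_BOARDS,
    HOST_CORE = 0 /* hypothetical: global core 0 runs Linux and the host program */
};

typedef struct {
    int board;   /* board index, 0..15 */
    int local;   /* core index within the board, 0..7 */
    int is_host; /* 1 for the host core, 0 for a device core */
} core_location;

/* Map a global core id (0..127) to its board and local index.
 * Returns 0 on success, -1 if the id is out of range. */
int locate_core(int global_id, core_location *out)
{
    if (global_id < 0 || global_id >= NUM_CORES || out == 0) {
        return -1;
    }
    out->board = global_id / CORES_PER_BOARD;
    out->local = global_id % CORES_PER_BOARD;
    out->is_host = (global_id == HOST_CORE);
    return 0;
}

/* Number of cores left for OpenCL devices once the host core is set aside. */
int device_core_count(void)
{
    return NUM_CORES - 1;
}
```

Under these assumptions `device_core_count()` yields 127, matching the largest device count that appears in the evaluation.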

Table 1 Specifications of ML605 Evaluation Board
FPGA: Virtex-6 XC6VLX240T-1FFG1156
SDRAM: DDR3 SODIMM (512MB)
IO: UART, USB, DVI, CF, SMA
200MHz & 66MHz

Table 2 Specifications of Virtex-6
65nm CMOS, 1.0V
Logic Cells: 241,152
CLB Slices: 37,680
Block RAM: 14,975 Kbit
I/O: 720

Fig. 4 Appearance of the 128 Cores Environment

OpenCL 127 ML605 Virtex-6 1, 2 128 FPGA Virtex-6 8 geyser FPGA 1 1 16 FPGA 2 4 128 16 FPGA

4. SMYLE OpenCL OpenCL 1 127 128 FPGA [8]

Geyser: 10MHz
(PLB): 5MHz
DDR3-SDRAM: 100MHz

OpenCL 6
backprojection
blackscholes
gaussian
grayscale
linearsearch
runlength

backprojection 2 blackscholes gaussian grayscale PPM linearsearch 1 0 1 runlength

gettimeofday()
input data
preparation
OpenCL run

output data
release
linearsearch init

Fig. 5 Execution Time of backprojection
Fig. 6 Execution Time of blackscholes
Fig. 7 Execution Time of gaussian
Fig. 8 Execution Time of grayscale
Fig. 9 Execution Time of linearsearch

4.1 5 10 backprojection blackscholes gaussian grayscale 64 127 runlength 127 550Byte linearsearch 1 2 4 8 10 runlength SMYLE OpenCL
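The evaluation names gettimeofday() together with the phases input data, preparation, OpenCL run, output data and release. A minimal sketch of per-phase timing with that call is given below. The helpers `elapsed_us` and `time_phase` and the `phase_fn` callback type are hypothetical names introduced here, and wrapping each phase in a callback is an assumption about how the measurement could be organised, not the authors' harness.

```c
#include <stddef.h>
#include <sys/time.h>

/* Microseconds elapsed from *start to *end. */
long long elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
         + (long long)(end->tv_usec - start->tv_usec);
}

/* A phase is any function taking an opaque context pointer. */
typedef void (*phase_fn)(void *ctx);

/* Run one phase and return its wall-clock duration in microseconds. */
long long time_phase(phase_fn fn, void *ctx)
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    fn(ctx);
    gettimeofday(&t1, NULL);
    return elapsed_us(&t0, &t1);
}
```

Calling `time_phase` once per phase gives a breakdown in which the kernel execution time can be separated from data transfer and setup.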

Fig. 10 Execution Time of runlength

5. SMYLEref OpenCL SMYLEref FPGA 128 SMYLE OpenCL SMYLEref

(NEDO)

[1] Owens, J. D., Luebke, D., Govindaraju, N., Harris, M., Krüger, J., Lefohn, A. and Purcell, T. J.: A Survey of General-Purpose Computation on Graphics Hardware, Computer Graphics Forum, Vol. 26, No. 1, pp. 80-113 (2007).
[2] NVIDIA Corporation: NVIDIA CUDA C Programming Guide, version 4.0, available from http://developer.download.nvidia.com/compute/cuda/4_0/toolkit/docs/CUDA_C_Programming_Guide.pdf (2011).
[3] Khronos OpenCL Working Group: The OpenCL Specification Version 1.1, available from http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf (2011).
[4] Lindholm, E., Nickolls, J., Oberman, S. and Montrym, J.: NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, Vol. 28, pp. 39-55 (2008).
[5] OpenCL Vol. 2012-SLDM-155, No. 2, pp. 1-6 (2012).
[6] SMYLE OpenCL Vol. 2012-EMB-27, No. 7, pp. 1-8 (2012).
[7] MIPS Geyser FPGA Linux Vol. 2010-ARC-189, No. 9, pp. 1-8 (2010).
[8] FPGA SMYLEref Vol. 2012-ARC-198, No. 15, pp. 1-7 (2012).
[9] SoC OpenCL DA 2012 pp. 73-78 (2012).