SMYLE OpenCL Implementation and Evaluations on 128 Cores

Takuji Hieda 1, Noriko Etani 1, Naoki Nishiyama 1, Ittetsu Taniguchi 1, Hiroyuki Tomiyama 1, Nguyen Truong Son 2, Masahiko Kondo 2, Takeshi Soga 3, Tomoya Hirao 3, Koji Inoue 3

1 College of Science and Engineering, Ritsumeikan University
2 Graduate School of Information Systems, The University of Electro-Communications
3 Department of Advanced Information Technology, Kyushu University

Abstract: A many-core architecture achieves highly parallel processing performance by running tens or hundreds of low-performance, small-area, low-power cores in parallel. This advantage makes many-core processors a practical choice for embedded systems as well as for general-purpose computing systems. In the Extremely Low-power Circuits and Systems program (Green IT Project) sponsored by the New Energy and Industrial Technology Development Organization (NEDO), a project on low-energy many-core architecture and its compiler technology developed an FPGA-based environment for evaluating the SMYLEref many-core architecture. This paper describes SMYLE OpenCL, an OpenCL implementation for the SMYLEref many-core architecture on that evaluation environment. In the experiments, a number of benchmark programs are executed on a 128-core SMYLEref configuration built on the FPGA evaluation environment to verify the effectiveness of SMYLE OpenCL.

1. GPU (Graphics Processing Unit) GPGPU (General-purpose computing on GPUs)
(NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 2 128 2 3 SMYLE 4 5

2. GPU GPGPU [1] CUDA GPGPU [2] GPU OpenCL Khronos [3] OpenCL OpenCL OpenCL (PE) HPC (High Performance Computing) CUDA NVIDIA GPU OpenCL CUDA OpenCL Khronos OpenCL OpenCL NVIDIA ATI GPU OpenCL Intel Core OpenCL GPU PE PE 1 PE [4] Intel Core OpenCL Core OpenCL SMYLE OpenCL Linux [5] 1 1 OpenCL OpenCL OpenCL C PE SMYLE OpenCL [6]

3. SMYLE OpenCL SMYLEref FPGA 128 SMYLE OpenCL SMYLEref SMYLE OpenCL FPGA 128

3.1 SMYLE OpenCL 1 1 OS OpenCL API PE
Fig. 1 Many-core Architecture Model

1 PE PE OpenCL OpenCL

3.2 SMYLEref SMYLEref NoC (Network on Chip) 2 1 SMYLEref 2 2 1 8 MIPS R3000 geyser [7], [8] geyser 8KB L1 L1 16 TLB L2 [8]

Fig. 2 Overview of the SMYLEref Architecture

Fig. 3 Implementation of SMYLE OpenCL

OpenCL API PE SMYLE OpenCL PE OpenCL OpenCL

3.3 SMYLE OpenCL SMYLE OpenCL 1 OpenCL SMYLEref OpenCL SMYLE OpenCL 3 SMYLE OpenCL geyser Linux Linux OpenCL [9] SMYLE OpenCL

3.4 FPGA 128 128 FPGA 1 128 Xilinx FPGA Virtex-6 ML605 16 1 geyser Linux OS 128 1 geyser
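The core counts of the 128-core environment (16 ML605 boards, 128 geyser cores, one of which runs Linux as the host) can be made concrete with a little arithmetic. The C sketch below is illustrative only: the even split of eight cores per board, the choice of core 0 as the host, the linear core numbering, and the identifiers `board_of`, `local_index_of`, and `is_host` are all assumptions made here and are not taken from SMYLE OpenCL itself.

```c
#include <assert.h>

enum {
    NUM_BOARDS      = 16,                     /* ML605 boards in the environment */
    NUM_CORES       = 128,                    /* geyser cores in total */
    CORES_PER_BOARD = NUM_CORES / NUM_BOARDS, /* 8, assuming an even split */
    HOST_CORE       = 0,                      /* assumption: core 0 is the host */
    NUM_PES         = NUM_CORES - 1           /* 127 cores left to run kernels */
};

/* Board that holds a given core, under the assumed linear numbering. */
static int board_of(int core)
{
    assert(core >= 0 && core < NUM_CORES);
    return core / CORES_PER_BOARD;
}

/* Position of the core within its board (0..CORES_PER_BOARD-1). */
static int local_index_of(int core)
{
    assert(core >= 0 && core < NUM_CORES);
    return core % CORES_PER_BOARD;
}

/* Nonzero if the core is the (assumed) host core rather than a PE. */
static int is_host(int core)
{
    return core == HOST_CORE;
}
```

Under these assumptions, core 127 is the last core of the last board, and every core other than core 0 is available as a PE.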
Table 1 Specifications of ML605 Evaluation Board

  FPGA     Virtex-6 XC6VLX240T-1FFG1156
  SDRAM    DDR3 SODIMM (512MB)
  IO       UART, USB, DVI, CF, SMA
  Clock    200MHz & 66MHz

Table 2 Specifications of Virtex-6

  Process        65nm CMOS, 1.0V
  Logic Cells    241,152
  CLB Slices     37,680
  Block RAM      14,975 Kbit
  I/O            720

Fig. 4 Appearance of the 128 Cores Environment

OpenCL 127 ML605 Virtex-6 1, 2 128 FPGA Virtex-6 8 geyser FPGA 1 1 16 FPGA 2 4 128 16 FPGA

4. SMYLE OpenCL OpenCL 1 127 128 FPGA [8]

  Geyser: 10MHz
  (PLB): 5MHz
  DDR3-SDRAM: 100MHz

OpenCL 6

  backprojection :
  blackscholes :
  gaussian :
  grayscale :
  linearsearch :
  runlength :

backprojection 2 blackscholes gaussian grayscale PPM linearsearch 1 0 1 runlength

gettimeofday()

  input data :
  preparation :
  OpenCL run :
  output data :
  release :

linearsearch init

4.1 5 10 backprojection blackscholes gaussian grayscale 64 127 runlength 127 550Byte linearsearch 1 2 4 8 10 runlength SMYLE OpenCL

Fig. 5 Execution Time of backprojection
Fig. 6 Execution Time of blackscholes
Fig. 7 Execution Time of gaussian
Fig. 8 Execution Time of grayscale
Fig. 9 Execution Time of linearsearch
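The evaluation appears to break each benchmark run into five phases timed with gettimeofday(): input data, preparation, OpenCL run, output data, and release. A minimal C sketch of such a measurement harness follows. It is not the authors' code: the callback type `phase_fn`, the helpers `elapsed_sec` and `time_phases`, and the enum names are hypothetical, and the work done inside each phase is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Phase labels taken from the timing breakdown in the evaluation. */
enum phase {
    PH_INPUT_DATA,
    PH_PREPARATION,
    PH_OPENCL_RUN,
    PH_OUTPUT_DATA,
    PH_RELEASE,
    PH_COUNT
};

static const char *const phase_name[PH_COUNT] = {
    "input data", "preparation", "OpenCL run", "output data", "release"
};

/* Seconds elapsed between two gettimeofday() samples. */
static double elapsed_sec(const struct timeval *t0, const struct timeval *t1)
{
    return (double)(t1->tv_sec - t0->tv_sec)
         + (double)(t1->tv_usec - t0->tv_usec) / 1e6;
}

/* Hypothetical callback type: one benchmark phase operating on ctx. */
typedef void (*phase_fn)(void *ctx);

/* Run the phases in order and store each wall-clock time in out[]. */
static void time_phases(const phase_fn fn[PH_COUNT], void *ctx,
                        double out[PH_COUNT])
{
    for (int p = 0; p < PH_COUNT; ++p) {
        struct timeval t0, t1;
        assert(fn[p] != NULL);
        gettimeofday(&t0, NULL);
        fn[p](ctx);                 /* caller-supplied work for this phase */
        gettimeofday(&t1, NULL);
        out[p] = elapsed_sec(&t0, &t1);
    }
}
```

A caller would fill an array of five function pointers (for example, one that reads the input image and one that enqueues the kernel and waits for completion), call `time_phases`, and print each `out[p]` next to `phase_name[p]`. Timing the phases separately keeps host-side setup and data transfer apart from the kernel execution time that varies with the number of PEs.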
Fig. 10 Execution Time of runlength

5. SMYLEref OpenCL SMYLEref FPGA 128 SMYLE OpenCL SMYLEref

(NEDO)

[1] Owens, J. D., Luebke, D., Govindaraju, N., Harris, M., Krüger, J., Lefohn, A. and Purcell, T. J.: A Survey of General-Purpose Computation on Graphics Hardware, Computer Graphics Forum, Vol. 26, No. 1, pp. 80–113 (2007).
[2] NVIDIA Corporation: NVIDIA CUDA C Programming Guide, version 4.0, available from http://developer.download.nvidia.com/compute/cuda/4_0/toolkit/docs/CUDA_C_Programming_Guide.pdf (2011).
[3] Khronos OpenCL Working Group: The OpenCL Specification Version 1.1, available from http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf (2011).
[4] Lindholm, E., Nickolls, J., Oberman, S. and Montrym, J.: NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, Vol. 28, pp. 39–55 (2008).
[5] OpenCL, Vol. 2012-SLDM-155, No. 2, pp. 1–6 (2012).
[6] SMYLE OpenCL, Vol. 2012-EMB-27, No. 7, pp. 1–8 (2012).
[7] MIPS Geyser FPGA Linux, Vol. 2010-ARC-189, No. 9, pp. 1–8 (2010).
[8] FPGA SMYLEref, Vol. 2012-ARC-198, No. 15, pp. 1–7 (2012).
[9] SoC OpenCL, DA 2012, pp. 73–78 (2012).