Cell/B.E. BlockLib



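Abstract

Getting good performance from the Cell/B.E. requires explicit SIMD programming, which makes Cell programs hard to write. BlockLib is a skeleton library that hides this SIMD programming from the programmer. BlockLib runs on NestStep, which is built on libspe1, whereas Cell SDK 3.1 provides libspe2. To use BlockLib under Cell SDK 3.1, this work ports NestStep to libspe2 and compares BlockLib on libspe1 with BlockLib on libspe2. With NestStep running on libspe2, BlockLib can then be extended from block distributed arrays to cyclic distributed arrays on the Cell/B.E.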

Contents: Cell/B.E.; NestStep; the BSP model; Cell-NestStep-C; BlockLib; libspe; SPE management with libspe1 and libspe2; the aligned attribute.

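1 Introduction

The Cell Broadband Engine (Cell/B.E.) [1] is a heterogeneous multi-core processor with SIMD support, developed jointly by three companies: SONY, IBM, and Toshiba. It combines one PPE (PowerPC Processor Element) and eight SPEs (Synergistic Processor Elements) on a single chip. Writing efficient programs for the Cell/B.E. is difficult, and BlockLib, a skeleton library developed at Linköping University, was created to make it easier.

BlockLib runs on NestStep, and NestStep manages the SPEs of the Cell/B.E. through libspe. libspe exists in two versions, libspe1 and libspe2, and NestStep uses libspe1. IBM Cell SDK 3.1, however, provides libspe2, so using BlockLib under Cell SDK 3.1 requires a NestStep that runs on libspe2. This work ports NestStep from libspe1 to libspe2 and evaluates BlockLib on the ported NestStep on the Cell/B.E.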

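The rest of this thesis is organized as follows. Chapter 2 gives the background. Chapter 3 describes the issues in NestStep that the port has to address. Chapter 4 describes the changes made to NestStep so that BlockLib runs on libspe2. Chapter 5 evaluates the result on the Cell/B.E.

2 Background

This chapter introduces the Cell/B.E., NestStep, and BlockLib.

2.1 Cell/B.E.

The Cell/B.E. is a processor developed jointly by SONY, IBM, and Toshiba. It is a SIMD-capable multi-core processor with one PPE (PowerPC Processor Element) and eight SPEs (Synergistic Processor Elements), designed for high peak floating-point performance (GFLOPS). Figure 1 shows the structure of the Cell/B.E.

The PPE and the SPEs are connected by the EIB (Element Interconnect Bus), which has a peak bandwidth of 204.8 GB/s. Each SPE has a 256 KB local store (LS). The SPU (Synergistic Processor Unit), the core of an SPE, can directly access only the LS of its own SPE. Data moves between the LS and main memory, or between the LS of two SPEs, through the Memory Flow Controller (MFC). The SPU is a 128-bit SIMD unit with 2-way instruction issue.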

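Figure 1: Cell/B.E. architecture

Programming the Cell/B.E. is difficult for three reasons:

(1) A Cell/B.E. program consists of two kinds of code (PPU and SPU), one for each kind of core (PPE and SPE).
(2) Data must be transferred to and from the SPEs explicitly by DMA. On the Cell/B.E. the OS does not manage SPE memory on behalf of the program.
(3) Getting performance from an SPE requires SIMD code for the SPU.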

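2.2 NestStep

NestStep is a parallel programming language based on the BSP model.

BSP (Bulk Synchronous Parallel) [2] is a parallel computation model proposed by Valiant in 1990 and developed further at Oxford. A BSP program is a sequence of supersteps, and every superstep has three phases:

(1) Local computation.
(2) Communication within the superstep.
(3) Barrier synchronization, which ends the superstep. The next superstep starts only after all processors have reached the barrier.

Figure 2 shows a BSP superstep. In a master/worker arrangement, the workers compute locally and exchange data with the master during each superstep, and all p processors synchronize before the next superstep begins.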

Figure 2: BSP superstep

The execution time of a BSP superstep is modeled with four parameters, L, g, w, and h. The time of one superstep, t(step), is

$t(\mathrm{step}) = w + hg + L$
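The cost formula can be turned into a few lines of C. The following is a minimal sketch, not part of NestStep: the type and function names (`bsp_machine`, `bsp_step`, `bsp_step_cost`, `bsp_program_cost`) are hypothetical, and the parameters are read in the usual BSP sense (w as the largest local work, h as the largest number of words sent or received by one processor, g as the cost per word, L as the barrier cost). The second function assumes that the cost of a program is the sum of its superstep costs.

```c
#include <assert.h>
#include <stddef.h>

/* Machine-dependent BSP parameters (hypothetical struct for illustration). */
typedef struct {
    double g; /* communication cost per word */
    double L; /* cost of one barrier synchronization */
} bsp_machine;

/* Program-dependent parameters of one superstep. */
typedef struct {
    double w; /* largest local computation on any processor */
    double h; /* largest number of words sent or received by any processor */
} bsp_step;

/* t(step) = w + h*g + L */
double bsp_step_cost(bsp_machine m, bsp_step s)
{
    return s.w + s.h * m.g + m.L;
}

/* Program cost as the sum of t(step) over all supersteps. */
double bsp_program_cost(bsp_machine m, const bsp_step *steps, size_t n)
{
    assert(steps != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += bsp_step_cost(m, steps[i]);
    return total;
}
```

With g = 2 and L = 10, a superstep with w = 100 and h = 5 costs 100 + 5*2 + 10 = 120 time units. The barrier cost L is paid once per superstep, so fewer supersteps mean less synchronization overhead.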

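Figure 3: Cell-NestStep-C

Here w is the local computation time, h is the amount of communication, g is the communication cost per unit of data, and L is the cost of barrier synchronization. The execution time of a whole BSP program, t(prog), is the sum of t(step) over all of its supersteps:

$t(\mathrm{prog}) = \sum_{\mathrm{step}} t(\mathrm{step})$

Cell-NestStep-C

NestStep [3] is a BSP-based language designed by Christoph W. Kessler. It was first implemented as an extension of Java (NestStep-Java). In 2000 a C-based version, NestStep-C, followed. In 2006 NestStep-C was implemented on top of MPI as Cluster-NestStep-C, and in 2007 Cell-NestStep-C brought NestStep to the Cell/B.E. A NestStep program is structured as BSP supersteps. Figure 3 shows the structure of Cell-NestStep-C. Cell-NestStep-C programs are written in C or C++, and NestStep lets the Cell/B.E. be programmed in the BSP style.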

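Figure 4: Block distributed array

NestStep was first defined on Java and later on C. In the C version, the NestStep run-time system provides the superstep mechanism to C programs.

NestStep offers two kinds of distributed arrays that are shared across supersteps: the block distributed array and the cyclic distributed array. A block distributed array is split into contiguous blocks, one per processor. For example, elements 0 to 3 are placed on P1 and elements 4 to 7 on P2. A cyclic distributed array assigns consecutive elements to the processors in turn, so with two processors neighbouring elements alternate between P1 and P2. BlockLib uses only block distributed arrays.

2.3 BlockLib

BlockLib [4] is a skeleton library that lets programmers use the SIMD capabilities of the Cell/B.E. without writing SIMD code by hand.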

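Figure 5: Cyclic distributed array

BlockLib provides the skeletons map, reduce, and map-reduce. The programmer writes the per-element function in C and passes it to a skeleton.

map applies a function f to every element position:

$r[i] = f(a_0[i], \ldots, a_k[i]) \quad \text{for all } i \in [0, N-1]$

reduce combines all elements of an array with a binary operator op:

$r = a[0] \;\mathrm{op}\; a[1] \;\mathrm{op}\; \cdots \;\mathrm{op}\; a[N-1]$

map-reduce combines a map with a reduce. It is not the same thing as Google's map-reduce. Figure 6 shows the map-reduce skeleton, which computes

$f(a_0[0], \ldots, a_k[0]) \;\mathrm{op}\; f(a_0[1], \ldots, a_k[1]) \;\mathrm{op}\; \cdots \;\mathrm{op}\; f(a_0[N-1], \ldots, a_k[N-1])$

In BlockLib, one skeleton call is executed as one NestStep superstep; a single call to map, for instance, is one superstep.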
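The three skeletons can be illustrated with plain sequential C. This is a sketch of their semantics only, not BlockLib's API: the names `map2`, `reduce`, `map2_reduce`, `mul_f`, and `add_f` are hypothetical, the map is fixed to two input arrays (k = 1), and nothing here is parallel or SIMD. It assumes the operator is associative, as a parallel reduction would require.

```c
#include <assert.h>
#include <stddef.h>

typedef float (*map2_fn)(float, float);   /* f(a0[i], a1[i]) */
typedef float (*reduce_fn)(float, float); /* binary operator op */

/* map: r[i] = f(a0[i], a1[i]) for every i in [0, n-1] */
void map2(map2_fn f, const float *a0, const float *a1, float *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        r[i] = f(a0[i], a1[i]);
}

/* reduce: a[0] op a[1] op ... op a[n-1]; requires n >= 1 */
float reduce(reduce_fn op, const float *a, size_t n)
{
    assert(n >= 1);
    float acc = a[0];
    for (size_t i = 1; i < n; i++)
        acc = op(acc, a[i]);
    return acc;
}

/* map-reduce: f(a0[0], a1[0]) op ... op f(a0[n-1], a1[n-1]),
   computed in one pass without storing the mapped array. */
float map2_reduce(map2_fn f, reduce_fn op,
                  const float *a0, const float *a1, size_t n)
{
    assert(n >= 1);
    float acc = f(a0[0], a1[0]);
    for (size_t i = 1; i < n; i++)
        acc = op(acc, f(a0[i], a1[i]));
    return acc;
}

/* Example element functions: together they form a dot product. */
float mul_f(float x, float y) { return x * y; }
float add_f(float x, float y) { return x + y; }
```

Calling `map2_reduce(mul_f, add_f, x, y, n)` yields the dot product of `x` and `y`. The fused form avoids the intermediate array that a separate `map2` followed by `reduce` would need.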

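Figure 6: map-reduce

2.4 libspe

On the Cell/B.E., libspe (SPE Runtime Management Library) is the library a PPE program uses to manage SPEs. libspe exists in two versions, libspe1 and libspe2, which differ in how the PPE runs programs on the SPEs.

Cell SDK 3.1 provides libspe 2. IBM shipped libspe1 with earlier Cell SDK releases and introduced libspe2 with Cell SDK 2.1. The last version of libspe1 is libspe 1.2, and libspe1 has been superseded by libspe 2.

In the libspe 1.x API, SPE threads are tied to threads created by the library through the OS. libspe 2.1 separates its interface into a base API and an event API. The libspe2 base API is not compatible with the libspe1 API, so a program written against the libspe1 API has to be changed before it can use libspe 2. The rest of this work deals with that migration from libspe1 to libspe2.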

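NestStep was developed with Cell SDK 2.1 and libspe1. Cell SDK 3.1 replaces libspe1 with libspe2. Cell/B.E. programs are compiled with ppu-gcc and spu-gcc, and ppu-gcc can generate either 32-bit or 64-bit code, which also has to be taken into account when building NestStep.

3 Moving NestStep from libspe1 to libspe2

This chapter describes the differences between libspe1 and libspe2 that affect the port of NestStep to libspe2.

3.1 Differences between libspe1 and libspe2

In libspe1, an SPE program is started with spe_create_thread(). With libspe1, an SPE program is run as follows:

1. spe_open_image(): open the SPE program image.
2. spe_create_thread(): create an SPE thread (the SPE program starts running).
3. spe_wait(): wait for the SPE thread (the SPE program) to finish.
4. spe_close_image(): close the SPE program image.

In libspe1, the PPE-side thread (pthread) that drives an SPE is created inside the library. The API (spe_create_thread()) hides it, and the call returns as soon as the SPE thread has been created, so the PPE program only needs to call spe_create_thread() once per SPE.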

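In libspe2, an SPE program runs in an SPE context. spe_context_create() creates the context, spe_program_load() loads the program into the LS of the SPE, and spe_context_run() runs it. libspe2 does not create a thread for the SPE program; the PPE program has to create the pthreads itself. With libspe2, an SPE program is run as follows:

1. spe_image_open(): open the SPE program image.
2. spe_context_create(): create an SPE context.
3. spe_program_load(): load the SPE program into the LS of the SPE.
4. spe_context_run(): run the SPE program.
5. spe_context_destroy(): destroy the SPE context.
6. spe_image_close(): close the SPE program image.

spe_context_run() is a synchronous API: the PPE thread that calls spe_context_run() is blocked until the SPE program stops. This is the main difference between libspe1 and libspe2.

Figure 7 compares how NestStep starts SPE programs with libspe1 and with libspe2. To run N SPE programs in parallel, libspe1 only needs one spe_create_thread() call per SPE. With libspe2, the PPE program creates one pthread per SPE, calls spe_context_run() in each of them, and waits for them with pthread_join().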
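The thread-per-SPE pattern that libspe2 requires can be sketched with plain POSIX threads. The sketch below is an illustration under stated assumptions, not NestStep code: `fake_spe_run` is a hypothetical stand-in for the blocking `spe_context_run()` call so that the example compiles without libspe2, and `worker_t`, `worker_main`, `run_all_workers`, and `NUM_WORKERS` are names invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 6 /* one PPE thread per SPE; 6 is an example value */

typedef struct {
    pthread_t thread;
    int id;     /* index of the SPE this thread drives */
    int result; /* value produced by the (simulated) SPE program */
} worker_t;

/* Hypothetical stand-in for spe_context_run(): blocks until "done". */
static int fake_spe_run(int id)
{
    return id * id;
}

/* Thread body: the blocking run call occupies this PPE thread only. */
static void *worker_main(void *arg)
{
    worker_t *w = arg;
    w->result = fake_spe_run(w->id);
    return NULL;
}

/* Start one thread per worker, then join them all.
   Returns 0 on success and stores the sum of all results in *sum. */
int run_all_workers(int *sum)
{
    worker_t workers[NUM_WORKERS];
    int started = 0;
    int rc = 0;

    assert(sum != NULL);
    for (int i = 0; i < NUM_WORKERS; i++) {
        workers[i].id = i;
        workers[i].result = 0;
        if (pthread_create(&workers[i].thread, NULL,
                           worker_main, &workers[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    *sum = 0;
    for (int i = 0; i < started; i++) {
        if (pthread_join(workers[i].thread, NULL) != 0)
            rc = -1;
        *sum += workers[i].result;
    }
    return rc;
}
```

Because every blocking call sits in its own thread, the main PPE thread stays free until it reaches the joins, which play the role that spe_wait() has in libspe1.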

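Figure 7: Starting SPE programs with libspe1 and libspe2

3.2 DMA transfers on the Cell/B.E.

DMA transfers on the Cell/B.E. are subject to two restrictions:

1. Transfer size. A DMA transfer moves a multiple of 16 bytes, up to 16 KB per request. Transfers smaller than 16 bytes are limited to a few small sizes (1, 2, 4, or 8 bytes).
2. Alignment. The source and destination addresses of a DMA transfer must be aligned to 16 bytes. Transfers are most efficient when the addresses are aligned to 128 bytes. For a transfer smaller than 16 bytes, both addresses must have the same offset within their 16-byte block.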

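Figure 8: DMA transfer. (a) Transfer in units of bytes. (b) Placement in the LS.

Figure 8 shows an example in which 4 bytes located at offset 8 of a 16-byte block are transferred by DMA. The 4 bytes arrive in the LS at the same offset within a 16-byte block of the LS, because the MFC transfers data between the PPE side and the SPE side without changing its position inside the 16-byte block. NestStep therefore provides check_dma(), which checks that the size and the addresses of a transfer satisfy the 16-byte rules before the DMA is issued.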
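The size and alignment rules can be written down as a small predicate. The sketch below is not NestStep's check_dma(), whose implementation is not shown in the text; `dma_transfer_ok` and `dma_pad16` are hypothetical names, and the function encodes the commonly documented MFC rules: sizes of 1, 2, 4, or 8 bytes with natural alignment and matching low 4 address bits, or a multiple of 16 bytes up to 16 KB with 16-byte aligned addresses. Zero-length requests are rejected here as a conservative choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_MAX_BYTES 16384u /* 16 KB limit for a single DMA request */

/* Returns true if a transfer of `size` bytes between local-store address
   `ls_addr` and effective (main memory) address `ea` obeys the rules. */
bool dma_transfer_ok(uint32_t ls_addr, uint64_t ea, size_t size)
{
    if (size == 1 || size == 2 || size == 4 || size == 8) {
        /* Small transfer: naturally aligned on both sides, and at the
           same offset inside the 16-byte block on both sides. */
        return (ls_addr % size == 0) &&
               (ea % size == 0) &&
               ((ls_addr & 0xFu) == (ea & 0xFu));
    }
    if (size == 0 || size > DMA_MAX_BYTES || size % 16 != 0)
        return false;
    /* Large transfer: both addresses aligned to 16 bytes. */
    return (ls_addr % 16 == 0) && (ea % 16 == 0);
}

/* Rounds a byte count up to the next multiple of 16. */
size_t dma_pad16(size_t n)
{
    return (n + 15u) & ~(size_t)15u;
}
```

A 4-byte transfer from an address ending in 0x4 to an LS address ending in 0x4 passes, while the same transfer to an LS address ending in 0x8 fails, which is the situation illustrated in Figure 8.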

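4 Porting NestStep to libspe2

4.1 Replacing libspe calls

The libspe1 calls in NestStep were replaced with libspe2 calls by following IBM's migration guide [5], which covers both the ppu side and the spu side. The changes are on the PPU side; the SPU side is not affected by the libspe2 API. Some functions, such as spe_open_image and spe_close_image, map one to one onto libspe2 functions. Others, such as spe_create_thread, do not.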

spe_open_image, spe_close_image

libspe1 ppu example:

    #include <libspe.h>

    spe_program_handle_t * <program_handle>;

    <program_handle> = spe_open_image("<filename>");
    spe_close_image( <program_handle>);

libspe2 ppu example:

    #include <libspe2.h>

    spe_program_handle_t * <program_handle>;

    <program_handle> = spe_image_open("<filename>");
    spe_image_close( <program_handle>);

These two functions correspond one to one. spe_create_thread has no one-to-one counterpart: where libspe1 needs a single call, libspe2 needs a pthread, which the example below sets up with the thread function ppu_pthread_function and the structure ppu_pthread_data_t.

spe_create_thread

libspe1 ppu example:

    #include <libspe.h>

    spe_gid_t <group>;
    spe_program_handle_t <spe_program>;
    void *<argp>;
    void *<envp>;
    unsigned long <mask>;
    int <flags>;
    speid_t <speid>;

    <speid> = spe_create_thread(<group>, &<spe_program>,
                                <argp>, <envp>, <mask>, <flags>);

libspe2 ppu example:

    #include <libspe2.h>
    #include <pthread.h>

    /* Everything one PPE thread needs to run one SPE context. */
    typedef struct ppu_pthread_data {
        spe_context_ptr_t <speid>;
        pthread_t pthread;
        unsigned int entry;
        unsigned int <flags>;
        void *<argp>;
        void *<envp>;
        spe_stop_info_t stopinfo;
    } ppu_pthread_data_t;

    spe_program_handle_t <spe_program>;
    void *<argp>;
    void *<envp>;
    int <flags>;
    pthread_attr_t attr;
    ppu_pthread_data_t ppdata;

    /* Thread body: spe_context_run() blocks until the SPE program stops. */
    void *ppu_pthread_function(void *arg)
    {
        ppu_pthread_data_t *datap = (ppu_pthread_data_t *)arg;
        int rc;

        do {
            rc = spe_context_run(datap-><speid>, &datap->entry,
                                 datap-><flags>, datap-><argp>,
                                 datap-><envp>, &datap->stopinfo);
        } while (rc > 0);
        pthread_exit(NULL);
    }

    ppdata.<speid> = spe_context_create(<flags>, NULL);
    spe_program_load(ppdata.<speid>, &<spe_program>);
    ppdata.entry = SPE_DEFAULT_ENTRY;
    ppdata.<flags> = <flags>;
    ppdata.<argp> = <argp>;
    ppdata.<envp> = <envp>;
    pthread_create(&ppdata.pthread, &attr,
                   &ppu_pthread_function, &ppdata);

The thread group functions of libspe1 (spe_create_group, spe_group_max) are handled through pthread settings in the libspe2 version.

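spe_create_group

libspe1 ppu example:

    #include <libspe.h>

    spe_gid_t <group>;
    int <policy>;
    int <priority>;
    int <spe_event>;

    <group> = spe_create_group(<policy>, <priority>, <spe_event>);

libspe2 ppu example:

    #include <libspe2.h>
    #include <pthread.h>

    int <policy>;
    int <priority>;
    int <spe_event>;
    pthread_attr_t attr;
    struct sched_param param;
    spe_context_ptr_t <speid>;

    /* Scheduling policy and priority move into the pthread attributes. */
    pthread_attr_init(&attr);
    pthread_attr_setschedpolicy(&attr, <policy>);
    param.sched_priority = <priority>;
    pthread_attr_setschedparam(&attr, &param);

    /* Event handling is requested when the context is created. */
    <speid> = spe_context_create(
        <spe_event> != 0 ? SPE_EVENTS_ENABLE : 0, NULL);

The aligned attribute

The aligned attribute, __attribute__((aligned(n))), places a variable on an n-byte boundary. Variables that take part in DMA transfers have to be declared with aligned. Without it, the placement depends on the CPU and the compiler: a PowerPC CPU does not align data the same way an x86 CPU does (4 bytes on x86, 8 bytes on the Cell/B.E.), and ppu-gcc 4.0.x generates 64-bit code by default while 4.1.x generates 32-bit code, which changes the size of long. A variable that happens to sit on a 4-byte boundary cannot be used for a 16-byte DMA transfer, so such data is aligned to 16 bytes explicitly.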
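The effect of the aligned attribute can be checked directly in C. The sketch below assumes GCC or a compatible compiler (the attribute is a GNU extension); `spe_args_t`, `example_args`, `dma_buffer`, and `is_aligned` are hypothetical names chosen for the example and do not come from NestStep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Argument block that is always placed on a 16-byte boundary.
   The attribute also pads sizeof() up to a multiple of 16. */
typedef struct {
    double partial_sum;
    int    count;
} __attribute__((aligned(16))) spe_args_t;

/* A global instance and a buffer aligned to 128 bytes, the boundary
   at which DMA transfers are most efficient. */
spe_args_t example_args = { 0.0, 0 };
unsigned char dma_buffer[256] __attribute__((aligned(128)));

/* Returns true if pointer p lies on a multiple of `alignment` bytes. */
bool is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)p % alignment) == 0;
}
```

A check such as `is_aligned(&example_args, 16)` before issuing a transfer makes alignment mistakes visible at the point where the buffer is set up rather than as a failed DMA.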

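Dynamically allocated memory is only guaranteed a small default alignment (4 bytes in this case), so malloc is replaced with memalign:

    /* before: alignment not guaranteed to be 16 bytes */
    pi = (double *) malloc(sizeof(double)*2);

    /* after: aligned to 16 bytes */
    pi = (double *) memalign(16, sizeof(double)*2);

Padding introduced by alignment has to be kept small on the SPE side, because the LS of the SPU holds only 256 KB.

5 Evaluation

This chapter compares BlockLib on NestStep built with libspe1 and BlockLib on NestStep built with libspe2.

5.1 Evaluation environment

The evaluation environment follows Cell Challenge 2009. libspe1 and libspe2 are compared on an SCE PLAYSTATION3 (PS3).

Table 1: PLAYSTATION3 environment for libspe1 and libspe2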

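In addition to the PS3, the Cell Reference Set (CRS) [6] is used. The CRS is a Cell/B.E. reference platform with its own I/O. On the CRS 7 SPEs of the Cell/B.E. are available, while on the PS3 only 6 SPEs are available.

NestStep programs consist of PPU code and SPU code. PPU code is compiled with ppu-gcc, SPU code is compiled with spu-gcc, and the SPU binary is embedded into the PPU program with ppu-embedspu. On the CRS the PPU code is built as a 32-bit program with -m32. Although the CRS offers 7 SPEs, the measurements use at most 6 SPEs, the number available on the PS3.

Table 2: PS3 and CRS environments

                  PS3                 CRS
    OS            FedoraCore9         Red Hat Linux 4.1
    CPU           Cell/B.E. 3.2GHz    Cell/B.E. 3.2GHz
    Compiler      gcc                 gcc
    Optimization  -O2                 -O2

The benchmarks are the map and reduce skeletons. Three configurations are measured: the unmodified NestStep and BlockLib (original), BlockLib on NestStep built with libspe1 (libspe1), and BlockLib on NestStep built with libspe2 (libspe2).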


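Figure 9: BlockLib with libspe1 and libspe2

Each benchmark is run with 1 SPE and with 6 SPEs. Every SPE transfers its data by DMA in chunks that are aligned to 128 bytes and at most 16 KB in size. map and reduce are measured on both the PS3 and the CRS, giving six results per machine: original (1 SPE), libspe1 (1 SPE), libspe2 (1 SPE), original (6 SPE), libspe1 (6 SPE), and libspe2 (6 SPE). The results are reported relative to original (1 SPE).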

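The measurements compare BlockLib on libspe1 and on libspe2 with the original BlockLib, whose skeletons use SIMD internally, for 1 SPE and for 6 SPEs on the Cell/B.E.

6 Conclusion

BlockLib simplifies SIMD programming on the Cell/B.E. To make BlockLib usable under Cell SDK 3.1, NestStep was ported to libspe2, and BlockLib was run on the ported NestStep on both the PLAYSTATION3 and the Cell Reference Set. Two items remain as future work. The first concerns the libspe2-based NestStep that BlockLib now runs on.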

The second is support for cyclic distributed arrays in BlockLib. BlockLib currently uses only the block distributed arrays of NestStep, although NestStep also provides cyclic distributed arrays. Extending BlockLib to cyclic distributed arrays would make them usable from the skeletons as well.

References

[1] Sony Computer Entertainment: Cell Broadband Engine Architecture, 1.01 edition (2006).
[2] Valiant, L.: A bridging model for parallel computation, Communications of the ACM, Vol. 33.
[3] Keßler, C. W.: NestStep: Nested Parallelism and Virtual Shared Memory for the BSP model (1999).
[4] Ålind, M.: A Skeleton Library for Cell Broadband Engine, Master's thesis, IDA, Linköpings universitet (2008).
[5] SPE Runtime Management Library Version 1 to Version 2 Migration Guide, IBM (2007).
[6] [...]: Cell [...], Vol. 61, No. 6 (2006).
