連載講座 : 高生産並列言語を使いこなす (4) ゲーム木探索の並列化 田浦健次朗 東京大学大学院情報理工学系研究科, 情報基盤センター 目次 1 準備 問題の定義 αβ 法 16 2 αβ 法の並列化 概要 Young Brothers Wa

Similar documents
連載講座 : 高生産並列言語を使いこなす (3) ゲーム木探索問題 田浦健次朗 東京大学大学院情報理工学系研究科, 情報基盤センター 目次 1 概要 17 2 ゲーム木探索 必勝 必敗 引き分け 盤面の評価値 αβ 法 指し手の順序付け (mo

連載講座 : 高生産並列言語を使いこなす (5) 分子動力学シミュレーション 田浦健次朗 東京大学大学院情報理工学系研究科, 情報基盤センター 目次 1 問題の定義 17 2 逐次プログラム 分子 ( 粒子 ) セル 系の状態 ステップ 18

£Ã¥×¥í¥°¥é¥ß¥ó¥°ÆþÌç (2018) - Â裵²ó ¨¡ À©¸æ¹½Â¤¡§¾ò·ïʬ´ô ¨¡

£Ã¥×¥í¥°¥é¥ß¥ó¥°ÆþÌç (2018) - Â裶²ó ¨¡ À©¸æ¹½Â¤¡§·«¤êÊÖ¤· ¨¡

C

N08

Emacs ML let start ::= exp (1) exp ::= (2) fn id exp (3) ::= (4) (5) ::= id (6) const (7) (exp) (8) let val id = exp in

Microsoft PowerPoint - 03_What is OpenMP 4.0 other_Jan18

第101回 日本美容外科学会誌/nbgkp‐01(大扉)

27巻3号/FUJSYU03‐107(プログラム)

パーキンソン病治療ガイドライン2002

本文27/A(CD-ROM

tnbp59-20_Web:P1/ky108679509610002943

£Ã¥×¥í¥°¥é¥ß¥ó¥°ÆþÌç (2018) - Â裱£²²ó ¡Ý½ÉÂꣲ¤Î²òÀ⡤±é½¬£²¡Ý

( ) ( ) 30 ( ) 27 [1] p LIFO(last in first out, ) (push) (pup) 1

Run-Based Trieから構成される 決定木の枝刈り法

r07.dvi

ohp07.dvi

1 OpenCL OpenCL 1 OpenCL GPU ( ) 1 OpenCL Compute Units Elements OpenCL OpenCL SPMD (Single-Program, Multiple-Data) SPMD OpenCL work-item work-group N

VXPRO R1400® ご提案資料

: (1), ( ) 1 1.1,, 1 OpenMP [3, 5, 21, 22], MPI [13, 18, 23].., (C Fortran)., OS,. C Fortran,,,,. ( ),,.,,.,,,.,,,.,.,. 1

£Ã¥×¥í¥°¥é¥ß¥ó¥°(2018) - Âè11²ó – ½ÉÂꣲ¤Î²òÀ⡤±é½¬£² –

XMPによる並列化実装2

Research on decision making in multi-player games with imperfect information

新版明解C言語 実践編


ohp11.dvi

r11.dvi

~~~~~~~~~~~~~~~~~~ wait Call CPU time 1, latch: library cache 7, latch: library cache lock 4, job scheduler co

2016.

第121回関東連合産科婦人科学会総会・学術集会 プログラム・抄録

XcalableMP入門

memo

XACCの概要

2015 3

2. OpenMP OpenMP OpenMP OpenMP #pragma#pragma omp #pragma omp parallel #pragma omp single #pragma omp master #pragma omp for #pragma omp critica

1

Ł\”ƒ-2005

漸化式のすべてのパターンを解説しましたー高校数学の達人・河見賢司のサイト

OpenMP (1) 1, 12 1 UNIX (FUJITSU GP7000F model 900), 13 1 (COMPAQ GS320) FUJITSU VPP5000/64 1 (a) (b) 1: ( 1(a))

ex01.dvi

TSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日

Microsoft PowerPoint - OpenMP入門.pptx

,4) 1 P% P%P=2.5 5%!%! (1) = (2) l l Figure 1 A compilation flow of the proposing sampling based architecture simulation

新・明解C言語 実践編

LIFO(last in first out, ) 1 FIFO(first in first out, ) 2 2 PUSH POP : 1

C による数値計算法入門 ( 第 2 版 ) 新装版 サンプルページ この本の定価 判型などは, 以下の URL からご覧いただけます. このサンプルページの内容は, 新装版 1 刷発行時のものです.

gpw96.dvi

07-二村幸孝・出口大輔.indd

dプログラム_1

/ SCHEDULE /06/07(Tue) / Basic of Programming /06/09(Thu) / Fundamental structures /06/14(Tue) / Memory Management /06/1

01_OpenMP_osx.indd

Microsoft Word - 計算科学演習第1回3.doc

Microsoft PowerPoint ppt [互換モード]

459

2) TA Hercules CAA 5 [6], [7] CAA BOSS [8] 2. C II C. ( 1 ) C. ( 2 ). ( 3 ) 100. ( 4 ) () HTML NFS Hercules ( )

An Introduction to OSL

放射線専門医認定試験(2009・20回)/HOHS‐01(基礎一次)

I J

「産業上利用することができる発明」の審査の運用指針(案)

演習1: 演習準備

Prog1_6th

B

Cell/B.E. BlockLib

II 3 yacc (2) 2005 : Yacc 0 ~nakai/ipp2 1 C main main 1 NULL NULL for 2 (a) Yacc 2 (b) 2 3 y

28 Docker Design and Implementation of Program Evaluation System Using Docker Virtualized Environment

II ( ) prog8-1.c s1542h017%./prog8-1 1 => 35 Hiroshi 2 => 23 Koji 3 => 67 Satoshi 4 => 87 Junko 5 => 64 Ichiro 6 => 89 Mari 7 => 73 D

26

Microsoft Word - C.....u.K...doc

ohp03.dvi

file"a" file"b" fp = fopen("a", "r"); while(fgets(line, BUFSIZ, fp)) {... fclose(fp); fp = fopen("b", "r"); while(fgets(line, BUFSIZ, fp)) {... fclose


SystemC言語概論

tuat1.dvi

lexex.dvi

スライド 1

untitled

1:. Csmith,, (B!=0? A/B : A),.,., Orange3 [3], Orange4 [4],., Csmith., Csmith GCC LLVM.,,., Orange3, Orange4,, if for., Orange4, C, Csmith.,., if, for

Microsoft PowerPoint ppt [互換モード]


-1-

第86回日本感染症学会総会学術集会後抄録(I)

ex01.dvi

hpc141_shirahata.pdf

[ 1] 1 Hello World!! 1 #include <s t d i o. h> 2 3 int main ( ) { 4 5 p r i n t f ( H e l l o World!! \ n ) ; 6 7 return 0 ; 8 } 1:

XACC講習会

IntelR Compilers Professional Editions

本文ALL.indd

memo

春期講座 ~ 極限 1 1, 1 2, 1 3, 1 4,, 1 n, n n {a n } n a n α {a n } α {a n } α lim n an = α n a n α α {a n } {a n } {a n } 1. a n = 2 n {a n } 2, 4, 8, 16,


研修コーナー

DPD Software Development Products Overview

IPSJ SIG Technical Report 1 1, Nested Transactional Memory Selecting the Optimal Rollback Point Yuji Ito, 1 Ryota Shioya, 1, 2 Masahiro Goshima

tnbp59-21_Web:P2/ky132379509610002944

main

r08.dvi

パーキンソン病治療ガイドライン2002

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2

日本内科学会雑誌第97巻第7号

Transcription:

連載講座 : 高生産並列言語を使いこなす (4) ゲーム木探索の並列化 田浦健次朗 東京大学大学院情報理工学系研究科, 情報基盤センター 目次 1 準備 16 1.1 問題の定義 16 1.2 αβ 法 16 2 αβ 法の並列化 17 2.1 概要 17 2.2 Young Brothers Wait Concept 17 2.3 段数による逐次化 18 2.4 適応的な待機 18 2. 強制終了 19 3 Cilk による並列化 19 3.1 基本 19 3.2 YBWC 3.3 深さによる逐次化 21 3.4 適応的な待機 21 3. 強制終了 22 4 OpenMP による並列化 23 4.1 基本 23 4.2 YBWC 24 4.3 適応的な待機 2 4.4 強制終了 27 Intel TBB による並列化 28.1 基本 28.2 YBWC 29.3 適応的な待機 3 6 実験 31 6.1 設定 31 6.2 訪問ノード数の増大 32 6.3 逐次性能 34 6.4 台数効果 ( スループット ) 34 6. 台数効果 ( 実行時間 ) 36 7 まとめ 36

1 1.1,., g E(g). - (g ) E(g) = max ( E(h)) (g ) h:g h g h g h. g h., g,., g,. g,,. 1.2 αβ αβ C. 1: int eval(game_state_t g, int alpha, int beta) { 2: int moves[maxempties]; 3: int n_moves = gen_moves(g, moves); 4: if (n_moves == ) { : return g->disc_diff; 6: else { 7: int i; 8: for (i = ; i < n_moves; i++) { 9: int e = -move_and_eval(*g, i, moves[i], -beta, -alpha); : if (e > alpha) { 11: alpha = e; 12: if (e >= beta) break; 13: 14: 1: return alpha; 16: 17: g, [α, β]., α < E(g) < β E(g), E(g) α, α, E(g) β, β. β, 12. αβ,.

2 αβ 2.1 αβ,., 8-13. αβ,.,, β, (12 ), (, α) (9 ),, 2. αβ,.,. β, ( ),, [ β, α],, α., αβ (g ),,. αβ,.,., Cilk [1, 6, ]. 2.2 Young Brothers Wait Concept, Young Brothers Wait Concept (YBWC) [2]., g h, h 1, h 2,..., h, h 1, h 2,... ( ).,, h.. (move ordering), h, i( E( h ) E( h i )), 11 α., h 1, h 2,...., h β, h 1, h 2,... β.,,..,.

YBWC,?, l d,. ( ) Θ(d l/2 ).,, Θ(dl)., l. d,, Θ(l log d). YBWC, h, Θ(d2 l/2 ). w, Θ(d(w + 1) l/2 )., w,., w 1,., Θ(d l/2 ), d > w + 1., YBWC (w = 1 ), w = 2. 2.3,,...,. Cilk, TBB, OpenMP,,.,, β,,,. αβ.,,,. 2.4 YBWC, w, w,. w, w, β,., β

,.,,., h 1, h 2,...,,,,.,.,.,,,, (, ).,, β.,. 3-. 2., g h, β,, g,.,, Cilk, OpenMP, TBB. 3 Cilk 3.1 Cilk, spawn, sync, spawn.,. int eval(game_state_t g, int alpha, int beta) { int moves[maxempties]; int child_results[maxempties]; int n_moves = gen_moves(g, moves); if (n_moves == ) { return g->disc_diff; else { int i; for (i = ; i < n_moves; i++) { child_results[i] = spawn move_and_eval(*g, i, moves[i], -beta, -alpha); sync; for (i = ; i < n_moves; i++) { int e = -child_results[i];

if (e > alpha) { alpha = e; if (e >= beta) break; return alpha; 3.2 YBWC YBWC,, sync., w w. 1: int eval(game_state_t g, int alpha, int beta, int w) { 2: int moves[maxempties]; 3: int child_results[maxempties]; 4: int n_moves = gen_moves(g, moves); : if (n_moves == ) { 6: return g->disc_diff; 7: else { 8: int i; 9: for (i = ; i < n_moves; i++) child_results[i] = INF; : for (i = ; i < n_moves; i++) { 11: child_results[i] = spawn move_and_eval(*g, i, moves[i], -beta, -alpha); 12: if (i < w) { /* YBWC : w */ 13: sync; 14: int e = -child_results[i]; 1: if (e > alpha) { 16: alpha = e; 17: if (e >= beta) break; 18: 19: : 21: sync; 22: for (i = ; i < n_moves; i++) { 23: int e = -child_results[i]; 24: if (e > alpha) { 2: alpha = e; 26: if (e >= beta) break; 27: 28: 29: return alpha; 3:

31: 3.3. eval,,,.,. 3.4 2.4 Cilk., spawn,., : for (i = ; i < n_moves; i++) { 11: child_results[i] = spawn move_and_eval(*g, i, moves[i], -beta, -alpha); 12:..., i = (x + 1), i x., i x i (x + 1), i x β. Cilk,., SYNCHED, spawn., sync. Cilk,, 11 12: if (i < w) { /* YBWC : w */ 13: sync; 14:..., 12 : if (i < w SYNCHED) { 13: sync; 14:...!, SYNCHED,. Cilk [3],. Cilk, spawn, spawn., work-first. spawn,, spawn ( ), (work stealing).,, work stealing. SYNCHED 1. SYNCHED, g,

h spawn h, g (h ) SYNCHED.,,. 1.. 3., g,, β,, g. Cilk, spawn, spawn. Cilk spawn sync. spawn,. 3.4 SYNCHED, spawn,. Cilk, inlet., spawn,. inlet C., Cilk. spawn inlet, spawn., spawn f(...),, handler. inlet handler(int val, int x, int y) {... handler(spawn f(..), x, y);, inlet abort, g inlet, g., Cilk., sync inlet,,. /* g */ int eval(game_state_t g, int alpha, int beta, int w) { int moves[maxempties]; int n_moves = gen_moves(g, moves); if (n_moves == ) { return g->disc_diff; else { int i; inlet handler(int child_result, int j) { int e = -child_result;

if (e > alpha) { alpha = e; if (e >= beta) abort; for (i = ; i < n_moves; i++) { handler(spawn move_and_eval(*g, i, moves[i], -beta, -alpha, w), i); if (i < w SYNCHED) sync; if (alpha >= beta) break; return alpha; 4 OpenMP 4.1 OpenMP 3.. Cilk,. Cilk spawn, #pragma omp task shared(...) if(...). C,. Cilk,. shared,.,. if,,. Cilk sync, #pragma omp taskwait. Cilk spawn/sync. OpenMP, task, parallel., parallel,,., parallel, master,. #pragma omp parallel #pragma omp master, task/taskwait.

4.2 YBWC, Cilk spawn task, sync taskwait. 1: int eval(game_state_t g, int alpha, int beta, int w) { 2: int moves[maxempties]; 3: int child_results[maxempties]; 4: int n_moves = gen_moves(g, moves); : if (n_moves == ) { 6: return g->disc_diff; 7: else { 8: int i; 9: for (i = ; i < n_moves; i++) child_results[i] = INF; : for (i = ; i < n_moves; i++) { 11: #pragma omp task shared(child_results) 12: child_results[i] = move_and_eval(*g, i, moves[i], -beta, -alpha, w); 13: if (i < w) { /* YBWC : w */ 14: #pragma omp taskwait 1: int e = -child_results[i]; 16: if (e > alpha) { 17: alpha = e; 18: if (e >= beta) break; 19: : 21: 22: #pragma omp taskwait 23: for (i = ; i < n_moves; i++) { 24: int e = -child_results[i]; 2: if (e > alpha) { 26: alpha = e; 27: if (e >= beta) break; 28: 29: 3: return alpha; 31: 32: 33: 34: int main() { 3: #pragma omp parallel 36: #pragma omp master 37: { 38:... 39: eval(...); 4:...

41: 42: 4.3 OpenMP, Cilk SYNCHED. : for (i = ; i < n_moves; i++) { 11: #pragma omp task shared(child_results) 12: child_results[i] = move_and_eval(*g, i, moves[i], -beta, -alpha, w); 13: if (i < w) { /* YBWC : w */ 14: #pragma omp taskwait 1:..., if (i < w).,,,.,., : for (i = ; i < n_moves; i++) { : child_results[i] = INVALID; 11: #pragma omp task shared(child_results) 12: child_results[i] = move_and_eval(*g, i, moves[i], -beta, -alpha, w); 13 : if (i < w child_results[i]!= INVALID) { 14: #pragma omp taskwait 1:.... INVALID, move and eval ( ). ( ), GCC OpenMP (GOMP[4]),., GOMP Cilk work-first. GOMP, task,,., (parent-first ). taskwait,,.,,.,, work stealing Cilk,.,,, taskwait., h 1, h 2 (taskwait ), h 2, h 1

., h 1, h 2,.,., g h, g ( h ),.,, alpha.,, alpha... alpha. 1: for (i = ; i < n_moves; i++) { 2: #pragma omp task shared(alpha) if(i>=w) 3: if (alpha < beta) { 4: int e = -move_and_eval(g, i, moves[i], -beta, -alpha, w); : omp_set_lock(&lock); 6: if (e > alpha) alpha = e; 7: omp_unset_lock(&lock); 8: 9: if (alpha >= beta) break; : 11: #pragma omp taskwait 12: return alpha; 3 if., 3, 4 alpha,., -7 alpha,,,., work stealing, parent-first. 1. 3 8 if, i < w,. 2. 11 taskwait,. 3., alpha. 4., 3 4 alpha,..,,,,,., for (i = ; i < n; i++) { #pragma omp task E(h i );

#pragma omp taskwait,, h n 1, h n 2,..., h..,,,., GOMP,. for i = x, moves,. 4.4 OpenMP task, Cilk abort.,,,., 1., 2. g, h, g (β ), h, g 1 3., 1,., typedef struct abort_signal { int aborting; struct abort_signal * parent; abort_signal, * abort_signal_t;., abort signal,.. 1: /* g */ 2: int eval(game_state_t g, int alpha, int beta, abort_signal_t as, int w) { 3: if (check_abort(as)) return -INF; 4: int moves[maxempties]; : int n_moves = gen_moves(g, moves); 6: if (n_moves == ) { 7: return g->disc_diff; 8: else { 9: int i; : abort_signal child_as[maxempties]; 11: for (i = ; i < n_moves; i++) { 12: child_as[i].parent = as; 13: child_as[i].aborting = ;

14: #pragma omp task shared(alpha, as, child_as) 1: if (alpha < beta) { 16: int e = -move_and_eval(g, i, moves[i], -beta, -alpha_, &child_as[i], w); 17: omp_set_lock(&lock); 18: if (e > alpha) alpha = e; 19: if (e >= beta) as[i].aborting = 1; : omp_unset_lock(&lock); 21: 22: if (alpha >= beta) break; 23: 24: #pragma omp taskwait 2: return alpha; 26: 27: check abort,. int check_abort(abort_signal_t as) { abort_signal_t p; for (p = as; p; p = p->parent) if (p->aborting) return 1; return ;,,.,,.,., 3.3,. Intel TBB.1 Intel TBB,, task, execute., spawn()., Cilk spawn/sync, OpenMP task/taskwait, task group. task group, run() wait(),., run(), operator(), task handle, C++ C++x

,,. GCC, -std=c++x., OpenMP task.. 1: #include <tbb/task_group.h> 2: using namespace tbb; 3: 4: int fib(int n) { : if (n < 2) return 1; 6: else { 7: task_group tg; 8: int x, y; 9: tg.run([=,&x] { x = fib(n - 1); ); : y = fib(n - 2); 11: tg.wait(); 12: return x + y; 13: 14: 1: 16: int main(int argc, char ** argv) { 17: int n = atoi(argv[1]); 18: int x= fib(n); 19: printf("fib(%d) = %d\n", n, x); : GCC ( <tbb dir> TBB ). g++ -std=c++x -I<tbb_dir>/include fib.cc -L<tbb_dir>/lib -ltbb, 9: tg.run([=,&x] { x = fib(n - 1); );, [=,&x] { x = fib(n - 1); C++x ( ). [=,&x], &x, x =. OpenMP task, shared(x)..2 YBWC YBWC, OpenMP task/taskwait, task group run/wait.

1: int eval(game_state_t g, int alpha, int beta, int w) { 2: int moves[maxempties]; 3: int child_results[maxempties]; 4: int n_moves = gen_moves(g, moves); : if (n_moves == ) { 6: return g->disc_diff; 7: else { 8: int ii; 9: task_group tg; : for (ii = ; ii < n_moves; ii++) child_results[ii] = INF; 11: for (ii = ; ii < n_moves; ii++) { 12: int i = (ii < w? ii : n_moves - ii + w - 1); 13: tg.run([=,&child_results] 14: { child_results[i] = move_and_eval(*g, i, moves[i], -beta, -alpha, w); ); 1: if (i < w) { /* YBWC : w */ 16: tg.wait(); 17: int e = -child_results[i]; 18: if (e > alpha) { 19: alpha = e; : if (e >= beta) break; 21: 22: 23: 24: /* */ 2: tg.wait(); 26: for (ii = ; ii < n_moves; ii++) { 27: int e = -child_results[ii]; 28: if (e > alpha) { 29: alpha = e; 3: if (e >= beta) break; 31: 32: 33: return alpha; 34: 3: TBB, i >= w,.,, 11: int i = (ii < w? ii : n_moves - ii + w - 1);.

4 41 42 43 44 47 1: (http://radagast.se/othello/ffotest.html )...3 TBB, OpenMP. Cilk SYNCHED, GOMP, parent-first. OpenMP,,. OpenMP,. task_group tg; spin_mutex lock_; for (ii = ; ii < n_moves; ii++) { int i = (ii < w? ii : n_moves - ii + w - 1); tg.run([=,&alpha,&lock_] { if (alpha < beta) { int e = -move_and_eval(g, i, moves[i], -beta, -alpha, w); spin_mutex::scoped_lock lock(lock_); if (e > alpha) alpha = e; ); if (i < w) tg.wait(); if (alpha >= beta) break; tg.wait(); return alpha;, OpenMP.. 6 6.1,, Andersson The FFO endgame test suite (http://radagast.se/othello/ffotest.html)., 6,,,.

4 3 3 2 1 Legend adaptive w=1 adaptive w=2 abort w=1 abort w=2 1 2 3 3 4 4 2: 3-. adaptive 2.4, abort 2.. w, 2.2,.,, 4, 47 ( 6, 2 ). 24 (48 ). CPU: Intel Nehalem-EX (E74) 2.GHz (6 core/12 4 ) L3 cache: 18MB memory: 24GB DDR3 6.2,., ( ). ( 6 ) 4 3 41 22 379 42 22 328 43 23 341 44 23 347 47 2 1266 3, Cilk, OpenMP, TBB,,,,., 12 (w) 1 2

Legend Visited nodes blow up of Cilk (Problem 41) 4 3 3 2 1 adaptive w=1 adaptive w=2 abort w=1 abort w=2 1 2 3 3 4 4 visited nodes 14 12 8 6 4 2 1 2 3 3 4 4 Visited nodes blow up of OpenMP (Problem 41) Visited nodes blow up of Intel TBB (Problem 41) visited nodes 9 8 7 6 4 3 2 1 1 2 3 3 4 4 visited nodes 3 2 1 1 2 3 3 4 4 Visited nodes blow up of Cilk (Problem 47) Visited nodes blow up of OpenMP (Problem 47) visited nodes 4 4 3 3 2 1 1 2 3 3 4 4 visited nodes 3 2 1 1 2 3 3 4 4 visited nodes Visited nodes blow up of Intel TBB (Problem 47) 6 4 3 1 2 3 3 4 4 3: ( 41,47)

, 2 2 = 4..,,., 4 2 -.,,., ( YBWC w = 2 ),, ( ),.. 6.3, 41,, w ( ), 1. 12,. C (Andersson), 2 C,, C. C (Andersson) % - %, C %,. w C (Andersson) - - 89.8 1. C - - 136.6 1.2 Cilk abort 1 138.1 1.4 Cilk abort 2 137. 1.3 Cilk adaptive 1 211.3 2.3 Cilk adaptive 2 14.8 1.7 OpenMP abort 1 166. 1.8 OpenMP abort 2 146.3 1.63 OpenMP adaptive 1 133. 1.49 OpenMP adaptive 2 138.8 1.46 Intel TBB abort 1 163. 1.82 Intel TBB abort 2 142. 1.9 Intel TBB adaptive 1 13. 1.4 Intel TBB adaptive 2 133.3 1.48 6.4 ( ) 6.2,.,. 4. 4 2-3,,., w,.

Legend Throughput increase of Cilk (Problem 41) 4 3 3 2 1 adaptive w=1 adaptive w=2 abort w=1 abort w=2 1 2 3 3 4 4 throughput up 3 2 1 1 2 3 3 4 4 Throughput increase of OpenMP (Problem 41) Throughput increase of Intel TBB (Problem 41) throughput up 2 1 1 2 3 3 4 4 throughput up 3 2 1 1 2 3 3 4 4 Throughput increase of Cilk (Problem 47) Throughput increase of OpenMP (Problem 47) throughput up 3 3 2 1 1 2 3 3 4 4 throughput up 3 3 2 1 1 2 3 3 4 4 Throughput increase of Intel TBB (Problem 47) throughput up 2 1 1 2 3 3 4 4 4: ( 41, 47)

6. ( ), Andersson,., 3, 6.3, 4.,,, w = 2., 4-6. 7, αβ. YBWC:, : :,.,,,,., work-first parent-first,,,,. Cilk OpenMP Intel TBB YBWC spawn, sync task, taskwait task group, run, wait work-first parent-first (GCC) parent-first SYNHCED, shared [&x] work-first inlet, abort,. [1] Don Dailey and Charles E. Leiserson. Using cilk to write multiprocessor chess programs. In Advances in Computer Games 9, 1. [2] Rainer Feldmann. Game Tree Search on Massively Parallel Systems. PhD thesis, University of Paderborn, 1993. [3] Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the cilk- multithreaded language. In ACM SIGPLAN Conference on Programming Language, pages 212 223, 1998.

Legend Absolute speedup of Cilk (Problem 41) 4 3 3 2 1 adaptive w=1 adaptive w=2 abort w=1 abort w=2 1 2 3 3 4 4 Speedup to Andersson 6 4 3 2 1 1 2 3 3 4 4 Absolute speedup of OpenMP (Problem 41) Absolute speedup of Intel TBB (Problem 41) Speedup to Andersson 4. 3. 4 2. 3 1. 2. 1 1 2 3 3 4 4 Speedup to Andersson 1.8 2 1.6 1.4 1.2.8 1.6.4.2 1 2 3 3 4 4 Absolute speedup of Cilk (Problem 47) Absolute speedup of OpenMP (Problem 47) Speedup to Andersson 6 4 3 2 1 1 2 3 3 4 4 Speedup to Andersson 4. 3. 4 2. 3 1. 2. 1 1 2 3 3 4 4 Absolute speedup of Intel TBB (Problem 47) Speedup to Andersson 3 2. 2 1. 1. 1 2 3 3 4 4 : Andersson ( 41, 47)

[4] GOMP an OpenMP implementation for GCC. http://gcc.gnu.org/projects/gomp/. [] Christopher F. Joerg and Bradley C. Kuszmaul. Massively parallel chess. In the Third DIMACS Parallel Implementation Challenge, 1994. [6] Bradley C. Kuszmaul. The startech massively parallel chess program. Journal of the International Computer Chess Association, 18(1), 199.