Intel® Parallel Studio Getting Started Guide


Parallel Studio
XLsoft Corporation (エクセルソフト株式会社), www.xlsoft.com
Rev. 1.1 (2010/04/08)


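About Parallel Studio

Parallel Studio is a set of tools for C/C++ developers that integrates into Microsoft* Visual Studio* and supports the development of parallel applications. It consists of three products:

  Parallel Composer: a C++ compiler for the IA-32 and Intel 64 architectures, performance libraries (Intel IPP and Intel TBB), and the Parallel Debugger Extension for debugging OpenMP programs.
  Parallel Inspector: an error checker for programs threaded with the Win32 API, OpenMP, or TBB.
  Parallel Amplifier: a performance analyzer.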

Parallel Studio and Visual Studio

After installation, Parallel Amplifier, Parallel Inspector, and the Parallel Debugger Extension are available inside Visual Studio, and Parallel Composer adds the Intel C++ compiler to the IDE. This guide walks through all three products: Parallel Composer, Parallel Amplifier, and Parallel Inspector.

Environment used in this guide

  CPU: Intel(R) Core(TM)2 Quad CPU Q6600 2.4GHz
  OS:  Microsoft Windows Vista* Business (x86)
  IDE: Microsoft Visual Studio 2008 Team System (called VS2008 below)

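Workflow

The tutorial follows this cycle:

  1. Parallel Amplifier: find the code that consumes the most CPU time.
  2. Parallel Composer: parallelize that code.
  3. Parallel Composer (Parallel Debugger Extension) and Parallel Inspector: find errors introduced by threading.
  4. Parallel Amplifier: check how well the parallel version performs.

Steps 2 to 4 (Parallel Composer, Parallel Inspector, Parallel Amplifier) are repeated until the result is satisfactory.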

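The sample program

This tutorial uses the N-Queens sample that ships with Parallel Composer as a Zip archive:

  C:\Program Files\Intel\Parallel Studio\Composer\Samples\en_us\C++\NQueens.zip

Extract the archive and open the nq-serial folder under NQueens. The serial implementation is nq-serial.cpp.

N-Queens asks how many ways N queens can be placed on an N x N board so that no queen can attack another, that is, no two queens share a row, a column, or a diagonal. (The source mentions the years 1850 and 1969 in connection with the puzzle's history.)

How nq-serial.cpp works:

  main() takes the board size N from the command line, calls solve(), and measures the elapsed time by calling timeGetTime() before and after solve().
  solve() calls setqueen() N times, once for each column i (0 to N-1) of the first row.
  setqueen() checks whether a queen can be placed at the given position. If it can, it records the queen and calls itself recursively for every column of the next row. When a queen has been placed in the last row, it increments nrofsolutions.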

    int main(int argc, char*argv[]) {
        if(argc!=2) {
            cerr << "Usage: nq-serial boardsize [default is 8].\n";
            size = 8;
        } else {
            size = atoi(argv[1]);              // board size N
        }
        cout << "Starting serial recursive solver for size " << size << "...\n";
        DWORD starttime=timeGetTime();
        solve(new int[size]);                  // solve N-Queens
        DWORD endtime=timeGetTime();
        cout << "Number of solutions: " << nrofsolutions << endl;
        cout << "Calculations took " << endtime-starttime << "ms.\n";
        return 0;
    }

    void solve(int queens[]) {
        for(int i=0; i<size; i++) {
            // try all positions in first row
            // create separate array for each recursion
            setqueen(queens, 0, i);            // called N times
        }
    }

    void setqueen(int queens[], int row, int col) {
        for(int i=0; i<row; i++) {
            // vertical attacks
            if (queens[i]==col) {
                return;
            }
            // diagonal attacks
            if (abs(queens[i]-col) == (row-i) ) {
                return;
            }
        }
        // column is ok, set the queen
        queens[row]=col;
        if(row==size-1) {
            nrofsolutions++;                   // one N-Queens solution found
        } else {
            // try to fill next row
            for(int i=0; i<size; i++) {
                setqueen(queens, row+1, i);    // recursive call
            }
        }
    }
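The listing above depends on Windows headers (DWORD, timeGetTime()) and on global variables. As a rough, portable sketch of the same recursive search, the version below keeps the algorithm but swaps in std::chrono for timing and std::vector for the board. The names placeQueen, countSolutions, and timedCount are made up for this illustration and are not part of the Intel sample.

```cpp
#include <chrono>
#include <cstdlib>
#include <utility>
#include <vector>

// Try to place a queen at (row, col); on success recurse into the next row.
// `count` is incremented once for every complete placement.
static void placeQueen(std::vector<int>& queens, int row, int col, long& count) {
    const int size = static_cast<int>(queens.size());
    for (int i = 0; i < row; ++i) {
        if (queens[i] == col) return;                      // same column
        if (std::abs(queens[i] - col) == row - i) return;  // same diagonal
    }
    queens[row] = col;                                     // position is safe
    if (row == size - 1) {
        ++count;                                           // last row filled: one solution
        return;
    }
    for (int i = 0; i < size; ++i) {
        placeQueen(queens, row + 1, i, count);
    }
}

// Count all solutions for a size x size board (serial).
long countSolutions(int size) {
    std::vector<int> queens(size);
    long count = 0;
    for (int i = 0; i < size; ++i) {
        placeQueen(queens, 0, i, count);
    }
    return count;
}

// Hypothetical helper: returns {solutions, elapsed milliseconds}.
// std::chrono stands in for the timeGetTime() calls of the original.
std::pair<long, long long> timedCount(int size) {
    const auto start = std::chrono::steady_clock::now();
    const long solutions = countSolutions(size);
    const auto stop = std::chrono::steady_clock::now();
    const auto ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    return {solutions, static_cast<long long>(ms)};
}
```

Calling countSolutions(8) gives the 92 solutions of the classic eight queens puzzle; the 13 x 13 board used later in this guide has 73712.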

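Building the sample with Parallel Composer

Build and run the sample in VS2008, first with Visual C++ (VC++) and then with the Intel C++ compiler from Parallel Composer.

1. Start VS2008 from the Windows [Start] menu (Microsoft Windows Vista).
   Note: VS2008 can also be started from [Intel Parallel Studio] > [Intel Parallel Studio with VS 2008].
2. Create a Win32 project named nq-parallelize.
3. Add nq-serial.cpp to the project.
4. In both the Debug and Release configurations, add winmm.lib to the libraries passed to the linker. With VC++ in VS2008, timeGetTime() is resolved from winmm.lib.
5. Build the project.
6. Set the command-line argument to 13 so that the program solves a 13 x 13 board.
7. Run the program from the VS2008 menu and note the result.
8. Use Parallel Composer to switch the project from the VC++ compiler to the Intel C++ compiler.
9. Confirm the switch to Intel C++.
10. The project now builds with Intel C++.
11. Rebuild and run from the VS2008 menu, then compare the VC++ and Intel C++ results.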

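Finding the hotspot with Parallel Amplifier

Use Parallel Amplifier to find where nq-parallelize.exe spends its time. In the Parallel Amplifier toolbar, select [Hotspots - Where is my program spending time?] and click [Profile]. Parallel Amplifier runs the program, collects data, and opens the result when the run ends.

Hotspots: Summary and Call Stack

The Summary shows:

  Elapsed Time: wall-clock time of the run.
  CPU Time: time during which the CPU was executing the program. Time spent waiting, for example on I/O, is not CPU time.
  Unused CPU Time: CPU capacity that the program left idle.
  Core Count: the number of cores (the Intel(R) Core(TM)2 Quad CPU has four).
  Threads Created: the number of threads the program created.

The Call Stack pane shows the call stacks that lead to the selected hotspot.

Note: main and solve may be missing from the Call Stack because interprocedural optimization (IPO) can inline them.

Hotspots: Bottom-up and Top-down Tree

  Bottom-up lists the hotspot functions first and their callers beneath them.
  Top-down Tree starts at the program entry point and descends to the hotspots.

Hotspots: source view

The hotspot is setqueen. Opening setqueen in the source view shows CPU time per source line, and the for loops account for most of it. Because setqueen is called from the loop in solve, that loop is the candidate for parallelization.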

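Parallelizing with Parallel Composer

Parallel Studio supports three ways to thread a program: the Win32 API, OpenMP, and Threading Building Blocks (TBB).

  Win32 API: threads are created explicitly with CreateThread or _beginthread. The programmer writes all thread management code against the Win32 API.
  OpenMP: parallelism is expressed with #pragma directives. Compared with the Win32 API, far less code changes, because the #pragma lines are added to the existing source and thread creation is left to the OpenMP runtime and the OS. OpenMP follows the fork-join model.
  TBB: a C++ template library in the style of the STL. Where OpenMP relies on compiler directives, TBB is used through C++ classes and templates.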

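Parallelizing solve with OpenMP

This guide uses OpenMP. The block that follows #pragma omp parallel is executed by every thread of the team, so a printf inside it runs once per thread, in an order decided by the OS. To divide the iterations of a for loop among the threads instead, place #pragma omp for in front of the loop:

    #pragma omp parallel
    {
        #pragma omp for
        for( i=0; i<n; i++ )    // iterations 0 to N-1 are divided among the threads
        {
            ...
        }
    }

The two directives can be combined, which is how solve is parallelized:

    void solve(int queens[]) {
    #pragma omp parallel for
        for(int i=0; i<size; i++) {
            // try all positions in first row
            // create separate array for each recursion
            setqueen(queens, 0, i);
        }
    }

With size = 13 the loop variable i runs from 0 to 12. On the four cores of the Intel Core2 Quad, the 13 calls to setqueen are divided among four threads:

  Thread 1: i = 0 to 3
  Thread 2: i = 4 to 6
  Thread 3: i = 7 to 9
  Thread 4: i = 10 to 12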
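The split of 13 iterations into 4, 3, 3, and 3 can be reproduced with a few lines of arithmetic. The sketch below assumes an even block distribution in which the first threads absorb the remainder. When no schedule clause is given, the OpenMP specification leaves the actual schedule to the implementation, so this mirrors the distribution described in this guide rather than a guarantee. staticChunks is a hypothetical helper name.

```cpp
#include <utility>
#include <vector>

// For each thread, return the half-open iteration range [first, last)
// under an even block split where the first (iterations % threads)
// threads receive one extra iteration.
std::vector<std::pair<int, int>> staticChunks(int iterations, int threads) {
    std::vector<std::pair<int, int>> chunks;
    const int base = iterations / threads;    // minimum iterations per thread
    const int extra = iterations % threads;   // threads that get one more
    int first = 0;
    for (int t = 0; t < threads; ++t) {
        const int len = base + (t < extra ? 1 : 0);
        chunks.emplace_back(first, first + len);
        first += len;
    }
    return chunks;
}
```

For staticChunks(13, 4) the ranges are [0,4), [4,7), [7,10), and [10,13), which matches the table above. Because the work behind each first-row column is not equal, an even split of iterations does not imply an even split of run time.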

Fork-join in solve( int queens[])

At #pragma omp parallel for the program forks into a team of threads. Together the threads execute the calls setqueen(queens,0,0) through setqueen(queens,0,12), and they join again at the end of the for loop that follows #pragma omp parallel for.

The OpenMP #pragma omp directives take effect only when the Intel C++ compiler option /Qopenmp is set. Rebuild nq-parallelize with OpenMP enabled and run N-Queens with the argument 13. The correct number of solutions for size 13 is 73712. If the OpenMP build reports a different number, the cause is the data sharing problem examined in the next sections.

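Debugging with Parallel Composer (Parallel Debugger Extension)

Parallel Composer includes the Parallel Debugger Extension, which extends the Microsoft Visual C++ debugger with features for OpenMP programs and for inspecting SSE registers. To use it, compile with the Intel C++ compiler and the options /Qopenmp and /debug:parallel.

Settings for the Debug configuration:

  [C/C++] > [Language] > [OpenMP Support]: /Qopenmp
  [C/C++] > [Debug] > [Enable Parallel Debug Checks]: Yes (/debug:parallel)

As in the Release configuration, add winmm.lib and set the argument to 13.

In VS2008, select [Intel Parallel Debugger Extension] > [Thread Data Sharing Detection] > [Enable Detection] from the menu. This switches the detection of data shared between threads ON or OFF. [Thread Data Sharing Events] can likewise be switched ON or OFF, and [Thread Data Sharing Filters] limits what is reported.

Open the event window with [Intel Parallel Debugger Extension] > [Windows] > [Thread Data Sharing Events], then start debugging from the VS2008 menu. When the Parallel Debugger Extension detects shared data it stops the program and lists the event in Thread Data Sharing Events. In this run, address 0x003d6138 was accessed (read/write) by threads 4700, 7140, and 7860: 4 bytes at 0x003d6138 were touched by 3 threads at line 95 of nq-serial.cpp.

    82  void setqueen(int queens[], int row, int col) {
    ...
    93
    94      // column is ok, set the queen
    95      queens[row]=col;     // data sharing detected here
    96  ...

The shared data is the queens array. Pressing [F5] to continue stops the program on queens again. To stop further reports for queens, open [Intel Parallel Debugger Extension] > [Windows] > [Thread Data Sharing Filters]. The filter created for 0x003d6138 covers 4 bytes, which is a single int element of queens. main() allocates the array with size = 13, so the whole array occupies 52 bytes. In Modify Data Range Filter, change [Byte Count] from 4 to 52 and click [OK].

Press [F5] again. The program now stops at line 98, nrofsolutions++, which shows that nrofsolutions is shared as well. Both queens and nrofsolutions are therefore shared between threads. The next section finds the same problems with Parallel Inspector: the Parallel Debugger Extension works interactively inside the VS debugger, while Parallel Inspector analyzes a complete run.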

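Checking for threading errors with Parallel Inspector

Parallel Inspector can analyze the build that was prepared for the Parallel Debugger Extension (/debug:parallel). Change the program argument from 13 to 8 before the analysis.

Start Parallel Inspector from the VS2008 menu with [Intel Parallel Inspector] > [Inspect Threading Errors], or select Threading errors in the toolbar and click [Inspect]. In Configure Analysis, choose the level [Where are all the threading problems Inspector can find?].

Click [Run Analysis]. Events are written to the Event Log while the program runs. When the run ends, click [Interpret Result] to open the Overview.

The Overview has two panes, Problem Sets and Observations in Problem Set. Problem Sets lists two problems of type Data race. Selecting P1 lists its observations, each with an ID, in Observations in Problem Set.

Observations X7 and X8 both refer to line 95 of nq-serial.cpp, a Write in setqueen(). P1 is therefore the data race on queens. Double-click an ID such as X7 in Observations in Problem Set to open the Sources view.

In the Sources view, X7 is shown as the Focused observation next to a Related observation. To focus on another observation, select it in Observations in Problem Set and choose Set as Focus Observation.

Parallel Inspector reports two problems: P1 on queens and P2 on nrofsolutions. These are the same two variables that the Parallel Debugger Extension flagged.

Note: Parallel Inspector checks programs threaded with OpenMP, the Win32 API, or TBB, whereas the Parallel Debugger Extension targets OpenMP.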

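Fixing the errors with Parallel Composer

Both queens and nrofsolutions are shared by the OpenMP threads and must be fixed.

    int nrofsolutions=0;                 // global: shared by all OpenMP threads

    int main() {
        solve( new int[size] );          // queens[] is allocated once, in main
    }

    solve( int queens[]) {
    #pragma omp parallel for
        for(int i=0; i<size; i++) {
            setqueen(queens, 0, i);      // every thread receives the same queens[]
        }
    }

  A. queens[]

    void setqueen(int queens[], int row, int col) {
        // column is ok, set the queen
        queens[row]=col;

  queens[] is allocated in main and passed to every OpenMP thread. setqueen() stores each queen position in it, so threads overwrite each other's board. The fix is to give each setqueen() call made from the for loop in solve() its own queens[].

  B. nrofsolutions

    void setqueen(int queens[], int row, int col) {
        // column is ok, set the queen
        queens[row]=col;
        if(row==size-1) {
            nrofsolutions++;

  Every thread that places the last queen in setqueen() executes nrofsolutions++ on the same global variable. The increment must be protected.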

The corrected code allocates queens[] per call and protects nrofsolutions++ with the OpenMP critical directive:

    int main(int argc, char*argv[]) {
        ...
        cout << "Starting serial recursive solver for size " << size << "...\n";
        DWORD starttime=timeGetTime();
        // solve(new int[size]);
        solve();                         // main() no longer allocates queens[]
        DWORD endtime=timeGetTime();
        ...
    }

    //void solve(int queens[]) {
    void solve(void) {
    #pragma omp parallel for
        for(int i=0; i<size; i++) {
            // try all positions in first row
            // create separate array for each recursion
            // setqueen(queens, 0, i);
            setqueen(new int[size], 0, i);
        }
    }

    void setqueen(int queens[], int row, int col) {
        ...
        // column is ok, set the queen
        queens[row]=col;                 // queens[] now belongs to one top-level setqueen() call
        if(row==size-1) {
    #pragma omp critical
            nrofsolutions++;             // only one thread at a time updates nrofsolutions
        } else {
            // try to fill next row
            for(int i=0; i<size; i++) {
                setqueen(queens, row+1, i);
            }
        }
    }
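The fragment above shows only the changed lines. The following self-contained sketch applies the same two fixes, a private board per first-row column and a critical section around the counter. It differs from the guide's code in a few labeled ways: the board is a std::vector instead of new int[size] (which also avoids the leak of never deleting the array), the size is passed as a parameter, and the names setQueen and solveWithCritical are invented for this example. Compiled without OpenMP support the pragmas are ignored and the code runs serially.

```cpp
#include <cstdlib>
#include <vector>

static long nrOfSolutions = 0;   // shared by all threads, as in the guide

static void setQueen(std::vector<int>& queens, int row, int col, int size) {
    for (int i = 0; i < row; ++i) {
        if (queens[i] == col) return;                      // vertical attack
        if (std::abs(queens[i] - col) == row - i) return;  // diagonal attack
    }
    queens[row] = col;
    if (row == size - 1) {
        // Only one thread at a time may update the shared counter.
        #pragma omp critical
        ++nrOfSolutions;
    } else {
        for (int i = 0; i < size; ++i) {
            setQueen(queens, row + 1, i, size);
        }
    }
}

long solveWithCritical(int size) {
    nrOfSolutions = 0;
    #pragma omp parallel for
    for (int i = 0; i < size; ++i) {
        std::vector<int> queens(size);   // private board for this iteration
        setQueen(queens, 0, i, size);
    }
    return nrOfSolutions;
}
```

The result is correct, but every solution found passes through the critical section, so threads can spend time waiting on each other.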

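Rebuild the Debug configuration and run Parallel Inspector again to confirm that the errors are no longer reported. Then prepare the Release configuration:

  [C/C++] > [Debug] > [Enable Parallel Debug Checks]: No
  [C/C++] > [Language] > [OpenMP Support]: Generate Parallel Code (/openmp equiv. to /Qopenmp)
  [Linker] > [Input] > [Additional Dependencies]: winmm.lib

Set the program argument back to 13, build, and continue with Parallel Amplifier.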

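Tuning with Parallel Amplifier

Analyze the Release build with Parallel Amplifier. In the Parallel Amplifier toolbar, select [Concurrency - Where is my concurrency poor?] and click [Profile].

Concurrency: Summary

In addition to the items of the Hotspots Summary, the Concurrency Summary shows Wait Time and Wait Count, which cover time spent waiting on I/O and on synchronization API calls. [Threads Created] and [Core Count] are both 4. In the original run, Elapsed Time went from 1.303s to 0.491s, about 2.5 times faster; the values 0.340s and 3971 appear to be the Wait Time and the Wait Count. Of the 4 cores, an average of 3.27 were in use. That figure is CPU Time / Elapsed Time.

Concurrency: Bottom-up

The Bottom-up view groups CPU time by Function, then Thread, then Caller Function Tree. It shows that setqueen runs on 4 threads and how the CPU time is distributed among them.

The concurrency of setqueen() is rated as Poor, Ok, or Ideal. To see where the threads wait, run a further Parallel Amplifier analysis: select [Locks and Waits - Where is my program waiting?] and click [Profile]. The result shows setqueen waiting on OMP Critical, the critical section that protects nrofsolutions++.

Note: the wait in setqueen is removed in the next step.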

Removing the critical section

Instead of protecting nrofsolutions++, each OpenMP thread can count its own solutions in a slot selected by its thread ID. solve() obtains the thread ID and passes it to setqueen(); after the parallel region solve() adds the per-thread counts together.

    #include <iostream>
    #include <windows.h>
    #include <mmsystem.h>
    #include "omp.h"
    using namespace std;

    int nrofsolutions=0;
    int size=0;

    // OpenMP version of solve
    void solve(void) {
        int thrd_max = omp_get_max_threads();    // maximum number of threads
        int *solcnt = new int[thrd_max]();       // one counter per thread, zero-initialized
    #pragma omp parallel
        {                                        // Fork
            int myid = omp_get_thread_num();     // ID of this thread
    #pragma omp for
            for(int i=0; i<size; i++) {
                // setqueen(new int[size], 0, i);
                setqueen(new int[size], 0, i, solcnt, myid);
            }
        }                                        // end of pragma omp parallel: Join
        // add up the counts of all thread IDs
        for(int i=0; i<thrd_max; i++) {
            nrofsolutions += solcnt[i];
        }
    }

    //void setqueen(int queens[], int row, int col) {
    void setqueen(int queens[], int row, int col, int solcnt[], int id) {
        ...
        if(row==size-1) {
            //#pragma omp critical
            //    nrofsolutions++;
            solcnt[id]++;                        // count in this thread's own slot
        } else {
            for(int i=0; i<size; i++) {
                // setqueen(queens, row+1, i);
                setqueen(queens, row+1, i, solcnt, id);   // pass the thread ID on
            }
        }
    }

Execution flow of void solve( void ) on the Quad Core system:

  Before the fork: int thrd_max = omp_get_max_threads(); int *solcnt = new int [thrd_max] (); (the trailing () makes new initialize the counters to zero).
  Fork at #pragma omp parallel: four threads start, and each obtains its own ID with myid = omp_get_thread_num(), giving 0, 1, 2, and 3.
  #pragma omp for:
    myid 0: setqueen(newqueens,0,0, solcnt, myid) to setqueen(newqueens,0,3, solcnt, myid)
    myid 1: setqueen(newqueens,0,4, solcnt, myid) to setqueen(newqueens,0,6, solcnt, myid)
    myid 2: setqueen(newqueens,0,7, solcnt, myid) to setqueen(newqueens,0,9, solcnt, myid)
    myid 3: setqueen(newqueens,0,10, solcnt, myid) to setqueen(newqueens,0,12, solcnt, myid)
  Inside setqueen( ): each thread executes solcnt[myid]++ on its own element only.
  Join: for(int i=0; i<thrd_max; i++) { nrofsolutions += solcnt[i]; } adds up the per-thread counts.

Run Parallel Amplifier again to confirm that the wait in setqueen() is gone.
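A compact, self-contained variant of the per-thread counting idea is sketched below. It departs from the guide's listing in labeled ways: each thread accumulates into a local variable and writes its slot once at the end (the guide increments solcnt[id] directly, which works but lets neighbouring counters share a cache line), boards are std::vector objects, and the OpenMP calls are guarded by _OPENMP so that the code also builds serially. The names setQueen and solvePerThread are invented for this example.

```cpp
#include <cstdlib>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

static void setQueen(std::vector<int>& queens, int row, int col, int size,
                     long& localCount) {
    for (int i = 0; i < row; ++i) {
        if (queens[i] == col) return;                      // vertical attack
        if (std::abs(queens[i] - col) == row - i) return;  // diagonal attack
    }
    queens[row] = col;
    if (row == size - 1) {
        ++localCount;            // thread-private counter: no lock needed
    } else {
        for (int i = 0; i < size; ++i) {
            setQueen(queens, row + 1, i, size, localCount);
        }
    }
}

long solvePerThread(int size) {
    int maxThreads = 1;
#ifdef _OPENMP
    maxThreads = omp_get_max_threads();
#endif
    std::vector<long> perThread(maxThreads, 0);   // one slot per thread
    #pragma omp parallel
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        long local = 0;
        #pragma omp for
        for (int i = 0; i < size; ++i) {
            std::vector<int> queens(size);        // private board per iteration
            setQueen(queens, 0, i, size, local);
        }
        perThread[id] = local;                    // single write per thread
    }
    long total = 0;                               // after the join: add up the slots
    for (long n : perThread) total += n;
    return total;
}
```

OpenMP's reduction(+:...) clause expresses the same pattern with less code; the explicit array is kept here because it follows the structure of the listing in this guide.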

Parallel Composer: main compiler options

/O3
  Higher optimization level. The default is /O2.
  IDE: [C/C++] > [Optimization] > [Optimization]
    > icl /O3 main.cpp

/Qipo
  Interprocedural optimization (IPO). Relevant to the Release configuration.
  IDE: [C/C++] > [Optimization] > [Interprocedural Optimization]

/Qx{SSE4.2|SSE4.1|SSSE3|SSE3|SSE2|Host}
  Generates code for processors that support the given instruction set. Host selects the highest SSE level of the build machine.
  IDE: [C/C++] > [Code Generation] > [Intel Processor-Specific Optimization]
  main.exe for CPUs with SSE4.2 (Core i7):
    > icl /O2 /QxSSE4.2 main.cpp
  main.exe for CPUs with SSSE3 (Core i7, Core 2 Duo):
    > icl /O2 /QxSSSE3 main.cpp
  main.exe for the SSE level of the build host:
    > icl /O2 /QxHost main.cpp

/Qax{SSE4.2|SSE4.1|SSSE3|SSE3|SSE2}
  Adds a processor-optimized code path next to a baseline path. The baseline is SSE2 unless changed with /Qx or /arch.
  IDE: [C/C++] > [Code Generation] > [Add Processor-Optimized Code Path]
  main.exe with an SSE4.2 path (Core i7) and an SSE2 baseline (Pentium 4, AMD processors with SSE2):
    > icl /O2 /QaxSSE4.2 main.cpp
  main.exe with an SSE4.2 path (Core i7) and an SSE3 baseline (Core 2 Duo, AMD processors with SSE3):
    > icl /O2 /QaxSSE4.2 /arch:SSE3 main.cpp
  main.exe with an SSE4.2 path (Core i7) and an IA-32 baseline (Pentium 3):
    > icl /O2 /QaxSSE4.2 /arch:IA32 main.cpp

/arch:{SSE3|SSE2|IA32}
  IDE: [C/C++] > [Code Generation] > [Enable Enhanced Instruction Set]
  main.exe for any CPU with SSE2 (Core i7, Core 2 Duo, Pentium 4, AMD processors):
    > icl /O2 /arch:SSE2 main.cpp

/Qparallel
  Automatic parallelization.
  IDE: [C/C++] > [Optimization] > [Parallelization]
    > icl /Qparallel main.cpp

/Qopenmp
  Enables OpenMP; the #pragma omp directives take effect only with this option.
  IDE: [C/C++] > [Language] > [OpenMP* Support]
  Dynamic link against libiomp5md.dll:
    > icl /Qopenmp main-omp.cpp
  Static link against libiomp5mt.lib:
    > icl /Qopenmp /Qopenmp-link:static main-omp.cpp
  Legacy library, dynamic link against libguide40.dll:
    > icl /Qopenmp /Qopenmp-lib:legacy main-omp.cpp
  Legacy library, static link against libguide.lib:
    > icl /Qopenmp /Qopenmp-lib:legacy /Qopenmp-link:static main-omp.cpp

/Qopenmp-link, /Qopenmp-lib, /Qopenmp-stubs
  IDE: [C/C++] > [Command Line] > [Additional Options:]
  /Qopenmp-link selects how the OpenMP library is linked:
    > icl /Qopenmp /Qopenmp-link:static main-omp.cpp
  /Qopenmp-lib selects which OpenMP library is used:
    > icl /Qopenmp /Qopenmp-lib:legacy main-omp.cpp
  /Qopenmp-stubs builds an OpenMP program as a serial program. OpenMP API calls such as omp_set_num_threads are resolved by a stub library, so the source compiles without /Qopenmp:
    > icl /Qopenmp-stubs main-omp.cpp

/Qvec-report{0-5}, /Qpar-report{0-3}, /Qopenmp-report{0-2}
  Diagnostic reports for vectorization (levels 0 to 5), automatic parallelization (levels 0 to 3), and OpenMP (levels 0 to 2).
  IDE: [C/C++] > [Command Line] > [Additional Options:]
    > icl /QxSSE4.2 /Qvec-report3 main.cpp
    > icl /arch:SSE2 /Qvec-report2 main.cpp
    > icl /Qparallel /Qpar-report3 main.cpp
    > icl /Qopenmp /Qopenmp-report2 main-omp.cpp

/debug:parallel
  Enables the Parallel Debugger Extension. Use together with /Qopenmp.
  IDE: [C/C++] > [Debug] > [Enable Parallel Debug Checks]
    > icl /Qopenmp /debug:parallel main-omp.cpp

/Qdiag-enable:sc-parallel{1|2|3}
  Parallel Lint, a source code analysis for OpenMP programs. Requires /Qopenmp.
  IDE: [C/C++] > [Diagnostics] > [Level of Source Code Parallelization Analysis]
    > icl /Qopenmp /debug:parallel main-omp.cpp

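Parallel Inspector: memory errors

Parallel Inspector also checks for memory errors. In the Visual Studio toolbar, select Memory errors and click [Inspect], choose a level in Configure Analysis, and click Run Analysis. The problem types are described in the Intel(R) Parallel Inspector Problem Type Reference.

Parallel Inspector: build settings

  /Zi: generates the debug information that Parallel Inspector uses to map results to source.
  /Od: disables optimization. Check this setting when analyzing a Release build.
  /RTC[su1]: run-time checks; review this setting for Parallel Inspector analysis.
  /Qtcheck, /Qtprofile, /debug:parallel: these options belong to other tools. /Qtcheck is for Thread Checker, /Qtprofile is for Thread Profiler, and /debug:parallel is for the Parallel Debugger Extension. Parallel Inspector does not need them.
  /FIXED:NO: linker setting.
  /MDd, /MD, /MT, /MTd: C run-time library selection.
  /Qopenmp-link: OpenMP library linkage.
  /D"TBB_USE_THREADING_TOOLS": define this macro for programs that use TBB.

Parallel Amplifier: build settings

  /Zi: generates debug information; set it for the "Release" configuration as well.
  /MD or /MDd: C run-time library selection.
  /Qopenmp and /Qopenmp-link:dynamic: for OpenMP programs, build with Parallel Composer's /Qopenmp and link the OpenMP library dynamically.
  /D"TBB_USE_THREADING_TOOLS": define this macro for programs that use TBB.
  Options to avoid with Parallel Amplifier: /Qtcheck (for Thread Checker), /Qopenmp-link:static, /Qtprofile (for Thread Profiler), /Qopenmp-stubs, and /debug:parallel (for the Parallel Debugger Extension).
  /FIXED:NO: linker setting.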

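Parallel Amplifier: Pause and Resume

Data collection can be limited to part of a run. From the menu open [Intel Parallel Amplifier Project Properties]. In Project Properties, Start data collection paused starts the program with collection paused, and Resume collection after ... sec. resumes collection after the given number of seconds. Then click [Profile]. While Parallel Amplifier is collecting, collection can be paused from the toolbar and resumed with [Continue].

With Start data collection paused selected, the program can control collection itself through the Parallel Amplifier API: __itt_resume() resumes collection and __itt_pause() pauses it.

    #include "ittnotify.h"

    void Func() {
        // collect data only between Resume and Pause
        __itt_resume();      // start collecting
        process1();          // collected
        __itt_pause();       // stop collecting
        process2();          // not collected
        __itt_resume();
        process3();          // collected
        __itt_pause();
    }

ittnotify.h is located in:

  C:\Program Files\Intel\Parallel Studio\Amplifier\include

Link against ittnotify_static.lib, located in:

  C:\Program Files\Intel\Parallel Studio\Amplifier\lib32 (32-bit)
  C:\Program Files (x86)\Intel\Parallel Studio\Amplifier\lib64 (64-bit)

Click [Profile] to run the analysis.

Note: when Pause and Resume are used, the Summary also shows Pause Time.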

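Help

The product documentation is available from the Visual Studio [Help] menu and from the Windows [Start] menu. Inside the Visual Studio* IDE, press F1 to open the help for the current window.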

Other N-Queens samples

Besides nq-serial and the OpenMP version built in this guide, the N-Queens sample archive contains further variants, including versions that use the STL vector, the Win32 API, OpenMP (par, critical), the Task construct of OpenMP 3.0, TBB, and TBB + OpenMP.

OpenMP

  Getting Started with OpenMP*: http://jp.xlsoft.com/documents/intel/compiler/525j-001.pdf
  OpenMP* for C/C++: http://jp.xlsoft.com/documents/intel/compiler/526j-001.pdf
  OpenMP web site: http://openmp.org/wp/
  OpenMP Application Program Interface Version 3.0 (Japanese translation): http://www.openmp.org/mp-documents/openmp30spec-ja.pdf

TBB

  TBB product page: http://www.xlsoft.com/jp/products/intel/perflib/tbb/index.html
  TBB book: http://www.oreilly.co.jp/books/9784873113555/

Product information: http://www.xlsoft.com/jp/products/intel/parallel/index.html
Contact form: https://www.xlsoft.com/jp/services/xlsoft_form.html