Parallel Studio
XLsoft Corporation (エクセルソフト株式会社), www.xlsoft.com
Rev. 1.1 (2010/04/08)
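Parallel Studio

Intel Parallel Studio is a set of parallel programming tools for C/C++ developers who work in Microsoft* Visual Studio*. It consists of three components:
- Parallel Composer: the Intel C++ Compiler for IA-32 and Intel 64, the Intel Integrated Performance Primitives (IPP) library, the Intel Threading Building Blocks (TBB) library, and the Parallel Debugger Extension, which adds support for debugging OpenMP code.
- Parallel Inspector: detects threading errors in programs parallelized with the Win32 API, OpenMP, or TBB.
- Parallel Amplifier: analyzes performance, showing where a program spends its time and how well it uses the available cores.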
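Parallel Studio integrates into Visual Studio. Parallel Amplifier, Parallel Inspector, the Parallel Debugger Extension and the Parallel Composer C++ compiler are all used from inside the IDE. This tutorial parallelizes a sample program with the three Parallel Studio components: Parallel Composer, Parallel Amplifier and Parallel Inspector.

Environment used in this tutorial:
- CPU: Intel(R) Core(TM)2 Quad CPU Q6600 2.4GHz
- OS: Microsoft Windows Vista* Business x86
- IDE: Microsoft Visual Studio 2008 Team System (referred to as VS2008 below)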
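The tutorial follows this development cycle:
1. Analyze: use Parallel Amplifier to find where the program spends its CPU time (the hotspots).
2. Implement: parallelize the hotspot and build it with Parallel Composer.
3. Debug and verify: check the parallel code for threading errors with the Parallel Composer Parallel Debugger Extension and with Parallel Inspector.
4. Tune: use Parallel Amplifier to check how well the parallel code performs.
Steps 2 to 4 (Parallel Composer, Parallel Inspector, Parallel Amplifier) are repeated as needed.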
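Parallel Composer: the N-Queens sample

This tutorial uses the N-Queens sample that ships with Parallel Composer. It is provided as a Zip file:
C:\Program Files\Intel\Parallel Studio\Composer\Samples\en_us\C++\NQueens.zip
Extract the Zip file. The serial version is the nq-serial project in the NQueens folder, and its source file is nq-serial.cpp.

The N-Queens problem asks how many ways N queens can be placed on an N×N chessboard so that no queen can attack another. In other words, no two queens may share a row, a column or a diagonal.

The structure of nq-serial.cpp is as follows:
- main() reads the board size N from the command line and calls solve() to solve the N-Queens problem. It measures the elapsed time with timeGetTime().
- solve() calls setqueen() N times, once for each column i, placing the first queen in row 0, column i.
- setqueen() checks whether a queen can be placed at the given position. If it can, setqueen() places it and calls itself recursively for the next row. When a queen has been placed in the last row, it increments the solution counter nrofsolutions.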
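The main parts of nq-serial.cpp are shown below. The includes and declarations at the top are needed to compile it.

```cpp
#include <iostream>
#include <cstdlib>      // atoi, abs
#include <windows.h>
#include <mmsystem.h>   // timeGetTime(); link with winmm.lib

int nrofsolutions = 0;  // number of solutions found
int size = 0;           // board size N

void solve(int queens[]);
void setqueen(int queens[], int row, int col);

int main(int argc, char* argv[]) {
    if (argc != 2) {
        std::cerr << "Usage: nq-serial boardsize [default is 8].\n";
        size = 8;
    } else {
        size = atoi(argv[1]);  // board size N from the command line
    }
    std::cout << "Starting serial recursive solver for size " << size << "...\n";
    DWORD starttime = timeGetTime();
    solve(new int[size]);      // solve the N-Queens problem
    DWORD endtime = timeGetTime();
    std::cout << "Number of solutions: " << nrofsolutions << std::endl;
    std::cout << "Calculations took " << endtime - starttime << "ms.\n";
    return 0;
}

void solve(int queens[]) {
    for (int i = 0; i < size; i++) {
        // try all positions in first row
        // create separate array for each recursion
        setqueen(queens, 0, i);  // first queen in row 0, column i
    }
}

void setqueen(int queens[], int row, int col) {
    for (int i = 0; i < row; i++) {
        // vertical attacks
        if (queens[i] == col) {
            return;
        }
        // diagonal attacks
        if (abs(queens[i] - col) == (row - i)) {
            return;
        }
    }
    // column is ok, set the queen
    queens[row] = col;
    if (row == size - 1) {
        nrofsolutions++;  // all N queens placed: one more solution
    } else {
        // try to fill next row
        for (int i = 0; i < size; i++) {
            setqueen(queens, row + 1, i);
        }
    }
}
```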
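To try the algorithm outside Windows, the sketch below restates it in portable C++. It is an illustrative restructuring, not the sample's code. The attack tests are the same as in setqueen(), but the safety check runs before recursing and each call returns its count instead of updating a global. std::chrono stands in for timeGetTime(). The names is_safe, place, count_solutions and run_and_time are hypothetical.

```cpp
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

// True if a queen at (row, col) is not attacked by the queens already
// placed in rows 0..row-1 (same column and diagonal tests as setqueen()).
static bool is_safe(const std::vector<int>& queens, int row, int col) {
    for (int i = 0; i < row; i++) {
        if (queens[i] == col) return false;                      // vertical attack
        if (std::abs(queens[i] - col) == row - i) return false;  // diagonal attack
    }
    return true;
}

// Recursive backtracking: tries every column of `row` and returns the
// number of complete placements found below this point.
static long long place(std::vector<int>& queens, int row, int n) {
    long long found = 0;
    for (int col = 0; col < n; col++) {
        if (!is_safe(queens, row, col)) continue;
        queens[row] = col;
        found += (row == n - 1) ? 1 : place(queens, row + 1, n);
    }
    return found;
}

// Number of N-Queens solutions on an n x n board.
long long count_solutions(int n) {
    if (n <= 0) return 0;
    std::vector<int> queens(n);
    return place(queens, 0, n);
}

// Prints the count and the elapsed time, measured with std::chrono
// (a portable stand-in for timeGetTime()).
void run_and_time(int n) {
    const auto start = std::chrono::steady_clock::now();
    const long long solutions = count_solutions(n);
    const auto end = std::chrono::steady_clock::now();
    const auto ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    std::cout << "Number of solutions: " << solutions << "\n"
              << "Calculations took " << ms << "ms.\n";
}
```

Calling run_and_time(13) solves the same 13×13 board used later in this tutorial.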
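Building with Parallel Composer

First, build the serial program in VS2008 with Parallel Composer.

1. Start VS2008 from the Windows Start menu.
   Note: on Microsoft Windows Vista you can also start it from the Start menu via [Intel Parallel Studio] - [Intel Parallel Studio with VS 2008].
2. Create a new Win32 project named nq-parallelize.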
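3. Add nq-serial.cpp to the project.
4. Change the configuration from Debug to Release. The program calls timeGetTime(), and VS2008 VC++ does not link the library for it by default, so add winmm.lib to the linker's additional dependencies.
5. Build the project with VC++.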
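6. Set the command argument to 13 so that the program solves the problem for a 13×13 board.
7. Run the program from the VS2008 menu and note the time it reports.
8. Switch the project from the VC++ compiler to the Parallel Composer Intel C++ compiler.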
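9. Confirm that the project now uses the Intel C++ compiler.
10. Rebuild the project with the Intel C++ compiler.
11. Run the program again from the VS2008 menu and compare the time with the VC++ build.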
Parallel Amplifier: finding hotspots

Next, use Parallel Amplifier to find where the serial program spends its time. To analyze nq-parallelize.exe, select the analysis type [Hotspot - Where is my program spending time?] and click [Profile]. Parallel Amplifier then runs the program and collects data.
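When profiling finishes, the Hotspots results appear with a Summary and a Call Stack view. The Summary shows the following items:
- Elapsed Time: the wall-clock time of the run.
- CPU Time: the time the CPUs spent executing the program.
- Unused CPU Time: time during which CPUs were not used by the program, for example while it waited for I/O.
- CPU: the processor the data was collected on, here an Intel(R) Core(TM)2 Quad CPU.
- Core Count and Threads Created: the number of cores and the number of threads the program created.
The Call Stack view lists the hotspots, the functions that used the most CPU time.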
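Note: main and solve may not appear in the Call Stack, because interprocedural optimization (IPO) can inline them into their callers.

The Hotspots results can be viewed as Bottom-up or Top-down Tree. Bottom-up lists the hotspot functions first. Top-down Tree starts at the program's entry point and shows the call paths that lead to the hotspots.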
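The Hotspots view shows that setqueen is the hotspot. Double-clicking setqueen opens its source annotated with the CPU time of each line, and most of that time is in the for loops. Because setqueen is called recursively from solve, parallelizing the loop in solve distributes the work done in setqueen across the cores.
Note: the annotated source opens in the Visual Studio editor.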
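Parallel Composer: parallelizing the program

Parallel Studio supports three ways to parallelize a program:
- Win32 API: threads are created and managed explicitly with functions such as CreateThread or _beginthread.
- OpenMP: the programmer adds #pragma directives and the compiler generates the threading code, so the source does not depend on the OS threading API. OpenMP uses a fork-join model. At the start of a parallel region the master thread forks a team of threads, and at the end of the region the threads join.
- TBB: a C++ template library for parallel programming whose containers and algorithms are used in an STL-like style.
This tutorial parallelizes N-Queens with OpenMP.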
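The loop in solve is parallelized with OpenMP, which is built on two basic constructs.

The first is a parallel region. Every thread in the team executes the enclosed block, so printf runs once per thread. By default there is typically one thread per core.

```cpp
#pragma omp parallel
{
    printf("hello\n");   // executed by every thread in the team
}
```

The second is a work-sharing for loop. Inside a parallel region, #pragma omp for divides the iterations of the following loop among the threads:

```cpp
#pragma omp parallel
{
    #pragma omp for
    for (i = 0; i < n; i++) {   // iterations 0 to n-1 are split among the threads
        // loop body
    }
}
```

The two can be combined as #pragma omp parallel for. Applied to solve:

```cpp
void solve(int queens[]) {
    #pragma omp parallel for
    for (int i = 0; i < size; i++) {
        // try all positions in first row
        // create separate array for each recursion
        setqueen(queens, 0, i);
    }
}
```

The loop runs size times and calls setqueen once per iteration. With size 13, iterations i = 0 to 12 are divided among the 4 cores of the Intel Core2 Quad:
- Core 1 runs i = 0 to 3.
- Core 2 runs i = 4 to 6.
- Core 3 runs i = 7 to 9.
- Core 4 runs i = 10 to 12.
Each core calls setqueen for its iterations in parallel with the others.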
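The OpenMP specification leaves the default loop schedule implementation-defined. For schedule(static) without a chunk size, it only requires chunks of approximately equal size, with at most one chunk per thread. The 0 to 3, 4 to 6, 7 to 9, 10 to 12 split above is one such distribution. The sketch below reproduces it, assuming the first n % nthreads threads each receive one extra iteration. block_range is a hypothetical helper, not an OpenMP API.

```cpp
#include <utility>

// Hypothetical helper (not part of OpenMP): models the block split
// described above. n loop iterations are cut into nthreads contiguous
// blocks, and the first (n % nthreads) threads each take one extra
// iteration. Returns the half-open range [first, last) of thread `tid`.
std::pair<int, int> block_range(int n, int nthreads, int tid) {
    const int base  = n / nthreads;   // iterations every thread receives
    const int extra = n % nthreads;   // leftovers, one each to the first threads
    const int first = tid * base + (tid < extra ? tid : extra);
    const int count = base + (tid < extra ? 1 : 0);
    return {first, first + count};
}
```

For n = 13 and 4 threads, block_range returns [0,4), [4,7), [7,10) and [10,13), matching the per-core ranges listed above.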
Figure: at #pragma omp parallel for, solve(int queens[]) forks a team of threads. The threads execute setqueen(queens,0,0) through setqueen(queens,0,12) in parallel and then join.

#pragma omp parallel for is all that is needed to parallelize the for loop with OpenMP. The Intel C++ compiler processes #pragma omp directives only when the /Qopenmp option is specified, so build nq-parallelize with /Qopenmp and run it. A 13×13 board has 73712 N-Queens solutions, but the OpenMP version may report a different number.
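Parallel Composer: Parallel Debugger Extension

Parallel Composer includes the Parallel Debugger Extension, which extends the Microsoft Visual C++ debugger with features for debugging OpenMP code and for viewing SSE registers. To use it, build with the Intel C++ compiler options /Qopenmp and /debug:parallel.

Configure the Debug configuration in the project properties as follows:
- [C/C++] - [Language] - [OpenMP Support]: /Qopenmp
- [C/C++] - [Debug] - [Enable Parallel Debug Checks]: Yes (/debug:parallel)
- As in the Release configuration, add winmm.lib to the linker dependencies and set the command argument to 13.

Then select [Intel Parallel Debugger Extension] - [Thread Data Sharing Detection] - [Enable Detection] from the VS2008 menu. This switches data sharing detection in the Parallel Debugger Extension ON. Selecting the item again switches it OFF.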
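Detection can also be switched ON and OFF in the [Thread Data Sharing Events] window, and restricted to particular data in the [Thread Data Sharing Filters] window. To open the Thread Data Sharing Events window, select [Intel Parallel Debugger Extension] - [Windows] - [Thread Data Sharing Events] from the VS2008 menu.

Start debugging from the VS2008 menu. When the Parallel Debugger Extension detects data sharing, execution stops and the event appears in the Thread Data Sharing Events window. In this run, threads 4700, 7140 and 7860 read and write the 4-byte location at address 0x003d6138, at line 95 of nq-serial.cpp:

```cpp
82  void setqueen(int queens[], int row, int col) {
...
93
94      // column is ok, set the queen
95      queens[row]=col;
96  ...
```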
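The shared data is queens. Pressing [F5] to continue stops execution again at further events on queens. To stop reporting events on queens, open the Thread Data Sharing Filters window by selecting [Intel Parallel Debugger Extension] - [Windows] - [Thread Data Sharing Filters] from the VS2008 menu. The filter entry for address 0x003d6138 covers 4 bytes, a single int. However, queens is an int array allocated in main() with size = 13 elements, which is 13 × 4 = 52 bytes. In the Modify Data Range Filter dialog, change [Byte Count] from 4 to 52 and click [OK].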
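Press [F5] again. Execution now stops at line 98, nrofsolutions++, because nrofsolutions is also shared between the threads. Both queens and nrofsolutions are therefore shared by the threads. The Parallel Debugger Extension finds such problems while you debug in VS. Parallel Inspector, described next, can detect them as well.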
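Parallel Inspector: detecting threading errors

Parallel Inspector can also detect the data sharing found with the Parallel Debugger Extension. Before running it, change two settings:
- Disable /debug:parallel, which is only for the Parallel Debugger Extension.
- Change the command argument from 13 to 8 to shorten the analysis.

Then select [Intel Parallel Inspector] - [Inspect Threading Errors] from the VS2008 menu, choose Threading errors and click [Inspect]. In the Configure Analysis dialog, select the analysis level "Where are all the threading problems Inspector can find?".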
Click [Run Analysis]. The event log is displayed while the analysis runs. When it finishes, click [Interpret Result]. The Overview shows two panes, Problem Sets and Observations in Problem Set. Problem Sets lists 2 data races. Selecting P1 lists its observations, each with an ID, in Observations in Problem Set.
Observations X7 and X8 are writes in setqueen() at line 95 of nq-serial.cpp, which shows that problem P1 concerns queens. Double-click ID X7 in Observations in Problem Set to open the Sources view.
The Sources view shows X7 as the Focused observation, together with a Related observation. To examine another observation in detail, select it in Observations in Problem Set and choose Set as Focus Observation. It then becomes the Focused observation.

In total, Parallel Inspector reports 2 problems: P1 on queens and P2 on nrofsolutions.

Note: Parallel Inspector and the Parallel Debugger Extension differ in what they support. Parallel Inspector handles OpenMP, the Win32 API and TBB. The Parallel Debugger Extension handles OpenMP.
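Parallel Composer: fixing the data races

Both queens and nrofsolutions are shared between the OpenMP threads.

Problem A: queens[] is allocated once in main() and passed to every setqueen() call, so all OpenMP threads write into the same array:

```cpp
int nrofsolutions = 0;            // global: shared by all OpenMP threads

int main() {
    // ...
    solve(new int[size]);         // one queens[] array for the whole program
    // ...
}

void solve(int queens[]) {
    #pragma omp parallel for
    for (int i = 0; i < size; i++) {
        setqueen(queens, 0, i);   // every thread receives the same queens[]
    }
}

void setqueen(int queens[], int row, int col) {
    // ...
    // column is ok, set the queen
    queens[row] = col;            // A: threads write to the shared queens[]
    // ...
}
```

Problem B: nrofsolutions is a global variable, so all threads increment the same counter:

```cpp
void setqueen(int queens[], int row, int col) {
    // ...
    // column is ok, set the queen
    queens[row] = col;
    if (row == size - 1) {
        nrofsolutions++;          // B: unsynchronized update of a shared counter
    }
    // ...
}
```

To fix A, setqueen() records queen positions in queens[], so each setqueen() call tree started from the for loop in solve() needs its own queens[] array. To fix B, the nrofsolutions++ executed when setqueen() places the last queen must be protected so that only one thread updates it at a time.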
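Use the OpenMP critical directive to protect nrofsolutions++, and allocate a separate array for each iteration. The changed code is shown below, with the old lines commented out:

```cpp
int main(int argc, char* argv[]) {
    // ... (argument handling unchanged)
    std::cout << "Starting serial recursive solver for size " << size << "...\n";
    DWORD starttime = timeGetTime();
    // solve(new int[size]);
    solve();                       // queens[] is no longer allocated in main()
    DWORD endtime = timeGetTime();
    // ... (output unchanged)
}

//void solve(int queens[]) {
void solve(void) {
    #pragma omp parallel for
    for (int i = 0; i < size; i++) {
        // try all positions in first row
        // create separate array for each recursion
        // setqueen(queens, 0, i);
        setqueen(new int[size], 0, i);   // each iteration gets its own queens[] (not freed in this sample)
    }
}

void setqueen(int queens[], int row, int col) {
    // ... (attack checks unchanged)
    // column is ok, set the queen
    queens[row] = col;                   // writes only this call tree's queens[]
    if (row == size - 1) {
        #pragma omp critical
        nrofsolutions++;                 // one thread at a time
    } else {
        // try to fill next row
        for (int i = 0; i < size; i++) {
            setqueen(queens, row + 1, i);
        }
    }
}
```

With these changes, nrofsolutions is updated correctly.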
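Settings for Parallel Amplifier

Parallel Inspector was run on the Debug configuration. For Parallel Amplifier, use the Release configuration with these settings:
- [C/C++] - [Debug] - [Enable Parallel Debug Checks]: No
- [C/C++] - [Language] - [OpenMP Support]: Generate Parallel Code (/openmp equiv. to /Qopenmp)
- [Linker] - [Input] - [Additional Dependencies]: winmm.lib
- Command argument: 13
Then analyze the program with Parallel Amplifier.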
Parallel Amplifier: concurrency analysis

Build the Release configuration and run Parallel Amplifier's concurrency analysis: select [Concurrency - Where is my concurrency poor?] and click [Profile].
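The Concurrency results open with the Summary. It contains the items of the Hotspot summary, plus two more:
- Wait Time: the time threads spent waiting, for example on I/O.
- Wait Count: the number of waits on synchronization APIs.

It also shows [Threads Created] and [Core Count]. On this machine the core count is 4, and the Summary reports an Elapsed Time of 1.303 s. The average concurrency, CPU Time / Elapsed Time, can be at most 4 on this 4-core machine; this run achieves 3.27.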
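The concurrency figure is a simple ratio. As a worked check, assuming the ideal value equals the core count of 4:

```latex
\text{average concurrency} = \frac{\text{CPU Time}}{\text{Elapsed Time}},
\qquad \frac{3.27}{4} \approx 0.82
```

On that assumption, this run reaches roughly 82% of the ideal four-core concurrency.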
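The Concurrency Bottom-up view shows how many CPUs were in use while each function ran. The grouping can be changed, for example to Function > Thread, and the Caller Function Tree shows the callers. Here setqueen runs with 4 CPUs in use.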
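The CPU usage of setqueen() is color-coded as Poor, Ok or Ideal.

To find where threads wait, run Parallel Amplifier's [Locks and Waits - Where is my program waiting?] analysis by clicking [Profile]. The result shows that setqueen waits on an OMP Critical: the critical section that protects nrofsolutions++.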
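Removing the critical section

Every nrofsolutions++ goes through the OpenMP critical section, so threads wait for each other. To remove this wait, give each OpenMP thread its own counter, indexed by its thread ID:
- solve() allocates one counter per thread.
- setqueen() increments only the counter of the calling thread.
- After the threads join, solve() adds the counters into nrofsolutions.

```cpp
#include <iostream>
#include <windows.h>
#include <mmsystem.h>   // timeGetTime(); link with winmm.lib
#include "omp.h"        // omp_get_max_threads(), omp_get_thread_num()

int nrofsolutions = 0;
int size = 0;

void setqueen(int queens[], int row, int col, int solcnt[], int id);

// OpenMP version with one solution counter per thread
void solve(void) {
    int thrd_max = omp_get_max_threads();   // maximum number of threads
    int *solcnt = new int[thrd_max]();      // per-thread counters, zero-initialized
    #pragma omp parallel
    {   // Fork
        int myid = omp_get_thread_num();    // this thread's ID
        #pragma omp for
        for (int i = 0; i < size; i++) {
            // setqueen(new int[size], 0, i);
            setqueen(new int[size], 0, i, solcnt, myid);  // pass counters and thread ID
        }
    }   // end of #pragma omp parallel: Join
    // add up the per-thread counts
    for (int i = 0; i < thrd_max; i++) {
        nrofsolutions += solcnt[i];
    }
    delete[] solcnt;
}

//void setqueen(int queens[], int row, int col) {
void setqueen(int queens[], int row, int col, int solcnt[], int id) {
    // ... (attack checks and queens[row] = col unchanged)
    if (row == size - 1) {
        //#pragma omp critical
        // nrofsolutions++;
        solcnt[id]++;                               // only this thread writes solcnt[id]
    } else {
        for (int i = 0; i < size; i++) {
            // setqueen(queens, row+1, i);
            setqueen(queens, row + 1, i, solcnt, id);  // pass the thread ID down
        }
    }
}
```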
Figure: how the per-thread counters work on the Quad Core machine.
- solve() sets thrd_max = omp_get_max_threads() and allocates int *solcnt = new int[thrd_max](). The trailing () makes new initialize the counters to zero.
- #pragma omp parallel forks 4 threads, one per core. Each thread calls omp_get_thread_num() and gets myid 0, 1, 2 or 3.
- #pragma omp for divides the iterations among the threads:
  - myid 0 calls setqueen(newqueens, 0, i, solcnt, 0) for i = 0 to 3.
  - myid 1 calls it for i = 4 to 6.
  - myid 2 calls it for i = 7 to 9.
  - myid 3 calls it for i = 10 to 12.
- Inside setqueen(), each thread increments only its own counter, solcnt[myid]++.
- After the join, for(int i=0; i<thrd_max; i++) nrofsolutions += solcnt[i]; adds up the counts.
Run Parallel Amplifier again to check setqueen() after this change.
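The tutorial does not use OpenMP's reduction clause, but it automates the per-thread counter pattern above. Each thread gets a private copy of the variable, initialized to 0, and the copies are summed at the join. The sketch below is a hypothetical restructuring: setqueen_count returns the number of solutions below a position instead of updating a counter, and solve_reduction is an illustrative name. Compiled without OpenMP, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cstdlib>
#include <vector>

// Same attack tests as setqueen(): same column or same diagonal.
static bool safe_at(const std::vector<int>& queens, int row, int col) {
    for (int i = 0; i < row; i++) {
        if (queens[i] == col || std::abs(queens[i] - col) == row - i) return false;
    }
    return true;
}

// Like setqueen(), but returns the number of solutions found below
// (row, col) instead of incrementing a shared or per-thread counter.
static long long setqueen_count(std::vector<int>& queens, int row, int col, int n) {
    if (!safe_at(queens, row, col)) return 0;
    queens[row] = col;
    if (row == n - 1) return 1;
    long long found = 0;
    for (int i = 0; i < n; i++) {
        found += setqueen_count(queens, row + 1, i, n);
    }
    return found;
}

// reduction(+:total): each thread adds into its own private `total`
// (starting at 0); OpenMP sums the private copies when the threads join.
long long solve_reduction(int n) {
    long long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        std::vector<int> queens(n);   // private board for this first-row column
        total += setqueen_count(queens, 0, i, n);
    }
    return total;
}
```

One design point: the hand-written solcnt[] array places the counters of different threads next to each other in memory. Those counters can share a cache line (false sharing). The private copies of a reduction avoid that.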
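Appendix: Parallel Composer compiler options

/O3
Enables more aggressive optimization than the default /O2.
IDE: [C/C++] - [Optimization] - [Optimization]
> icl /O3 main.cpp

/Qipo
Enables interprocedural optimization (IPO).
IDE: [C/C++] - [Optimization] - [Interprocedural Optimization]

/Qx{SSE4.2|SSE4.1|SSSE3|SSE3|SSE2|Host}
Generates code optimized for Intel processors that support the specified SSE instruction set.
IDE: [C/C++] - [Code Generation] - [Intel Processor-Specific Optimization]
The resulting main.exe runs only on CPUs that support SSE4.2, such as Core i7:
> icl /O2 /QxSSE4.2 main.cpp
The resulting main.exe runs on CPUs that support SSSE3, such as Core i7 and Core 2 Duo:
> icl /O2 /QxSSSE3 main.cpp
/QxHost uses the highest SSE instruction set supported by the host machine:
> icl /O2 /QxHost main.cpp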
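/Qax{SSE4.2|SSE4.1|SSSE3|SSE3|SSE2}
Adds a code path optimized for the specified instruction set, plus a generic code path (SSE2 by default). Unlike /Qx, the program also runs on processors without the specified instruction set.
IDE: [C/C++] - [Code Generation] - [Add Processor-Optimized Code Path]
The resulting main.exe uses SSE4.2 on SSE4.2 CPUs (Core i7) and SSE2 on other SSE2 CPUs (Pentium 4, SSE2-capable AMD processors):
> icl /O2 /QaxSSE4.2 main.cpp
The resulting main.exe uses SSE4.2 on SSE4.2 CPUs (Core i7) and SSE3 on other SSE3 CPUs (Core 2 Duo, SSE3-capable AMD processors):
> icl /O2 /QaxSSE4.2 /arch:sse3 main.cpp
The resulting main.exe uses SSE4.2 on SSE4.2 CPUs (Core i7) and generic IA-32 code on other IA-32 CPUs (Pentium III):
> icl /O2 /QaxSSE4.2 /arch:ia32 main.cpp

/arch:{SSE3|SSE2|IA32}
Generates code for the specified instruction set.
IDE: [C/C++] - [Code Generation] - [Enable Enhanced Instruction Set]
The resulting main.exe runs on SSE2 CPUs (Core i7, Core 2 Duo, Pentium 4, AMD processors):
> icl /O2 /arch:sse2 main.cpp

/Qparallel
Enables automatic parallelization of loops by the compiler.
IDE: [C/C++] - [Optimization] - [Parallelization]
> icl /Qparallel main.cpp

/Qopenmp
Enables OpenMP, so that the compiler processes #pragma omp directives.
IDE: [C/C++] - [Language] - [OpenMP* Support]
By default, the OpenMP runtime is linked dynamically (libiomp5md.dll):
> icl /Qopenmp main-omp.cpp
To link the OpenMP runtime statically (libiomp5mt.lib):
> icl /Qopenmp /Qopenmp-link:static main-omp.cpp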
To use the legacy OpenMP runtime, libguide40.dll:
> icl /Qopenmp /Qopenmp-lib:legacy main-omp.cpp
To link the legacy runtime statically (libguide.lib):
> icl /Qopenmp /Qopenmp-lib:legacy /Qopenmp-link:static main-omp.cpp

/Qopenmp-link, /Qopenmp-lib, /Qopenmp-stubs
IDE: [C/C++] - [Command Line] - [Additional Options:]
/Qopenmp-link selects static or dynamic linking of the OpenMP runtime:
> icl /Qopenmp /Qopenmp-link:static main-omp.cpp
/Qopenmp-lib selects which OpenMP runtime library is used:
> icl /Qopenmp /Qopenmp-lib:legacy main-omp.cpp
/Qopenmp-stubs compiles an OpenMP program to run serially. The OpenMP directives are ignored, and OpenMP library calls such as omp_set_num_threads are linked to stub routines. Use it instead of /Qopenmp:
> icl /Qopenmp-stubs main-omp.cpp

/Qvec-report{0-5}, /Qpar-report{0-3}, /Qopenmp-report{0-2}
IDE: [C/C++] - [Command Line] - [Additional Options:]
/Qvec-report reports on vectorization, at levels 0 to 5:
> icl /QxSSE4.2 /Qvec-report3 main.cpp
> icl /arch:sse2 /Qvec-report2 main.cpp
/Qpar-report reports on automatic parallelization, at levels 0 to 3:
> icl /Qparallel /Qpar-report3 main.cpp
/Qopenmp-report reports on OpenMP parallelization, at levels 0 to 2:
> icl /Qopenmp /Qopenmp-report2 main-omp.cpp

/debug:parallel
Enables the Parallel Debugger Extension. Requires /Qopenmp.
IDE: [C/C++] - [Debug] - [Enable Parallel Debug Checks]
> icl /Qopenmp /debug:parallel main-omp.cpp
/Qdiag-enable:sc-parallel{1|2|3}
Enables Parallel Lint, a static analysis of OpenMP code, at levels 1 to 3. Requires /Qopenmp.
IDE: [C/C++] - [Diagnostics] - [Level of Source Code Parallelization Analysis]
> icl /Qopenmp /Qdiag-enable:sc-parallel3 main-omp.cpp
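The vectorization and auto-parallelization reports only become interesting with a loop to report on. The sketch below is a loop with independent iterations and unit-stride accesses, the kind that /Qvec-report and /Qpar-report describe. The function name saxpy and the file name main.cpp are illustrative assumptions. Whether a given loop is actually vectorized or parallelized is decided by the compiler and shown in its report.

```cpp
#include <cstddef>
#include <vector>

// y[i] = a * x[i] + y[i] over the common length of x and y.
// Each iteration reads and writes only element i, so there is no
// loop-carried dependence: a typical candidate for vectorization
// (/Qvec-report) and automatic parallelization (/Qparallel, /Qpar-report).
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

With this function in main.cpp, `icl /QxSSE4.2 /Qvec-report3 main.cpp` reports what the compiler decided for the loop. For very short arrays, automatic parallelization is unlikely to pay off.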
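Appendix: detecting memory errors with Parallel Inspector

Parallel Inspector can also detect memory errors. In Visual Studio, select Memory errors and click [Inspect]. Choose the analysis level in the Configure Analysis dialog and click Run Analysis. For the problem types Parallel Inspector reports, see the Intel(R) Parallel Inspector Problem Type Reference.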
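Build settings for Parallel Inspector
- Generate debug information (/Zi) and disable optimization (/Od), also when analyzing a Release configuration.
- Do not use the run-time checks /RTC[su1].
- Do not use /Qtcheck or /Qtprofile, which are for Thread Checker and Thread Profiler.
- Do not use /debug:parallel, which is for the Parallel Debugger Extension.
- Linker: /FIXED:NO.
- C runtime library: /MDd, /MD, /MT or /MTd.
- OpenMP: choose how the OpenMP runtime is linked with /Qopenmp-link.
- TBB: define /D"TBB_USE_THREADING_TOOLS" so that Parallel Inspector recognizes TBB synchronization.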
Build settings for Parallel Amplifier
- Generate debug information (/Zi), also in the "Release" configuration.
- C runtime library: /MD or /MDd (dynamic).
- OpenMP: build with the Parallel Composer compiler and /Qopenmp, and link the OpenMP runtime dynamically with /Qopenmp-link:dynamic.
- TBB: define /D"TBB_USE_THREADING_TOOLS".
- Do not use /Qtcheck or /Qtprofile, which are for Thread Checker and Thread Profiler.
- Do not use /Qopenmp-link:static or /Qopenmp-stubs.
- Do not use /debug:parallel, which is for the Parallel Debugger Extension.
- Linker: /FIXED:NO.
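Appendix: pausing and resuming data collection in Parallel Amplifier

Parallel Amplifier can pause and resume data collection so that only part of a run is analyzed. Open [Intel Parallel Amplifier Project Properties] and set one of the following:
- Start data collection paused: collection starts in the paused state.
- Resume collection after ... sec.: collection resumes after the specified number of seconds.
Then click [Profile]. While the program runs paused, click [Continue] in Parallel Amplifier to resume collection.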
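Collection can also be controlled from the program itself. With Start data collection paused set, Parallel Amplifier resumes collection at __itt_resume() and pauses it at __itt_pause(). Include ittnotify.h:

```cpp
#include "ittnotify.h"

// process1() to process3() stand for the program's own work
void process1();
void process2();
void process3();

void Func() {
    // Resume & Pause
    __itt_resume();   // start collecting
    process1();       // analyzed
    __itt_pause();    // stop collecting
    process2();       // not analyzed
    __itt_resume();
    process3();       // analyzed
    __itt_pause();
}
```

ittnotify.h is located in:
C:\Program Files\Intel\Parallel Studio\Amplifier\include
Link with ittnotify_static.lib, located in:
C:\Program Files\Intel\Parallel Studio\Amplifier\lib32 (32-bit)
C:\Program Files (x86)\Intel\Parallel Studio\Amplifier\lib64 (64-bit)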
Then click [Profile].
Note: when Pause and Resume are used, the Summary also shows the Pause Time.
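Help

For more information, see the Intel Parallel Studio help, available from the Visual Studio help menu or the Windows Start menu. In the Visual Studio* IDE, press F1 for context help.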
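Other versions of the N-Queens sample

Besides nq-serial and the OpenMP version used in this tutorial, the N-Queens sample contains several other versions of the solver:
- a version using STL vector
- a Win32 API version
- an OpenMP version with critical
- an OpenMP 3.0 task version
- a TBB version
- a TBB + OpenMP version

References

OpenMP:
- OpenMP* (Intel compiler documentation): http://jp.xlsoft.com/documents/intel/compiler/525j-001.pdf
- C/C++ OpenMP*: http://jp.xlsoft.com/documents/intel/compiler/526j-001.pdf
- OpenMP: http://openmp.org/wp/
- OpenMP Application Program Interface Version 3.0 (Japanese translation): http://www.openmp.org/mp-documents/openmp30spec-ja.pdf

TBB:
- TBB product information: http://www.xlsoft.com/jp/products/intel/perflib/tbb/index.html
- TBB book (O'Reilly Japan): http://www.oreilly.co.jp/books/9784873113555/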
Product information: http://www.xlsoft.com/jp/products/intel/parallel/index.html
Inquiries: https://www.xlsoft.com/jp/services/xlsoft_form.html