10000 SFMOPT / / MOPT(Merge OPTimization) MOPT FMOPT(Fast MOPT) FMOPT SFMOPT(Subgrouping FMOPT) SFMOPT 2 8192 31 The Proposal and Evaluation of SFMOPT, a Task Mapping Method for 10000 Tasks Haruka Asano and Kenji Kise In recent years, computing systems which adopt mesh/torus network have appeared. Their communication time differs in where the positions of calculation tasks are located in such a direct network. Especially the positions of communicated tasks are important. The problem to determine the positions of all tasks is called a task mapping problem. The task mapping problem is complex because there are a lot of combinations. Therefore, it is difficult for the programmers to seek the best task mapping. We proposed a fast task mapping method MOPT (Merge OPTimization) before. In this paper, we propose FMOPT (Fast MOPT) which shortened the optimization time of MOPT using duplicate exclusions and parallelization. We also propose SFMOPT (Subgrouping FMOPT) which further accelerated FMOPT by introducing subgrouping. By using SFMOPT, we can optimize 8192 nodes in 31 seconds which had taken 2 hours before. 1. () / / Graduate School of Information Science and Engineering, Tokyo Institute of Technology 1), Jaguar 2) / () 1 ID MPI(Message Passing Interface) ID 2 3 / MPI c 2013 Information Processing Society of Japan 38
1 2 4 5 9 10 12 3 6 11 13 14 0 8 7 15 Iteration 0 Iteration 1 1 Iteration 2 Iteration 3 N N N! N 1 MOPT(Merge Optimization) 3) MOPT 8192 2 16384 5 MOPT FMOPT FMOPT SFMOPT 2 MOPT 3 MOPT 4 5 6 2. MOPT 2.1 / 1 (d) 2 (a) (b) MOPT (c) (f) 2.2 MOPT MOPT 16 4 4 ( 2) MOPT 2 MOPT 2(a) 2(b) 0 2(c) 2 2 16 4 2 2 2 c 2013 Information Processing Society of Japan 39
2 (1) G 1,G 2 s,t T raffic(s, t) s t Comm(G 1, G 2 ) = T raffic(s, t) (1) t G 2 s G 1 Eval Cost Eval Link 2 Eval Cost ( ) Eval Link XYZ Eval Cost MOPT MOPT mincost Eval Link MOPT MOPT minlink 2 2 X Y 3 X,Y,Z 3. MOPT 3.1 MOPT MOPT MOPT 8192 MOPT minlink 99.8% 98.5% 3.2 FMOPT MOPT FMOPT(Fast MOPT) FMOPT MOPT MOPT FMOPT MOPT 3.2.1 MOPT 2 X,Y (3 X,Y,Z ) X Y (3 XY YZ ZX ) YZ X ZX Y XY Z 2 2 4 8 3 6 8 48 MOPT 8 48 MOPT MOPT mincost Eval Cost 3 MOPT 3 3 ABC 3 abc A a X (1) Eval Cost (2) (8) (1) 90 (2) (4) (1) (4) Z (5) (8) (1) (2) (8) MOPT minlink MOPT mincost MOPT minlink XYZ XYZ c 2013 Information Processing Society of Japan 40
3 Eval Cost 4 Eval Link Eval Cost MOPT mincost 4 3 (1) Eval Link 3 (1) (8) (1) (6) (1)(6) Z (5)(2)Y (7)(4)Y Z (3)(8) A A A A A ( 1 ) A 2 2 (MOPT minlink ) ( 2 ) A 4 MOPT mincost 1/8MOPT minlink 1/4 3 MOPT 2 (G 1,G 2 ) X Y Z 1 MOPT mincost 2, MOPT minlink 3 MOPT mincost 6 1 G 1 G 2 X 6 8 6 8 2304 Y 2 8 2 8 256 Z 2 8 2 8 256 2 (FMOPT mincost) G 1 G 2 X 3 2 6 8 288 1/8 Y 2 2 2 8 64 1/4 Z 1 2 2 8 32 1/8 3 (FMOPT minlink) G 1 G 2 X 6 2 6 8 576 1/4 Y 2 2 2 8 64 1/4 Z 2 2 2 8 64 1/4 7 MOPT minlink 4 2 MOPT Z Z 3.2.2 FMOPT 3.3 SFMOPT FMOPT MOPT FMOPT SFMOPT(Subgrouping MOPT) SFMOPT FMOPT SFMOPT 2 SFMOPT 4 4 2 2 5 5 2 2 (1) c 2013 Information Processing Society of Japan 41
5 2 2 5 4 4 2 2 2 2 2 2 SFMOPT 2 3 SF- MOPT FMOPT SFMOPT FMOPT SFMOPT FMOPT SFMOPT 6 2 2 2 9 4 4 4 D SFMOPT(D ) N 2 D O(1) O(1) O(N) O(N) 4. 4.1 FMOPT MOPT Intel Core i7 870 Linux gettimeofday() OpenMP MOPT MOPT MOPT( FMOPT) 6 6 FMOPT MOPT 2 3 FMOPT minlink 4 FMOPT mincost 5 8 FMOPT mincost 128512 8192 2 MOPT FMOPT 7 7 4 23 mincost FMOPT 8 8 1024, 8192 512 1024 4096 8192 c 2013 Information Processing Society of Japan 42
6 7 8 FMOPT 10000 8192 Eval Link MOPT 7348 4 FMOPT 547 4.2 SFMOPT SFMOPT 2 1 FMOPT 1 3 / XYZ XYZ None FMOPT FMOPT SFMOPT SFMOPT N 2 2 2 FMOPT-half N FMOPT None FMOPT-half 2 1 FMOPT-half N N SFMOPT FMOPT-half 2 SFMOPT N SFMOPT FMOPT N FMOPT-half 4.2.1 FMOPT 9 9 Eval Link FMOPT, SFMOPT, FMOPT-half FMOPT SFMOPT 9 9(512 ) 6 6 10000 8192 FMOPT 547 SFMOPT 31 SFMOPT FMOPT-half FMOPT-half c 2013 Information Processing Society of Japan 43
9 10 (minlink) (mincost) SFMOPT 8192 SFMOPT FMOPT-half 1.1 FMOPT-half SFMOPT 10 Eval Cost FMOPT, SFMOPT, FMOPT-half Eval Link SF- MOPT FMOPT FMOPT-half Eval Cost Eval Link Eval Link Eval Link 4.2.2 NAS Parallel Benchmark CGBT MGSP 512 6 OpenNSIM 4) 512 11 Eval Link 12 Eval Cost 10000 Eval Link 11 Eval Cost 12 XYZ 1 2 13 Eval Link MG SFMOPT FMOPT XYZ SP FMOPT SFMOPT Eval Link 1 c 2013 Information Processing Society of Japan 44
MG FMOPTSFMOPT MG 3 XYZ XYZ MOPT 14 Eval Cost MG Eval Link FMOPT SFMOPT Eval Cost MG Eval Link XYZ 1 Eval Link 13 Eval Cost 14 MOPT MG 13 Eval Link MG FMOPTSFMOPTFMOPThalfNoneXYZ SFMOPT FMOPT-half SF- MOPT FMOPT 5% SFMOPT 14 Eval Cost SFMOPT FMOPT-halfNoneXYZ Eval Link CG SP SFMOPT FMOPT CG SP Eval Cost Eval Link Eval Link FMOPT SFMOPT 5. 13 (minlink) 14 (mincost) RMATT 5) RMATT BISEM Simulated annealing OPTIM BISEM SFMOPT BISEM SFMOPT BISEM SFMOPT 1 1 BISEM SFMOPT RMATT Simulated annealing RMATT RMATT 6) Simulated annealing 2 SFMOPT c 2013 Information Processing Society of Japan 45
RMATT BISEM SFMOPT 3 BlueGene/L 7) SFMOPT SFMOPT TAC3 8) RMATT Simulated annealing TAC3 SFMOPT 6. MOPT 4 4 3 5% SFMOPT MOPT 1% 16384 SFMOPT minlink 32%SF- MOPT mincost 67% MOPT FMOPT 512 SFMOPT minlink 0.4 6 BluGene/Q 5 3 SFMOPT 2 3 MOPT (CREST) 1) Yokokawa, M., Shoji, F., Uno, A., Kurokawa, M. and Watanabe, T.: The K computer: Japanese next-generation supercomputer development project, International Symposium on Low Power Electronics and Design (ISLPED), pp. 371 372 (2011). 2) Bland, A. S., Kendall, R. A., Kothe, D. B., Rogers, J. H. and Shipman, G. M.: Jaguar: The world s most powerful computer, Cray User Group (2009). 3),, : /, D, Vol. J96-D, No. 2, pp. 269 279 (2013). 4),,,,,,,, : OpenNSIM, 2010-ARC-192(15), pp. 1 9 (2010). 5),,, : RMATT, SACSIS2011, pp. 340 347 (2011). 6),,,,,, : RMATT, 2012, pp. 93 100 (2012). 7) Bhanot, G., Gara, A., Heidelberger, P., Lawless, E., Sexton, J. and Walkup, R.: Optimizing task layout on the Blue Gene/L supercomputer, IBM Journal of Research and Development, Vol. 49, No. 2.3, pp. 489 500 (2005). 8), : /, 2013, pp. 95 103 (2013). c 2013 Information Processing Society of Japan 46