Vol. 42 No. 4 Apr. 2001

Performance Evaluation of Adaptive Routers Based on the Number of Virtual Channels and Operating Frequencies

Maki Horita, Tsutomu Yoshinaga, Kanemitsu Ootsu and Takanobu Baba

To improve the communication performance of parallel computer networks, various routing algorithms must be evaluated. Adaptive routing and virtual channels (VCs) can improve communication performance by increasing routing flexibility. However, they also lower the operating frequency of the router, since adaptive routing and VCs require complex logic and a large amount of hardware. It is therefore important to consider the trade-off between routing flexibility and operating frequency. We clarify this trade-off by evaluating the communication performance of typical routers in a 2D torus network, taking into account the operating frequency of the routing circuits. Our experimental results show that routers with four VCs per physical channel attain a good trade-off between routing flexibility and operating frequency. Adaptive routers outperform non-adaptive routers owing to their higher routing flexibility, especially under non-uniform communication patterns. The Recover-x router with four VCs per physical channel shows robust performance under both uniform and non-uniform traffic.

†1 Department of Information Science, Faculty of Engineering, Utsunomiya University
†2 Graduate School of Information Systems, University of Electro-Communications

1. Introduction
… 4), 6), 8) … VCs … 9) … VCs … 3) … HDL …
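… three routers: Dimension-order, *-channel 2) and Recover-x 11) … Recover-x … DISHA 1) … Recover-x … DISHA … VCs … 100 … VCs … HDL … four VCs … Section 2 … Section 3 … Section 4 … Section 5 … RTL … Section 6 …

2. …
2.1 …
2.2 … the three routers …
2.2.1 Dimension-order
Dimension-order routing … in a 2D network a packet is first routed in the X dimension and then in the Y dimension …
2.2.2 *-channel
… *-channel … VCs … *-channel … Duato 5) …
2.2.3 Recover-x
… Recover-x … Y … X … *-channel … VCs … DISHA … Recover-x …

3. …
3.1 …
Figure 1(a) … four ports (±X, ±Y) … PE I/F … Figure 1(b) …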
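The difference between the non-adaptive and the adaptive routers of Section 2.2 can be illustrated by which output ports each may use. The sketch below is a minimal, hypothetical model, not the authors' RTL: for a packet in a k × k torus it lists the minimal (productive) ports, and dimension-order routing takes the X port before any Y port. The function names, the port labels and the tie-breaking rule for an offset of exactly k/2 are assumptions; VC selection and deadlock handling (Duato's condition 5) for *-channel, recovery for Recover-x) are not modeled.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates in a k x k torus


def ring_offset(src: int, dst: int, k: int) -> int:
    """Signed minimal hop count from src to dst on a ring of k nodes.

    Positive means the + direction. When both directions are equally
    short (offset exactly k/2), this sketch picks + (an assumption).
    """
    d = (dst - src) % k
    return d if d <= k // 2 else d - k


def productive_ports(src: Node, dst: Node, k: int) -> List[str]:
    """Output ports that bring the packet one hop closer to dst.

    An adaptive minimal router may choose any of them; X is listed first.
    """
    dx = ring_offset(src[0], dst[0], k)
    dy = ring_offset(src[1], dst[1], k)
    ports: List[str] = []
    if dx != 0:
        ports.append("+X" if dx > 0 else "-X")
    if dy != 0:
        ports.append("+Y" if dy > 0 else "-Y")
    return ports or ["PE"]  # already at the destination: eject to the PE


def dimension_order_port(src: Node, dst: Node, k: int) -> str:
    """Dimension-order (X then Y) routing: the first productive port."""
    return productive_ports(src, dst, k)[0]


if __name__ == "__main__":
    k = 10
    print(productive_ports((0, 0), (3, 7), k))
    print(dimension_order_port((0, 0), (3, 7), k))
```

For a packet from (0, 0) to (3, 7) in a 10 × 10 torus, both +X and -Y are minimal; the dimension-order router always takes +X first, whereas an adaptive router may pick whichever port is free, which is the routing flexibility the paper weighs against operating frequency.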
… VCs … VCs …

Table 1 The number of VCs for each VC configuration.
          X    Y    PE I/F    Total (= 2X + 2Y + PE I/F)
VC3       3    3    2         14
VC4       4    4    2         18
VC5       5    5    2         22

3.2 …
Fig. 1 Hardware organization of the routers.
Each router is built from the following blocks: BC (Buffer Controller), AD (Address Decoder), OCA (Output Channel Arbiter) and VCC (Virtual Channel output Controller). … BC … 8 … FIFO …
(1) … VC … BC … AD …
(2) … AD … Recover-x …
(3) … OCA … VC … VC …
(4) … OCA … AD … BC … VCC …
… VCs are numbered from VC0; the configurations with 3, 4 and 5 VCs per physical channel are called VC3, VC4 and VC5, and the PE I/F has 2 VCs … (see 3.2.2) … node#a … node#b … node#c … X … Y …

3.2.1 …
Figure 2 … Dimension-order … X … Y … 3 VCs … Figure 3 … *-channel VC3 … Figure 4 … Recover-x … X … VC … Y …
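The totals in Table 1 follow directly from its footnote: each of the two X ports (+X, -X) and the two Y ports (+Y, -Y) carries the per-channel VC count, and the PE I/F adds 2 VCs. A small check, using a hypothetical helper `total_vcs`:

```python
# VCs per physical channel in X and Y, VCs at the PE I/F, and the listed
# total, for each configuration in Table 1.
TABLE1 = {
    "VC3": (3, 3, 2, 14),
    "VC4": (4, 4, 2, 18),
    "VC5": (5, 5, 2, 22),
}


def total_vcs(x_vcs: int, y_vcs: int, pe_if_vcs: int) -> int:
    """Total VCs per router: two X ports (+X, -X), two Y ports, plus the PE I/F."""
    return 2 * x_vcs + 2 * y_vcs + pe_if_vcs


if __name__ == "__main__":
    for name, (x, y, pe, listed) in TABLE1.items():
        computed = total_vcs(x, y, pe)
        print(f"{name}: computed {computed}, Table 1 lists {listed}")
        assert computed == listed
```

Each extra VC per physical channel therefore adds four VCs (and their buffers) to the router, which is why area grows quickly from VC3 to VC5.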
3.2.2 …
… VCs … 7) … Dimension-order … VC … *-channel … Recover-x … Dimension-order … VC …

Fig. 2 VC assignment for a Dimension-order router.
Fig. 3 VC assignment for a *-channel router.
Fig. 4 VC assignment for a Recover-x router.

… AD … OCA … VC3 … VC4 … VC5 … *-channel … Recover-x … VC …

3.3 …
… 32 … VC … 10) …

4. …
4.1 …
The three routers were described in Verilog-HDL and synthesized with Synopsys HDL Compiler version 1999.05 (medium effort) for the LSI Logic 0.6 µm Array-Based Gate Array. … Recover-x … VC … *-channel …
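For orientation, one widely used way for a dimension-order router to assign VCs on a torus ring is the dateline scheme: a packet travels on VC class 0 until it uses the wraparound link and on class 1 from then on, which breaks the cyclic channel dependency of the ring. The sketch below shows that textbook scheme on a single ring; it is a generic illustration and is not necessarily the assignment shown in Fig. 2, and `dateline_path` is a hypothetical helper.

```python
from typing import List, Tuple

Hop = Tuple[int, int, int]  # (from_node, to_node, vc_class)


def dateline_path(src: int, hops: int, k: int, direction: int = +1) -> List[Hop]:
    """VC class used on each hop of a minimal route along one ring of k nodes.

    The dateline is the wraparound link (k-1 -> 0 for direction +1,
    0 -> k-1 for direction -1). Hops before it use VC class 0; the
    wraparound hop and every later hop use VC class 1.
    """
    if direction not in (+1, -1):
        raise ValueError("direction must be +1 or -1")
    vc = 0
    node = src
    path: List[Hop] = []
    for _ in range(hops):
        nxt = (node + direction) % k
        crosses_dateline = (direction == +1 and node == k - 1) or (
            direction == -1 and node == 0
        )
        if crosses_dateline:
            vc = 1  # switch classes once the wraparound link is used
        path.append((node, nxt, vc))
        node = nxt
    return path


if __name__ == "__main__":
    # 10-node ring: 8 -> 9 -> 0 -> 1 crosses the dateline on the second hop.
    print(dateline_path(8, 3, 10))
```

In a 2D torus the same rule would be applied in X and then in Y, with the class reset to 0 when the packet turns into the Y dimension; this needs only two VCs per channel, so any additional VCs in VC3 to VC5 are free for other uses.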
Table 2 Synthesis results.
                   Dimension-order        *-channel              Recover-x
VCs/port           3      4      5        3      4      5        3      4      5
Frequency (MHz)    161.2  156.2  147.0    120.4  114.9  107.5    142.8  133.3  117.6
Area (Kgates)      70.9   90.2   109.1    72.2   96.5   120.9    75.6   94.5   118.2
Area (Kgates)      40.0   51.6   63.0     43.1   60.2   78.7     43.1   58.2   76.1
Total (Kgates)     110.9  141.8  172.1    115.3  156.7  199.6    118.7  152.7  194.3

4.2 …
… Table 2 …
4.2.1 …
… VC … Recover-x … *-channel … VC … Dimension-order … VC3 … VC4 … VC5 … OCA … VC3 … OCA …
4.2.2 …
… Recover-x … Table 3 … Kgates …

Table 3 Each block area for Recover-x routers (Kgates).
          VC3             VC4             VC5
          ±X      ±Y      ±X      ±Y      ±X      ±Y
AD        0.43    0.82    0.67    1.03    0.92    1.32
BC        13.11   13.44   17.57   18.29   21.29   21.37
OCA       2.15    1.01    3.17    2.00    4.81    2.86
VCC       0.21    0.22    0.40    0.36    0.44    0.47
Total     15.89   15.47   21.80   21.68   27.44   26.01

… *-channel … Dimension-order … Recover-x … *-channel VC3 … Recover-x … *-channel VC4/VC5 … VC … Dimension-order … VC … Recover-x … BC … AD … OCA … the X AD … the Y OCA … AD … BC … X … Y … OCA … VC … X …
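Tables 2 and 3 can be read quantitatively with a few lines of arithmetic. The sketch below encodes the published numbers and computes (i) the change in operating frequency and total area relative to each router's 3-VC configuration and (ii) the share of each Recover-x port's block area taken by the buffer controller (BC). Only the numbers come from the tables; the helper names are hypothetical.

```python
# Operating frequency (MHz) and total area (Kgates) from Table 2.
FREQ_MHZ = {
    "Dimension-order": {3: 161.2, 4: 156.2, 5: 147.0},
    "*-channel": {3: 120.4, 4: 114.9, 5: 107.5},
    "Recover-x": {3: 142.8, 4: 133.3, 5: 117.6},
}
TOTAL_KGATES = {
    "Dimension-order": {3: 110.9, 4: 141.8, 5: 172.1},
    "*-channel": {3: 115.3, 4: 156.7, 5: 199.6},
    "Recover-x": {3: 118.7, 4: 152.7, 5: 194.3},
}

# Per-port block areas (Kgates) of the Recover-x routers from Table 3:
# (AD, BC, OCA, VCC) for the +/-X and +/-Y ports.
RECOVER_X_BLOCKS = {
    ("VC3", "X"): (0.43, 13.11, 2.15, 0.21),
    ("VC3", "Y"): (0.82, 13.44, 1.01, 0.22),
    ("VC4", "X"): (0.67, 17.57, 3.17, 0.40),
    ("VC4", "Y"): (1.03, 18.29, 2.00, 0.36),
    ("VC5", "X"): (0.92, 21.29, 4.81, 0.44),
    ("VC5", "Y"): (1.32, 21.37, 2.86, 0.47),
}


def change_vs_3vc(table: dict, router: str, vcs: int) -> float:
    """Percentage change of a Table 2 metric relative to the same router with 3 VCs."""
    base = table[router][3]
    return 100.0 * (table[router][vcs] - base) / base


def bc_share(config: str, dim: str) -> float:
    """Fraction of a Recover-x port's block area that belongs to the BC."""
    ad, bc, oca, vcc = RECOVER_X_BLOCKS[(config, dim)]
    return bc / (ad + bc + oca + vcc)


if __name__ == "__main__":
    for router in FREQ_MHZ:
        for vcs in (4, 5):
            f = change_vs_3vc(FREQ_MHZ, router, vcs)
            a = change_vs_3vc(TOTAL_KGATES, router, vcs)
            print(f"{router:16s} {vcs} VCs: frequency {f:+.1f}%, area {a:+.1f}%")
    for key in RECOVER_X_BLOCKS:
        print(key, f"BC share {bc_share(*key):.0%}")
```

For example, going from 3 to 5 VCs per port lowers the Recover-x router's frequency by about 17.6% (142.8 → 117.6 MHz), compared with about 8.8% for Dimension-order and 10.7% for *-channel, and in every Recover-x port the BC accounts for roughly 78% to 87% of the block area.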
5. …
5.1 …
The routers are simulated with Cadence Verilog-XL on a 10 × 10 = 100-node 2D torus.
Hot-spot traffic: … 100 … 25% … to the nodes (4, j), 0 ≤ j ≤ 9 (10 nodes) …
Random traffic: … 100 …
… PE … PE … PE I/F … Recover-x … 4 VCs …
5.2 …
… 100 MHz … 2000 … 5000 …
5.3 Hot-spot traffic
5.3.1 Bandwidth
Figure 5(a) … 100 MHz … Hot-spot … 4 … 256 … Recover-x … *-channel … Dimension-order … Hot-spot … 32 … Dimension-order … *-channel … VC … *-channel … Dimension-order VC3, VC4 … Recover-x VC3, VC4, VC5 … VCC … VC … Recover-x … *-channel VC5 … VC3, VC4, VC5 …
Figure 5(b) … Dimension-order VC4 … 256 … *-channel … 64 … VC4 … 64 … 128 …
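The two workloads of Section 5.1 can be sketched as destination generators for the 10 × 10 torus. This is a minimal sketch under stated assumptions: the 25% figure is read as the probability that a packet targets one of the hot-spot nodes (4, j), 0 ≤ j ≤ 9, chosen uniformly, while the remaining packets (and all Random-traffic packets) go to a uniformly random node other than the source. Injection timing and packet length are not modeled, and all function names are hypothetical.

```python
import random
from typing import Tuple

Node = Tuple[int, int]

K = 10  # 10 x 10 = 100 nodes (Section 5.1)
HOT_SPOT_NODES = [(4, j) for j in range(K)]  # nodes (4, j), 0 <= j <= 9
HOT_SPOT_PROB = 0.25  # assumption: probability that a packet targets a hot spot


def random_destination(src: Node, rng: random.Random) -> Node:
    """Random traffic: uniform over all nodes except the source (assumed)."""
    while True:
        dst = (rng.randrange(K), rng.randrange(K))
        if dst != src:
            return dst


def hot_spot_destination(src: Node, rng: random.Random) -> Node:
    """Hot-spot traffic: a hot-spot node with probability HOT_SPOT_PROB, else uniform."""
    if rng.random() < HOT_SPOT_PROB:
        return rng.choice([n for n in HOT_SPOT_NODES if n != src])
    return random_destination(src, rng)


def fraction_to_hot_column(gen, src: Node, samples: int, seed: int = 1) -> float:
    """Empirical fraction of destinations with x == 4 for a given generator."""
    rng = random.Random(seed)
    hits = sum(gen(src, rng)[0] == 4 for _ in range(samples))
    return hits / samples


if __name__ == "__main__":
    src = (0, 0)
    print("Hot-spot:", fraction_to_hot_column(hot_spot_destination, src, 20000))
    print("Random:  ", fraction_to_hot_column(random_destination, src, 20000))
```

Under these assumptions about 0.25 + 0.75 × 10/99 ≈ 33% of the packets from a node outside the column x = 4 head into that column, versus about 10% under Random traffic, which is what concentrates contention around the hot-spot nodes.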
Fig. 5 Bandwidth for the Hot-spot traffic.
Fig. 6 Latency for the Hot-spot traffic.

… VC5 … 128 … VC5 … VC4 … VC4 … VC5 … Recover-x VC3 … Hot-spot … VC … VC5 … 4 …
5.3.2 Latency
Figure 6 … Hot-spot …
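Because the three routers reach different clock rates (Table 2), a comparison measured in clock cycles can reverse once each router runs at its own frequency, which is the trade-off examined here. The conversion is plain arithmetic: one cycle lasts 1000 / f ns for a frequency f in MHz, and f cycles elapse per microsecond. The sketch assumes latency in cycles and throughput in flits per cycle; the cycle counts in the example are hypothetical, not measurements from Figs. 5 to 8.

```python
def latency_ns(cycles: float, freq_mhz: float) -> float:
    """Latency in nanoseconds: cycles times the clock period (1000 / f_MHz ns)."""
    return cycles * 1000.0 / freq_mhz


def throughput_per_us(flits_per_cycle: float, freq_mhz: float) -> float:
    """Throughput in flits per microsecond: f_MHz cycles occur every microsecond."""
    return flits_per_cycle * freq_mhz


if __name__ == "__main__":
    # Hypothetical cycle counts; the frequencies are the 3-VC values from Table 2.
    fast_clock = latency_ns(100, 161.2)  # Dimension-order, 3 VCs
    fewer_cycles = latency_ns(90, 120.4)  # *-channel, 3 VCs
    print(f"100 cycles at 161.2 MHz: {fast_clock:.1f} ns")
    print(f" 90 cycles at 120.4 MHz: {fewer_cycles:.1f} ns")
```

In this hypothetical case the router that needs 10% fewer cycles is still about 20% slower in wall-clock time, because its clock is about 25% slower.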
… 64 … PE … PE … Figure 6(a) … *-channel … Recover-x … Recover-x … VC … 3 … *-channel … 1 … VC3 … Figure 6(b) … Recover-x … *-channel … Dimension-order … *-channel … Dimension-order … *-channel … Recover-x VC5 … Dimension-order … *-channel VC3 … VC4, VC5 … VC4, VC5 … Recover-x … *-channel … Dimension-order … Recover-x VC3, VC4, VC5 …
5.4 Random traffic
5.4.1 Bandwidth
Figure 7 … Random … Figure 7(a) … 100 MHz … Hot-spot … 32 … Hot-spot … Dimension-order … *-channel … VC … Figure 7(b) … *-channel … Dimension-order … Recover-x VC4 …
5.4.2 Latency
Figure 8 … Random … Figure 8(a) … 100 MHz … *-channel … VC … *-channel … Recover-x … Dimension-order … Figure 6(a) … Random … VC … Figure 8(b) … Dimension-order … *-channel … Recover-x … VC5 … VC4 … Recover-x … 3 … VC4 …

6. Conclusion
… Dimension-order, *-channel and Recover-x … three … VC3 … VC4 …
Fig. 7 Bandwidth for the Random traffic.
Fig. 8 Latency for the Random traffic.

… 4 … 5 … Recover-x VC4 …
Acknowledgments
… CAD … (B) 10558039 … (A) 11780190 …

References
1) Anjan, K.V. and Pinkston, T.M.: An Efficient, Fully Adaptive Deadlock Recovery Scheme: DISHA, Proc. 22nd ISCA, pp.201-210 (1995).
2) Berman, P.E., Gravano, L., Pifarré, G.D. and Sanz, J.L.C.: Adaptive Deadlock and Livelock Free Routing with all Minimal Paths in Torus Networks, Proc. SPAA (1992).
3) Chien, A.A.: A Cost and Speed Model for k-ary n-cube Wormhole Routers, IEEE Trans. Parallel and Distributed Systems, Vol.9, No.2, pp.150-162 (1998).
4) Dai, D. and Panda, D.K.: How Much Does Network Contention Affect Distributed Shared Memory Performance?, Proc. ICPP '97, pp.454-461 (1997).
5) Duato, J.: A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks, IEEE Trans. Parallel and Distributed Systems, Vol.4, No.12, pp.1320-1331 (1993).
6) Flich, J., Malumbres, M.P., López, P. and Duato, J.: Performance Evaluation of Networks of Workstations with Hardware Shared Memory Model Using Execution-Driven Simulation, Proc. ICPP '99 (1999).
7) … JSPP2000, pp.189-196 (2000).
8) … adaptive routing …, Proc. HOKKE2000, 2000-ARC-137-9, pp.47-52 (2000).
9) Vaidya, A.S., Sivasubramaniam, A. and Das, C.R.: LAPSES: A Recipe for High Performance Adaptive Router Design, Proc. HPCA-5 (1999).
10) … Vol.40, No.5, pp.1958-1967 (1999).
11) … Recover-x …, Vol.41, No.5, pp.1360-1369 (2000).

(Received August 31, 2000)
(Accepted March 9, 2001)

… 1999 … 1986 … 1988 … 1997 … 2000 … IEEE … 1993 … 1995 … 1997 … 1970 … 1975 … 1982 … IEEE … 1992 Best Author … Microprogrammable Parallel Computer (MIT Press) …