2011/5/11 NAIST CPU CPU
UNIX
Windows
...
Rack units (1U, 2U, 4U, etc.)
Motherboard form factors (E-ATX, microATX, Mini-ITX, etc.)
...
BIOS ROM
OS
CD, DVD
Bernoulli model: p
Gilbert-Elliott model:
  G: good state
  B: bad state
  1-k: error rate in good state
  1-h: error rate in bad state
  p, r: transition probabilities
[Figure: Gilbert-Elliott model. Source: G. Haßlinger and O. Hohlfeld, MMB 2008]
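The Gilbert-Elliott parameters above can be tried out with a small simulation. This is a minimal sketch, not taken from the slides: it assumes p is the probability of moving G to B and r the probability of moving B to G (the slide only says "transition probability"), and the function names are made up for illustration.

```python
import random

def stationary_error_rate(p, r, k, h):
    """Long-run error rate of the two-state chain.

    Assumes p = P(G -> B) and r = P(B -> G); this direction is an
    assumption, the slide does not spell it out.
    """
    pi_bad = p / (p + r)    # stationary probability of the bad state B
    pi_good = r / (p + r)   # stationary probability of the good state G
    return (1 - k) * pi_good + (1 - h) * pi_bad

def simulate(p, r, k, h, n, seed=0):
    """Return a list of n booleans; True marks an erroneous symbol."""
    rng = random.Random(seed)
    bad = False             # start in the good state G
    errors = []
    for _ in range(n):
        error_prob = (1 - h) if bad else (1 - k)
        errors.append(rng.random() < error_prob)
        # State transition for the next symbol.
        if bad:
            bad = rng.random() >= r   # stay in B with probability 1 - r
        else:
            bad = rng.random() < p    # move to B with probability p
    return errors
```

With k = 1 the good state is error-free, so all errors arrive in bursts while the chain sits in B. Setting r = 1 - p and h = k makes the errors independent again, which is the Bernoulli case.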
RAS
  Reliability
    MTBF: mean time between failures
  Serviceability
    MTTR: mean time to repair
  Availability
    A = MTBF / (MTBF + MTTR)
  MTBF, MTTR
CPU
EMI
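A quick worked example of the availability formula A = MTBF / (MTBF + MTTR). The numbers and function names are illustrative assumptions, not values from the slides.

```python
def availability(mtbf_hours, mttr_hours):
    """A = MTBF / (MTBF + MTTR), both given in the same time unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def downtime_hours_per_year(a, hours_per_year=8760.0):
    """Expected downtime per (non-leap) year for availability a."""
    return (1.0 - a) * hours_per_year

if __name__ == "__main__":
    a = availability(1000.0, 1.0)   # hypothetical: fails every 1000 h, 1 h to repair
    print(f"A = {a:.6f}, downtime = {downtime_hours_per_year(a):.2f} h/year")
```

The formula shows the two levers: availability rises either by lengthening MTBF or by shortening MTTR.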
duplex system
  OS
dual system
Source: Supermicro SuperBlade 3+1 redundant power supply modules
RAID
RAID: Redundant Array of Inexpensive Disks
concatenation
  disk 1: block 1 ... block N
  disk 2: block N+1 ... block 2N
RAID-0: striping (k disks)
  disk 1: block 1, k+1, 2k+1, ...
  disk 2: block 2, k+2, 2k+2, ...
  disk k: block k, 2k, 3k, ...
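The two block layouts above can be written as address mappings. A minimal sketch, using the slide's 1-based numbering; the function names are hypothetical.

```python
def concat_location(block, blocks_per_disk):
    """Concatenation: disk 1 holds blocks 1..N, disk 2 holds N+1..2N, ...

    Returns (disk, position on that disk), both 1-based.
    """
    index = block - 1
    return index // blocks_per_disk + 1, index % blocks_per_disk + 1

def stripe_location(block, num_disks):
    """RAID-0 striping over k disks: block b goes to disk ((b-1) mod k) + 1.

    Returns (disk, position on that disk), both 1-based.
    """
    index = block - 1
    return index % num_disks + 1, index // num_disks + 1
```

With striping, consecutive blocks land on different disks, so a sequential read can keep all k disks busy at once. With concatenation, the same read stays on one disk.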
RAID-5: bitwise parity
  disk 1: block 1, k+1, 2k+1, ...
  disk 2: block 2, k+2, 2k+2, ...
  ...
  parity disk: parity(1..k), parity(k+1..2k), ...
  Example:
    disk 1: 1001
    disk 2: 0101
    disk 3: 1000
    parity: 0100
  write-ahead logging
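The parity example above is a bitwise XOR across the data disks, and the same XOR rebuilds a lost disk. A minimal sketch with hypothetical function names; the arithmetic is the same whether parity sits on one disk or is rotated across disks.

```python
from functools import reduce
from operator import xor

def parity(blocks):
    """Bitwise parity (XOR) of the data blocks in one stripe."""
    return reduce(xor, blocks, 0)

def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single missing block from the survivors and the parity."""
    return reduce(xor, surviving_blocks, parity_block)

if __name__ == "__main__":
    stripe = [0b1001, 0b0101, 0b1000]          # disks 1..3 from the slide
    p = parity(stripe)
    print(format(p, "04b"))
    # Pretend disk 2 failed and rebuild it from disks 1, 3 and the parity.
    print(format(reconstruct([stripe[0], stripe[2]], p), "04b"))
```

Every write must update both a data block and its parity block. If the system crashes between the two updates, the stripe is left inconsistent, which is where write-ahead logging comes in.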
RAID-1: mirroring
  - I/O k
parity bit
ECC: Error Correction Code
  IBM ChipKill, Sun Extended ECC
Source: MIPS R4000 Microprocessor User Manual
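A single parity bit is the simplest redundancy check for memory words. This is a minimal sketch of even parity with hypothetical function names. It detects any odd number of flipped bits but cannot say which bit flipped, so correcting the error needs an ECC.

```python
def even_parity_bit(word):
    """Parity bit that makes the total number of 1 bits even."""
    return bin(word).count("1") % 2

def check(word, parity_bit):
    """True if the word is consistent with its stored parity bit."""
    return even_parity_bit(word) == parity_bit

if __name__ == "__main__":
    word = 0b1011
    p = even_parity_bit(word)
    print(check(word, p))              # stored word is intact
    print(check(word ^ 0b0100, p))     # one flipped bit is detected
    print(check(word ^ 0b0110, p))     # two flipped bits go unnoticed
```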
CPU
IEEE 802.3ad (Link Aggregation)
[Figure: Link Aggregation. Computer with NIC 1 and NIC 2, connected to Port 1 and Port 2 of a switch. NIC: Network Interface Card]
ILOM (integrated lights-out management)
Source: APC white paper #44
Source: HP blade server bh7800 installation guide
Source: Overview of Liquid Cooling Systems, SANYO DENKI Technical Report No. 18, Nov. 2004
LBL
[Figure: Watchdog timer. Labels: Watchdog timer, Target, poll, response, control]
[Figure: Watchdog with Request Routing. Labels: Watchdog, R, P1, P2, P3]
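The poll / response / control loop in the watchdog figure can be sketched as follows. This is a simplified illustration under stated assumptions: the class, its method names, and the reset callback are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class Watchdog:
    """Resets a target that has not responded within `timeout` seconds."""

    def __init__(self, timeout, now=0.0):
        self.timeout = timeout
        self.last_response = now   # time of the most recent response
        self.resets = 0

    def record_response(self, now):
        """Called whenever the target answers a poll."""
        self.last_response = now

    def poll(self, now, reset_target):
        """Check the target; invoke reset_target() if it has timed out.

        Returns True if a reset was triggered.
        """
        if now - self.last_response > self.timeout:
            reset_target()             # the "control" arrow in the figure
            self.resets += 1
            self.last_response = now   # give the restarted target a fresh window
            return True
        return False
```

A real watchdog usually runs on separate hardware or in a separate process from the target, so a hang in the target cannot also stall the watchdog.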
Replication
Redundant coding
  LDPC
Unit test
Regression test
Fuzz test
(O'Gorman 1983)
  α (alpha particle)
  neutron
  T. Karnik et al., "Characterization of Soft Errors Caused by Single Event Upsets in CMOS Processes", IEEE TDSC 1(2), 2004.
Radiation hardening
Electromagnetic interference
  http://www.nhtsa.gov/ua
Failover test
Fault injection test
RAS (MTBF, MTTR)
  MTBF
  MTTR