HPC HP XC HP ISS HPC 2008 Hewlett-Packard Development Company, L.P.
TOP500, 30th list (November 2007): TOP10 by LINPACK performance (TFLOPS)
1. Lawrence Livermore National Laboratory: BlueGene/L (IBM), 478.2
2. Forschungszentrum Juelich: BlueGene/P (IBM), 167.3
3. New Mexico Computing Applications Center: Altix ICE 8200 (SGI), 126.9
4. Computational Research Laboratories, TATA SONS: Cluster Platform 3000 BL460c (HP), 117.9
5. (site name not recoverable from the slide text): Cluster Platform 3000 BL460c (HP), 102.8
6. NNSA/Sandia National Laboratories: Red Storm, Cray XT3 (Cray), 102.2
7. Oak Ridge National Laboratory: Jaguar, Cray XT4/XT3 (Cray), 101.7
8. IBM Thomas J. Watson Research Center: BGW (IBM), 91.3
9. National Energy Research Scientific Computing Center: Franklin, Cray XT4 (Cray), 85.4
10. Stony Brook/BNL, New York Center for Computational Sciences: New York Blue (IBM), 82.2
Other labels on the slide: HP BL460c, Xeon, XC; ProLiant c-Class Blade; Intel Xeon; AMD Opteron; the figures 151, 139, and 12 (their captions did not survive extraction).
TATA: HP Cluster Platform 3000BL; HP BladeSystem c-Class; HP ProLiant BL460c; 4X DDR InfiniBand (the figures 114, 2, and 16 appear without surviving labels)
HP HPC: SVA, XC, SFS
HP XC; HP Scalable Visualization Array (SVA); HP StorageWorks Scalable File Share (SFS); Linux, Lustre
XC: Red Hat; MPI-2; OS; HP Itanium, Xeon, Opteron; InfiniBand, GbE
XC software stack (entries marked * are flagged on the slide):
Red Hat Linux *: POSIX, Unix, ISV
LVS *: Linux Virtual Server
LSF: Platform LSF HPC; MAUI; NQS
SLURM *: Simple Linux Utility for Resource Management (1000)
HP-MPI: HP's Message Passing Interface; MPICH; MPI-2
SystemImager *: Linux (the figures 100, 45, 1000, and 2 appear without surviving labels)
Power control: IPMI, iLO
CMF: Console Management Facility; Telnet
Nagios *, SuperMon *, Cacti/RRD *: Cacti GUI; RRD = Round Robin Database
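The "Round Robin Database" named on this slide refers to a fixed-size store in which new samples overwrite the oldest ones, so the data never grows. The following is only a minimal sketch of that idea, not RRDtool's implementation or file format; the class name `RoundRobinArchive` and its methods are hypothetical names chosen for illustration.

```python
from collections import deque


class RoundRobinArchive:
    """Hypothetical fixed-size sample store: when full, the oldest sample is dropped."""

    def __init__(self, slots):
        # deque(maxlen=...) discards the oldest entry automatically when full.
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        """Record one (timestamp, value) sample."""
        self.samples.append((timestamp, value))

    def average(self):
        """Mean of the values currently retained, or None if empty."""
        if not self.samples:
            return None
        return sum(value for _, value in self.samples) / len(self.samples)


# Four load samples pushed into a three-slot archive: the first one is overwritten.
rra = RoundRobinArchive(slots=3)
for t, load in enumerate([0.5, 1.0, 1.5, 2.0]):
    rra.update(t, load)
```

The bounded size is the design point: monitoring data for many nodes can be kept indefinitely at a known storage cost, at the price of losing old detail.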
XC: Kickstart, XC Config, SystemImager, Nagios/Cacti/RRD, Syslog, SuperMon, Nagios plug-ins, cmf, pdsh, pdcp, XC mysql, Red Hat, HP
XC and Red Hat Enterprise Linux Advanced Server (RHEL AS): Linux 2.6; http://www.jp.redhat.com/software/rhel/features/ ; CPU, SMP, NUMA, (NAPI), Ext3, NFSv4, AutoFSv4
HP ServiceGuard Linux Heartbeat
XC Rapid Deployment: NFS, NIS, NAT, NTP, Firewall, RAID; SW DVD; Head; MAC
XC Rapid Deployment: discover (BIOS, MAC); cluster_config (LAN, IP); startsys image_and_boot (GI, SW)
Nagios Master LVS NAT Data Base
Nagios (html): http://www.nagios.org/ ; OS; syslog; Users/procs/uptime; disk; Ping; Loadave/mem/swap/cpu/network
Nagios (diagram labels): Mgmt Server; System Files; XC DB; syslog-ng forwarding; SuperMon aggregation; Mgmt Hub (three)
Nagios Web UI
Nagios Report Generator (nrg) Nagios X watch 1 analyze summary full Mgmt Hub
nrg mode analyze
[root@xc9n5 bin]# ./nrg --mode a
Nodelist                Description
----------------------- ---------------------------------------------
xc9n2                   [NodeInfo - Warning] Warning thresholds have been reached for max users, processes, or zombies etc. See nagios_vars.ini for threshold values. Values are site specific and critical values indicate values have been exceeded
nh                      [Slurm Monitor - Critical] 'sinfo' reported problems with nodes in some partitions, specifically, some nodes may be marked with an '*' which indicates they may be unresponsive to SLURM. Run 'sinfo' for more information.
xc9nems1-1 xc9nems2-1   [Switch - Warning] One or more sensors on a the network switch may be reporting bad status. It may also be that one or more nodes connected via this switch have less then a 1GB link established.
nh common               [System Event Log - FAILEDCONNECT] The check_sel plug-in failed to connect to the console port for this node, cause is the console device cp-xxxxx, is not reachable. If this is the head node and the head node is externally connected, you may be able to define cp-xxxxx in /etc/hosts using the external IP to allow connectivity. Sensor collection may not be possible when using externally connected console ports for head nodes on platforms that use IPMI to gather sensor information. If this is not the head node then it may indicate a communication problem with the associated console device 'cp-{nodename}'.
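Each entry in the analyze output above carries a tag of the form "[Service - Status]". As a rough illustration of how such a report could be summarized, the sketch below tallies entries by status. It does not call nrg; it only processes text that has already been captured. The function name `count_statuses` is hypothetical, the sample lines are abbreviated from the slide, and the tag format is assumed to be uniform across entries.

```python
import re
from collections import Counter

# Matches tags such as "[Slurm Monitor - Critical]": service name, " - ", status word.
TAG = re.compile(r"\[([^\]\[]+?) - ([A-Za-z]+)\]")


def count_statuses(report_text):
    """Count report entries per status, case-insensitively (hypothetical helper)."""
    return Counter(status.upper() for _service, status in TAG.findall(report_text))


# Abbreviated sample lines taken from the slide's nrg output.
sample = (
    "xc9n2 [NodeInfo - Warning] Warning thresholds have been reached\n"
    "nh [Slurm Monitor - Critical] 'sinfo' reported problems\n"
    "xc9nems1-1 xc9nems2-1 [Switch - Warning] One or more sensors\n"
    "nh [System Event Log - FAILEDCONNECT] The check_sel plug-in failed\n"
)
counts = count_statuses(sample)
```

Here `counts` maps each status to the number of entries reporting it, which is enough to decide whether anything critical needs attention before reading the full descriptions.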
LVS (IP, NAT); SLURM; LSF; HP-MPI; Xtools (CPU, MPI); Red Hat Linux; HP
SLURM and LSF on XC; scheduling policies: First-In First-Out, Fair-Share, Back-Fill (the slide also shows CPU, 100, and 1 without surviving labels)
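To make the difference between First-In First-Out and Back-Fill concrete, here is a small simulation sketch. It is not the SLURM or LSF algorithm: it assumes all jobs are submitted at time 0, that runtimes are known exactly, and it uses a simplified backfill rule in which a later job may jump ahead only if it fits in the idle CPUs now and finishes before the blocked head job could start. Fair-Share is not modeled. The function `simulate` and the job tuples are hypothetical.

```python
import heapq


def simulate(jobs, total_cpus, backfill=False):
    """Return {job_name: start_time}; jobs are (name, cpus, runtime) in queue order."""
    if any(cpus > total_cpus for _, cpus, _ in jobs):
        raise ValueError("a job requests more CPUs than the cluster has")
    queue = list(jobs)
    running = []  # min-heap of (end_time, cpus)
    free, now, starts = total_cpus, 0, {}

    def start(job):
        nonlocal free
        name, cpus, runtime = job
        starts[name] = now
        free -= cpus
        heapq.heappush(running, (now + runtime, cpus))

    while queue:
        # Strict queue order: start head jobs for as long as they fit.
        while queue and queue[0][1] <= free:
            start(queue.pop(0))
        if not queue:
            break
        if backfill:
            # Earliest time the blocked head job could start (its reservation).
            avail, reserve_at = free, now
            for end, cpus in sorted(running):
                avail, reserve_at = avail + cpus, end
                if avail >= queue[0][1]:
                    break
            # Later jobs may run now only if they finish by the reservation.
            for job in list(queue[1:]):
                if job[1] <= free and now + job[2] <= reserve_at:
                    queue.remove(job)
                    start(job)
        # Advance to the next completion and release its CPUs.
        end, cpus = heapq.heappop(running)
        now, free = end, free + cpus
    return starts


jobs = [("A", 2, 4), ("B", 4, 2), ("C", 2, 3), ("D", 2, 5)]
fifo = simulate(jobs, total_cpus=4)
bf = simulate(jobs, total_cpus=4, backfill=True)
```

In this example job C waits behind the wide job B under FIFO, while with backfill it runs immediately on the two idle CPUs without delaying B.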
HP MPI: compilers Intel, PGI, PathScale, Absoft; architectures IA32, IA64, x86_64; operating systems Red Hat, SuSE, Windows, XC, HP-UX; interconnects GbE, InfiniBand (OpenIB, VAPI, udapl, ITAPI), Myrinet, Quadrics; ISV; SMP, NUMA; MPICH; MPI-2
mpirun i output
HP MPI: ISV applications (XC)
Vendor                  Application               Area
----------------------  ------------------------  -----------------------------------------
ABAQUS                  ABAQUS                    Finite Element Analysis
Accelrys                CASTEP, DMol3, MesoDyn    Material Sciences
ACUSIM Software, Inc.   AcuSolve                  Computational Fluid Dynamics
ANSYS Inc.              ANSYS                     Finite Element Analysis (DDS)
AVL                     Excite                    Computer Aided Engineering
CD-Adapco               STAR-CD                   Computational Fluid Dynamics
CDH                     AMLS                      Noise/Vibration Analysis
Exa                     PowerFlow                 Computational Fluid Dynamics
ESI                     PAM-CRASH                 Computer Aided Engineering
Fluent Inc.             FLUENT                    Computational Fluid Dynamics
LSTC                    LS-DYNA                   Three-dimensional Finite Element Analysis
Mecalog                 Radioss                   Finite Element Analysis
MSC Software            Nastran                   Mechanical Aided Design
UGS                     NX Nastran                Mechanical Aided Design
SCM                     ADF                       Computational Chemistry
Xtools CPU I/O GUI
1000
Xtools GUI
1.Interconnect (in/out)
2.CPU utilization
3.Memory utilization
4.Ethernet utilization
5.Disk (in/out)
Displayed metrics: CPU Utilization; Lustre Traffic; InfiniBand Traffic; Disk Accesses; GigE Traffic; Number and Type of Open Sockets; Interrupts and Context Switches; Memory Utilization
HP StorageWorks Scalable File Share (SFS): Linux, Lustre; POSIX I/O; SFS 512TB; RAID5, RAID5+1, RAID6, RAID6+1; HP Lustre; HP Linux Cluster; HP SFS