4.1 Introduction
Each successive generation of IBM servers is designed to be more reliable than the previous
server family. POWER7 processor-based servers have new features to support new levels of
virtualization, ease administrative burden, and increase system utilization.
Reliability starts with components, devices, and subsystems designed to be fault-tolerant.
POWER7 uses lower-voltage technology and improves reliability with stacked latches that reduce
the soft error rate (SER). During the design and development process, subsystems go
through rigorous verification and integration testing processes. During system manufacturing,
systems go through a thorough testing process to ensure high product quality levels.
The processor and memory subsystems contain a number of features designed to avoid or
correct environmentally induced, single-bit, intermittent failures and to handle solid faults
in components. These features include selective redundancy to tolerate certain faults without
requiring an outage or parts replacement.
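The idea of correcting a single-bit failure without an outage can be illustrated with a textbook Hamming(7,4) code. The following is a minimal sketch for illustration only: it is not the error-correcting scheme used in POWER7 hardware, and the function names hamming74_encode and hamming74_correct are hypothetical. It stores 4 data bits with 3 parity bits and uses the parity syndrome to locate and flip back any one corrupted bit.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword.

    Codeword positions are numbered 1..7; parity bits sit at
    positions 1, 2, and 4, data bits at positions 3, 5, 6, and 7.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (data_bits, syndrome) after fixing at most one flipped bit.

    A syndrome of 0 means no error was detected; otherwise it is the
    1-based position of the bit that was flipped back.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # repair the single faulty bit in place
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a transient single-bit upset
    data, syndrome = hamming74_correct(word)
    print("recovered data:", data, "corrected position:", syndrome)
```

A code of this kind corrects one flipped bit per codeword; two simultaneous flips would be miscorrected, which is one reason production memory systems use stronger codes combined with redundancy.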
The PS703 and PS704 blades are used with a BladeCenter chassis and the various
components that make up the BladeCenter infrastructure. In general, the BladeCenter
infrastructure RAS is outside the scope of this chapter. However, when appropriate, the
BladeCenter features that enable, complement, or enhance RAS functionality on the PS703
and PS704 blades are discussed.
IBM is the only vendor that designs, manufactures, and integrates its most critical server
components:
POWER processors
Caches
Memory buffers
Hub-controllers
Clock cards
Service processors
Design and manufacturing verification and integration, along with field support feedback,
inform and motivate continued improvement of the final products.
This chapter includes a manageability section describing the means to successfully manage
your systems.
Several software-based availability features depend on capabilities that are available
when AIX or IBM i is the operating system. Support for these features under
Linux varies.
4.2 Reliability
Highly reliable systems are built with highly reliable components. On IBM POWER
processor-based systems, this basic principle is extended with a clear design-for-reliability
architecture and methodology. A concentrated, systematic, architecture-based
approach is designed to improve overall system reliability with each successive generation of
system offerings.
IBM BladeCenter PS703 and PS704 Technical Overview and Introduction