Power Systems Problem analysis, system parts, and locations for the IBM Power System S822LC (8335-GCA, 8335-GTA, and 8335-GTB), and IBM Power System S812LC (8348-21C)
Note: Before using this information and the product it supports, read the information in “Safety notices” on page v, “Notices” on page 145, the IBM Systems Safety Notices manual, G229-9054, and the IBM Environmental Notices and User Guide, Z125-5823.
Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard:
• If IBM supplied the power cord(s), connect power to this unit only with the IBM provided power cord. Do not use the IBM provided power cord for any other product.
– For racks with AC power, connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet supplies proper voltage and phase rotation according to the system rating plate.
– For racks with a DC power distribution panel (PDP), connect the customer’s DC power source to the PDP.
• Each rack cabinet might have more than one power cord.
– For AC powered racks, be sure to disconnect all power cords in the rack cabinet when directed to disconnect power during servicing.
– For racks with a DC power distribution panel (PDP), turn off the circuit breaker that controls the power to the system unit(s), or disconnect the customer’s DC power source, when directed to disconnect power during servicing.
CAUTION: Removing components from the upper positions in the rack cabinet improves rack stability during relocation. Follow these general guidelines whenever you relocate a populated rack cabinet within a room or building.
• Reduce the weight of the rack cabinet by removing equipment starting at the top of the rack cabinet.
DANGER: Rack-mounted devices are not to be used as shelves or work spaces. (L002) (L003)
DANGER: Multiple power cords. The product might be equipped with multiple AC power cords or multiple DC power cables. To remove all hazardous voltages, disconnect all power cords and power cables. (L003)
(L007) CAUTION: A hot surface nearby. (L007)
(L008)
Exchange only with the IBM-approved part. Recycle or discard the battery as instructed by local regulations. In the United States, IBM has a process for the collection of this battery. For information, call 1-800-426-4333. Have the IBM part number for the battery unit available when you call. (C003)
Power and cabling information for NEBS (Network Equipment-Building System) GR-1089-CORE
The following comments apply to the IBM servers that have been designated as conforming to NEBS (Network Equipment-Building System) GR-1089-CORE:
The equipment is suitable for installation in the following:
• Network telecommunications facilities
• Locations where the NEC (National Electrical Code) applies
The intrabuilding ports of this equipment are suitable for connection to intrabuilding or unexposed wiring or cabling only. The intrabuilding ports of this equipment must not be metallically connected to the interfaces that connect to the OSP (outside plant) or its wiring.
Yes: Continue with the next step.
No: Go to “Resolving a BMC access problem.”
3. Can you boot the system to the Petitboot menu?
Yes: Continue with the next step.
No: Go to “Resolving a system firmware boot failure” on page 4.
4.
Note: If the IP address setting is incorrect, go to Configuring the firmware IP address website (http://www.ibm.com/support/knowledgecenter/linuxonibm/liabw/liabwenablenetwork.htm). If the MAC address is 00:00:00:00:00:00, go to “Contacting IBM service and support” on page 110.
5. Complete the following actions:
a. Power on to the Petitboot menu.
Yes: Continue with the next step.
No: Go to “Contacting IBM service and support” on page 110. This ends the procedure.
5. Perform the following actions, one at a time, until the problem is resolved:
a. Ensure that the power supply is fully seated in the system.
Then Continue with step 5.
Does the baseboard management controller (BMC) respond to commands?
Note: To determine whether the BMC responds to commands, run the following ipmitool command:
ipmitool -I lanplus -U <username> -P <password> -H <bmc ip or bmc hostname> chassis status
Yes: Continue with the next step.
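The BMC check above can be scripted. The following is a minimal sketch, not an IBM-provided tool: it assumes that ipmitool is installed on the machine that runs the script and that an exit status of 0 means the BMC answered. The function names, the timeout value, and the example host and credentials are hypothetical.

```python
import subprocess
from typing import List


def chassis_status_command(host: str, username: str, password: str) -> List[str]:
    """Build the argument list for the ipmitool chassis status check."""
    return [
        "ipmitool", "-I", "lanplus",
        "-U", username,
        "-P", password,
        "-H", host,
        "chassis", "status",
    ]


def bmc_responds(host: str, username: str, password: str,
                 timeout_s: float = 30.0) -> bool:
    """Return True if ipmitool exits with status 0 (assumed to mean the BMC responded)."""
    try:
        result = subprocess.run(
            chassis_status_command(host, username, password),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ipmitool is not installed, or the command did not finish in time.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical host name and credentials, for illustration only.
    print(chassis_status_command("bmc.example.com", "admin", "secret"))
```

Passing the arguments as a list avoids shell quoting problems when a password contains special characters.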
Yes: This ends the procedure.
No: Go to “Resolving a hardware problem” on page 12. This ends the procedure.
8. Did the system complete the boot process successfully?
Yes: Continue with the next step.
No: Continue with step 12 on page 7.
9.
Then Yes: Complete the service actions for the SEL events that require service actions. v If your system is an 8335-GCA or 8335-GTA, go to “Identifying a service action by using sensor and event information for the 8335-GCA and 8335-GTA” on page 37. This ends the procedure.
Then, continue with the next step.
19. Does the problem persist?
Yes: Go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and support” on page 110. This ends the procedure.
No: This ends the procedure.
Resolving a VGA monitor problem
Learn how to identify the service action that is needed to resolve a video graphics array (VGA) monitor problem.
b. Verify that the monitor and the VGA cable are working properly by testing them on a system that is known to be working properly. If the monitor or the VGA cable does not work properly, replace c. Verify that the system is powered on by activating a serial over LAN (SOL) session through the baseboard management controller (BMC).
5. Is the system an 8348-21C, and is the boot image on a storage device that is configured in a RAID configuration?
Yes: Continue with the next step.
No: Continue with step 11 on page 11.
6. On the Petitboot command line, type the following command:
arcconf getconfig 1 LD
Is the logical boot drive recognized and in optimal status?
Then...
• If your system is an 8335-GTB, go to “8335-GTB locations” on page 121 to identify the physical location and the removal and replacement procedure.
• If your system is an 8348-21C, go to “8348-21C locations” on page 133 to identify the physical location and the removal and replacement procedure.
No: Go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and support” on page 110. This ends the procedure.
5. Use one of the following options to display the SEL details for the sensor:
Note: You must specify the SEL record ID in hexadecimal format.
Yes: This ends the procedure.
No: Go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and support” on page 110. This ends the procedure.
Resolving a GPU, PCIe adapter, or device problem
Learn how to access log files, information to identify types of events, and a list of potential problems and service actions.
If the RAID adapter is functioning again, review the IBM support tips to confirm that there are no PCI address, driver, or firmware conflicts. Then, reinstall the new adapters one at a time until all adapters function properly.
Table 2. RAID adapter problems and service actions (continued)
Problem: One or more drives are not recognized
Service action:
1. If more than one drive is not recognized, verify that the cables are properly attached to the RAID card.
2. Verify that the drive or drives are fully seated in the system.
If the network adapter is functioning again, review the IBM support tips to confirm that there are no PCI address, driver, or firmware conflicts. Then, reinstall the new adapters one at a time until all adapters function properly.
If the graphics adapter is functioning again, review the IBM support tips to confirm that there are no PCI address, driver, or firmware conflicts. Then, reinstall the new adapters one at a time until all adapters function properly.
4. Does NPU chip 1 appear in the fence error log entry?
• Yes: Continue with the next step.
• No: Go to “Contacting IBM service and support” on page 110. This ends the procedure.
5. Replace the following items, one at a time, until the problem is resolved:
Note: Go to “8335-GTB locations”...
Table 5. GPU problems and service actions for the 8335-GTB (continued)
Problem: GPU stops working suddenly
Service action:
1. If the system was recently installed, moved, serviced, or upgraded, verify that the GPU is seated properly.
2. Inspect the GPU and verify that it is not physically damaged.
2. Ensure that the latest I/O adapter firmware is installed. For instructions, see Getting firmware fixes for IBM I/O adapters by using Fix Central.
3. Ensure that you have the latest device driver service updates by installing the latest Linux distribution fixes.
Table 7. Storage device problems and service actions
Problem: System is unable to find a storage device that is at the front of the system
Service action:
1. If the system was recently installed, moved, serviced, or upgraded, verify that the device is seated and installed properly.
Use the following table to map the slot number information in the operating system log to the PCIe adapter description and service action. Table 8. Slot numbers, adapter descriptions, and service action for the 8335-GCA or 8335-GTA. Slot information from the log PCIe adapter description Service action Slot1...
Use the following table to map the slot or GPU number information in the operating system log to the GPU description and service action. This ends the procedure. Table 11. Slot numbers, GPU descriptions, and service action for the 8335-GCA or 8335-GTA Slot number information from the GPU description Service action...
Table 13. Slot numbers, adapter descriptions, and service action for the 8335-GCA
Slot information from the log | PCIe adapter description
Slot1 | PCIe adapter 1
Slot3 | PCIe adapter 3
Service action: Replace the NVMe Flash adapter indicated in the PCIe adapter description column. Go to “8335-GCA and 8335-GTA locations”...
Yes: Continue with step 6.
No: Continue with step 9.
6. To locate the device by using the identify LED, complete the following steps:
a. The operating system log contains information about the device in the form sdx, where x is the letter associated with the drive that failed.
8335-GTB.
Does the problem persist?
Yes: Go to “Contacting IBM service and support” on page 110. This ends the procedure.
No: This ends the procedure.
Identifying a service action
Use the following procedures to help you identify the service action that is needed.
If this SEL event continues to be logged, go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and support” on page 110. 01xxxxxxxxxx Go to the “EPUB_PRC_FIND_DECONFIGURE_PART isolation procedure”...
If this SEL event continues to be logged, go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and support” on page 110. 37xxxxxxxxxx Go to the “EPUB_PRC_EIBUS_ERROR isolation procedure”...
Then Go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and support” on page 110. 7. Did you find only one SEL event that requires a service action as defined in step 5 on page 29?
• If your system is an 8335-GTB, go to “Identifying a service action by using sensor and event information for the 8335-GTB” on page 57.
• If your system is an 8348-21C, go to “Identifying a service action by using sensor and event information for the 8348-21C”...
Table 18. OEM record c0 specific log information, description, and service action for an 8335-GCA or 8335-GTA (continued)
OEM record c0 specific log information | Description | Service action
3a1603xxxxxx | Fan 3 failure | Replace Fan 3. Go to “8335-GCA and 8335-GTA locations” on page 111 to identify the physical location and removal and replacement procedure.
Table 19. OEM record c0 specific log information, description, and service action for an 8335-GTB (continued) OEM record c0 specific log information Description Service action 320exxxxxxxx OCC reset required This event is for information only. No service action is required. 3a0400xxxxxx Chassis soft power off A user initiated power off request...
Table 19. OEM record c0 specific log information, description, and service action for an 8335-GTB (continued)
OEM record c0 specific log information | Description | Service action
3a2604yyyyyy | All of the fans are missing or failed | Ensure that the fan power cable and the disk and fan signal cable are seated properly.
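Looking up an OEM record c0 value in these tables is a pattern match. The sketch below is a hedged illustration: it assumes that each x or y in a table entry stands for any single hexadecimal digit, and it contains only the three 8335-GTB entries quoted in the excerpts above, not the full table. The names GTB_OEM_C0_PATTERNS, pattern_to_regex, and describe_oem_record are hypothetical.

```python
import re
from typing import Dict, Optional

# Entries copied from the 8335-GTB table excerpts; the full table has more rows.
GTB_OEM_C0_PATTERNS: Dict[str, str] = {
    "320exxxxxxxx": "OCC reset required",
    "3a0400xxxxxx": "Chassis soft power off",
    "3a2604yyyyyy": "All of the fans are missing or failed",
}


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn a table entry such as 3a2604yyyyyy into a compiled regular expression."""
    # Assumption: x and y are placeholders for any one hexadecimal digit.
    body = "".join(
        "[0-9a-f]" if ch in "xy" else re.escape(ch)
        for ch in pattern.lower()
    )
    return re.compile(body)


def describe_oem_record(data: str,
                        table: Dict[str, str] = GTB_OEM_C0_PATTERNS) -> Optional[str]:
    """Return the description for a 12-digit OEM record value, or None if no entry matches."""
    normalized = data.strip().lower()
    for pattern, description in table.items():
        if pattern_to_regex(pattern).fullmatch(normalized):
            return description
    return None


if __name__ == "__main__":
    print(describe_oem_record("3a2604000000"))
```

A value that matches no entry returns None, which is the cue to check the complete table for the system model rather than to assume that no service action is needed.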
Table 20. OEM record c0 specific log information, description, and service action for an 8348-21C (continued)
OEM record c0 specific log information | Description | Service action
3a1603xxxxxx | Fan 3 failure | Replace Fan 3. Go to “8348-21C locations” on page 133 to identify the physical location and removal and replacement procedure.
Note: The SEL record ID must be entered in hexadecimal format. For example: 0x1a.
• To display SEL details remotely over the LAN, use the following command:
ipmitool -I lanplus -U <username> -P <password> -H <BMC IP address or BMC hostname> sel get <SEL record ID>...
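Because the record ID must be given in hexadecimal, a small helper can normalize it before the command is built. This is a sketch under stated assumptions: string input is always interpreted as hexadecimal (so "26" means 0x26), integer input is converted to hexadecimal, and the 16-bit upper bound reflects the two-byte record ID field in IPMI. The function names and the example host and credentials are hypothetical.

```python
from typing import List, Union

MAX_SEL_RECORD_ID = 0xFFFF  # SEL record IDs are two-byte values in IPMI


def format_sel_record_id(record_id: Union[int, str]) -> str:
    """Return the record ID in the 0x1a style that the note above asks for."""
    if isinstance(record_id, str):
        # Accepts "0x1a", "0X1A", or bare hexadecimal digits such as "1a".
        value = int(record_id.strip(), 16)
    else:
        value = record_id
    if not 0 <= value <= MAX_SEL_RECORD_ID:
        raise ValueError(f"SEL record ID out of range: {record_id!r}")
    return f"0x{value:x}"


def sel_get_command(host: str, username: str, password: str,
                    record_id: Union[int, str]) -> List[str]:
    """Build the argument list for the remote 'sel get' command shown above."""
    return [
        "ipmitool", "-I", "lanplus",
        "-U", username,
        "-P", password,
        "-H", host,
        "sel", "get", format_sel_record_id(record_id),
    ]


if __name__ == "__main__":
    # Hypothetical host name and credentials, for illustration only.
    print(" ".join(sel_get_command("bmc.example.com", "admin", "secret", 26)))
```

Converting an integer such as 26 yields 0x1a, which matches the example in the note.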
You can use the sensor and event information from the system event log (SEL) to determine a service action to perform for the IBM Power System S822LC (8335-GCA and 8335-GTA). If you have not done so already, complete “Identifying a service action by using system event logs” on page 27.
SEL event continues to be logged each time you power on the system, go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and support” on page 110.
No service action is required.
• S0/G0 “Working”...
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued) Sensor name (Sensor ID) Event description Service action FW Boot Progress (0x05) SEL events with OEM record c0 | v System Firmware Error 000e000 | 3a150xxxxxxx indicate that v System Firmware Hang a boot failed.
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v CPU1 Temp (0x0B) v Lower Non-critical – going low v CPU2 Temp (0x0D) v Lower Non-critical –...
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued)
Sensor name (Sensor ID):
• CPU Func 1 (0x0C)
• CPU Func 2 (0x0E)
Event description:
• IERR
• Transition to Non-recoverable
Service action: If the sensor name is CPU Func 1, replace CPU 1. If the sensor name is CPU Func 2, replace CPU 2.
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued) Sensor name (Sensor ID) Event description Service action All PGood (0x1C) No service action is required. v Interlock Power Down v Power Off Power Down v Power Cycle v 240VA Power Down v AC Lost...
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v DIMM Func 1 (0x1E) v Memory Device Disabled v DIMM Func 2 (0x1F) v Uncorrectable Memory Error v DIMM Func 3 (0x20) v Memory Scrub Failed...
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued) Sensor name (Sensor ID) Event description Service action Configuration Error Complete the following steps: v DIMM Func 1 (0x1E) 1. If the sensor name is DIMM Func v DIMM Func 2 (0x1F) 1, ensure that DIMM 1 is seated v DIMM Func 3 (0x20)
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued) Sensor name (Sensor ID) Event description Service action Replace system processor CPU 1. Go v CPU Core Func 1 (0x3E) v IERR to “8335-GCA and 8335-GTA v CPU Core Func 2 (0x3F) v Transition to Non-recoverable locations”...
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued) Sensor name (Sensor ID) Event description Service action Replace system processor CPU 2. Go v CPU Core Func 13 (0x4A) v IERR to “8335-GCA and 8335-GTA v CPU Core Func 14 (0x4B) v Transition to Non-recoverable locations”...
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v Mem Buf Func 1 (0x56) v Uncorrectable Memory Error v Mem Buf Func 2 (0x57) v Memory Device Disabled v Mem Buf Func 3 (0x58) v State Deasserted...
System Event (0x61) Undetermined system hardware Go to “Collecting diagnostic data” on failure page 109. Then, go to “Contacting IBM service and support” on page 110. No service action is required. v System Reconfigured v OEM System boot event v Entry added to auxiliary log...
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v DIMM1 Temp (0x69) v Lower Non-critical – going low v DIMM2 Temp (0x6A) v Lower Non-critical –...
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v CPU Core Temp 1 (0x89) v Lower Non-critical – going low v CPU Core Temp 2 (0x8A) v Lower Non-critical –...
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued) Sensor name (Sensor ID) Event description Service action No service action required. v 12V Sense (0xA1) v Lower Non-critical – going low v Proc0 Power (0xA2) v Lower Non-critical –...
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued) Sensor name (Sensor ID) Event description Service action If the sensor name is GPU Func 1 or v GPU Func 1 (0xB8) v Uncorrectable Memory Error GPU Func 2, replace GPU 1.
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v CPU Diode 1 (0xC8) v Lower Non-critical – going low v CPU Diode 2 (0xCB) v Lower Non-critical –...
SEL event. Otherwise, go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and support” on page 110. No service action is required. v Thermal Trip...
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued) Sensor name (Sensor ID) Event description Service action Power Supply Failure Detected An assert event immediately v PSU Fault 1 (0xCD) followed by a deassert event v PSU Fault 2 (0xCE) indicates that a power cycle of the system occurred.
Table 21. Sensor information, event description, and service action for the 8335-GCA and 8335-GTA (continued) Sensor name (Sensor ID) Event description Service action CPU VDD Volt (0xCF) No service action is required. v Lower Non-critical – going low v Lower Non-critical – going high v Lower Critical –...
You can use the sensor and event information from the system event log (SEL) to determine a service action to perform for the IBM Power System S822LC (8335-GTB). If you have not done so already, complete “Identifying a service action by using system event logs” on page 27.
SEL event continues to be logged each time you power on the system, go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and support” on page 110.
No service action is required.
• S0/G0 “Working”...
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued) Sensor name (Sensor ID) Event description Service action FW Boot Progress (0x05) SEL events with OEM record c0 | v System Firmware Error 000e000 | 3a150xxxxxxx indicate that v System Firmware Hang a boot failed.
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v CPU1 Temp (0x0B) v Lower Non-critical – going low v CPU2 Temp (0x0D) v Lower Non-critical –...
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued)
Sensor name (Sensor ID):
• CPU Func 1 (0x0C)
• CPU Func 2 (0x0E)
Event description:
• IERR
• Transition to Non-recoverable
Service action: If the sensor name is CPU Func 1, replace CPU 1. If the sensor name is CPU Func 2, replace CPU 2.
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued) Sensor name (Sensor ID) Event description Service action All PGood (0x1C) No service action is required. v Interlock Power Down v Power Off Power Down v Power Cycle v 240VA Power Down v AC Lost v Ensure that ac power is supplied...
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v DIMM Func 1 (0x1E) v Memory Device Disabled v DIMM Func 2 (0x1F) v Uncorrectable Memory Error v DIMM Func 3 (0x20) v Memory Scrub Failed...
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued) Sensor name (Sensor ID) Event description Service action Configuration Error Complete the following steps: v DIMM Func 1 (0x1E) 1. If the sensor name is DIMM Func v DIMM Func 2 (0x1F) 1, ensure that DIMM 1 is seated v DIMM Func 3 (0x20)
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued) Sensor name (Sensor ID) Event description Service action Replace system processor CPU 1. Go v CPU Core Func 1 (0x3E) v IERR to “8335-GTB locations” on page 121 v CPU Core Func 2 (0x3F) v Transition to Non-recoverable to identify the physical location and...
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued) Sensor name (Sensor ID) Event description Service action Replace system processor CPU 2. Go v CPU Core Func 13 (0x4A) v IERR to “8335-GTB locations” on page 121 v CPU Core Func 14 (0x4B) v Transition to Non-recoverable to identify the physical location and...
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v Mem Buf Func 1 (0x56) v Uncorrectable Memory Error v Mem Buf Func 2 (0x57) v Memory Device Disabled v Mem Buf Func 3 (0x58) v State Deasserted...
System Event (0x61) Undetermined system hardware Go to “Collecting diagnostic data” on failure page 109. Then, go to “Contacting IBM service and support” on page 110. No service action is required. v System Reconfigured v OEM System boot event v Entry added to auxiliary log...
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v DIMM1 Temp (0x69) v Lower Non-critical – going low v DIMM2 Temp (0x6A) v Lower Non-critical –...
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v CPU Core Temp 1 (0x89) v Lower Non-critical – going low v CPU Core Temp 2 (0x8A) v Lower Non-critical –...
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued) Sensor name (Sensor ID) Event description Service action No service action required. v System Power (0xA1) v Lower Non-critical – going low v Proc0 Power (0xA2) v Lower Non-critical – going high v Proc1 Power (0xA3) v Lower Critical –...
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued)
Sensor name (Sensor ID):
• GPU Func 1 (0xB8)
• GPU Func 2 (0xB9)
Event description:
• Uncorrectable Memory Error
• Parity
Service action: If the sensor name is GPU Func 1, replace GPU 1. If the sensor name is GPU Func 2, replace GPU 2.
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v CPU Diode 1 (0xC8) v Lower Non-critical – going low v CPU Diode 2 (0xCB) v Lower Non-critical –...
SEL event. Otherwise, go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and support” on page 110. No service action is required. v Thermal Trip...
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued) Sensor name (Sensor ID) Event description Service action Power Supply Failure Detected An assert event immediately v PSU Fault 1 (0xCD) followed by a deassert event v PSU Fault 2 (0xCE) indicates that a power cycle of the system occurred.
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued) Sensor name (Sensor ID) Event description Service action CPU VDD Curr (0xD0) No service action is required. v Lower Non-critical – going low v Lower Non-critical – going high v Lower Critical –...
Table 22. Sensor information, event description, and service action for the 8335-GTB (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v Fan 1 (0xD4) v Lower Non-critical – going low v Fan 2 (0xD5) v Lower Non-critical –...
You can use the sensor and event information from the system event log to determine a service action to perform for the IBM Power System S812LC (8348-21C). If you have not done so already, complete “Identifying a service action by using system event logs” on page 27.
SEL event continues to be logged each time you power on the system, go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and support” on page 110.
No service action is required.
• S0/G0 “Working”...
Table 23. Sensor information, event description, and service action for the 8348-21C (continued) Sensor name (Sensor ID) Event description Service action FW Boot Progress (0x05) SEL events with OEM record c0 | v System Firmware Error 000e000 | 3a150xxxxxxx indicate that v System Firmware Hang a boot failed.
Table 23. Sensor information, event description, and service action for the 8348-21C (continued) Sensor name (Sensor ID) Event description Service action CPU Temp (0x64) No service action is required. v Lower Non-critical – going low v Lower Non-critical – going high v Lower Critical - going low v Lower Critical –...
Table 23. Sensor information, event description, and service action for the 8348-21C (continued) Sensor name (Sensor ID) Event description Service action CPU Func (0x4E) Replace the system processor. Go to v IERR “8348-21C locations” on page 133 to v Transition to Non-recoverable identify the physical location and v Predictive Failure removal and replacement procedure.
Table 23. Sensor information, event description, and service action for the 8348-21C (continued) Sensor name (Sensor ID) Event description Service action All PGood (0x1C) No service action is required. v Interlock Power Down v Power Off Power Down v Power Cycle v 240VA Power Down v AC Lost v Ensure that ac power is supplied...
Table 23. Sensor information, event description, and service action for the 8348-21C (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v DIMM Func 0 (0x1E) v Memory Device Disabled v DIMM Func 1 (0x1F) v Uncorrectable Memory Error v DIMM Func 2 (0x20) v Memory Scrub Failed...
Table 23. Sensor information, event description, and service action for the 8348-21C (continued) Sensor name (Sensor ID) Event description Service action Configuration Error Complete the following steps: v DIMM Func 0 (0x1E) 1. If the sensor name is DIMM Func v DIMM Func 1 (0x1F) 0, ensure that DIMM 0 is seated v DIMM Func 2 (0x20)
Table 23. Sensor information, event description, and service action for the 8348-21C (continued) Sensor name (Sensor ID) Event description Service action Replace the system processor. Go to v CPU Core Func 1 (0x3E) v IERR “8348-21C locations” on page 133 to v CPU Core Func 2 (0x3F) v Transition to Non-recoverable identify the physical location and...
Table 23. Sensor information, event description, and service action for the 8348-21C (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v Membuf Func 0 (0x4A) v Uncorrectable Memory Error v Membuf Func 1 (0x4B) v Memory Device Disabled v Membuf Func 2 (0x4C) v State Deasserted...
System Event (0x52) Undetermined system hardware Go to “Collecting diagnostic data” on failure page 109. Then, go to “Contacting IBM service and support” on page 110. No service action is required. v System Reconfigured v OEM System boot event v Entry added to auxiliary log...
Table 23. Sensor information, event description, and service action for the 8348-21C (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v DIMM Temp 0 (0x69) v Lower Non-critical – going low v DIMM Temp 1 (0x6A) v Lower Non-critical –...
Table 23. Sensor information, event description, and service action for the 8348-21C (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v CPU Core Temp 1 (0x89) v Lower Non-critical – going low v CPU Core Temp 2 (0x8A) v Lower Non-critical –...
Table 23. Sensor information, event description, and service action for the 8348-21C (continued) Sensor name (Sensor ID) Event description Service action OS Boot (0x5A) Ensure that the operating system v Installation aborted boot image is loaded. Ensure that the v Installation failed disk drive or solid-state drive is ready.
Table 23. Sensor information, event description, and service action for the 8348-21C (continued) Sensor name (Sensor ID) Event description Service action CPU Diode Sensor (0x0B) No service action is required. v Lower Non-critical – going low v Lower Non-critical – going high v Lower Critical –...
SEL event. Otherwise, go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and support” on page 110. No service action is required. v Thermal Trip...
Table 23. Sensor information, event description, and service action for the 8348-21C (continued) Sensor name (Sensor ID) Event description Service action Power Supply Failure Detected An assert event immediately v PSU Fault 1 (0x5D) followed by a deassert event v PSU Fault 2 (0x5E) indicates that a power cycle of the system occurred.
Table 23. Sensor information, event description, and service action for the 8348-21C (continued) Sensor name (Sensor ID) Event description Service action No service action is required. v Fan 1 (0xB3) v Lower Non-critical – going low v Fan 2 (0xB4) v Lower Non-critical –...
Correctable Machine Check Error or Transition to Non-recoverable in the description?
Yes: Continue with the next step.
No: Go to “Contacting IBM service and support” on page 110. This ends the procedure.
3. Is your system an 8335-GCA, 8335-GTA, or 8335-GTB?
Yes: Continue with the next step.
Then Yes: Replace the system backplane. If the replacement of the system backplane does not resolve the problem, go to “Contacting IBM service and support” on page 110. This ends the procedure. This ends the procedure. 5. The system is an 8348-21C. For each of the SELs that you identified in step 2 on page 96, are any of the sensor names CPU Func or CPU Core Func x, where x is 1 - 12? Note: Go to “8348-21C locations”...
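The sensor-name check in step 5 (CPU Func, or CPU Core Func x where x is 1 - 12) can be expressed as a small Python helper. This is a sketch only: the function name `is_cpu_func_sensor` is hypothetical, and it assumes that sensor names appear in the SEL output spelled exactly as in this procedure.

```python
import re

# "CPU Func", or "CPU Core Func x" where x is 1 through 12.
_CPU_FUNC_SENSOR = re.compile(r"CPU (?:Func|Core Func (?:[1-9]|1[0-2]))")


def is_cpu_func_sensor(sensor_name: str) -> bool:
    """Return True if the whole name matches one of the CPU Func sensors."""
    return _CPU_FUNC_SENSOR.fullmatch(sensor_name.strip()) is not None
```

Using `fullmatch` rather than `search` keeps names such as CPU Core Func 13 or CPU Core Temp 1 from being accepted by accident.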
If the replacement of the system processors and the system backplane does not resolve the problem, go to “Contacting IBM service and support” on page 110. This ends the procedure. 8348-21C Replace the following items, one at a time, in the order that is shown until the problem is resolved: 1.
Then Yes: Replace the system backplane. If the replacement of the system backplane does not resolve the problem, go to “Contacting IBM service and support” on page 110. This ends the procedure. This ends the procedure. 5. The system is an 8348-21C. For each of the SELs that you identified in step 2 on page 98, determine the sensor name that is associated with each SEL.
If the replacement of the system processors and the system backplane does not resolve the problem, go to “Contacting IBM service and support” on page 110. This ends the procedure. 8348-21C Replace the following items, one at a time, in the order that is shown until the problem is resolved: 1.
If replacing the system backplane and both system processors does not resolve the problem, go to “Contacting IBM service and support” on page 110. This ends the procedure. 8335-GTB Replace the system backplane. If replacing the system backplane does not resolve the problem, replace system processor CPU 1.
Yes: Continue with the next step.
No: Go to “Contacting IBM service and support” on page 110. This ends the procedure.
3. Is your system an 8335-GCA, 8335-GTA, or 8335-GTB?
Problem analysis, system parts, and locations for the 8335-GCA, 8335-GTA, 8335-GTB, and 8348-21C...
Then Yes: Replace the system backplane. If the replacement of the system backplane does not resolve the problem, go to “Contacting IBM service and support” on page 110. This ends the procedure. This ends the procedure. 5. The system is an 8348-21C. For each of the SELs that you identified in step 2 on page 102, are any of the sensor names CPU Func or CPU Core Func x, where x is 1 - 12? Note: Go to “8348-21C locations”...
EPUB_PRC_MEMORY_UE isolation procedure
An uncorrectable memory problem occurred.
1. Look for system event logs that are related to memory and occurred around the same time as the problem that you are working on. Go to “Identifying a service action by using system event logs” on page 27.
Page 121
Then Yes: Replace the system backplane. If the replacement of the system backplane does not resolve the problem, go to “Contacting IBM service and support” on page 110. This ends the procedure. This ends the procedure. 6. The system is an 8348-21C. For each of the SELs that you identified in step 3 on page 104, are any of the sensor names CPU Func or CPU Core Func x, where x is 1 - 12? Note: Go to “8348-21C locations”...
If replacing the system backplane and both system processors does not resolve the problem, go to “Contacting IBM service and support” on page 110. This ends the procedure. 8335-GTB Replace the system backplane. If replacing the system backplane does not resolve the problem, replace system processor CPU 1.
Yes: Continue with the next step.
No: Go to “Contacting IBM service and support” on page 110. This ends the procedure.
2. Use the ipmitool command to examine system event logs (SELs).
- To list SELs by using an in-band network, use the following command:...
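Once a SEL listing has been captured as text, it can be post-processed with a short Python helper. The sketch below is an illustration under stated assumptions: it takes already-captured text as input rather than running any command, it assumes a pipe-delimited layout (record ID, date, time, sensor, event description, direction) similar to what `ipmitool sel elist` typically prints, and both the `parse_sel_line` name and the sample line are hypothetical. The actual layout can differ between ipmitool versions and platforms.

```python
from typing import Dict, Optional


def parse_sel_line(line: str) -> Optional[Dict[str, str]]:
    """Split one pipe-delimited SEL listing line into named fields.

    Returns None when the line does not have the six assumed fields.
    """
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 6:
        return None
    return {
        "record_id": fields[0],
        "date": fields[1],
        "time": fields[2],
        "sensor": fields[3],
        "event": fields[4],
        "direction": fields[5],  # assumed: "Asserted" or "Deasserted"
    }


# Hypothetical sample line, for illustration only.
sample = "   1 | 01/01/2016 | 00:00:00 | PSU Fault 1 | Failure detected | Asserted"
entry = parse_sel_line(sample)
```

Returning `None` for lines that do not fit the assumed layout lets a caller skip headers or blank lines without raising an error.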
Yes: Continue with the next step.
No: Go to “Contacting IBM service and support” on page 110. This ends the procedure.
7. Replace the following items one at a time until the problem is resolved:
Note: Go to “8335-GTB locations” on page 121 to identify the physical location and the removal and replacement procedure.
Collecting diagnostic data
Learn how to collect diagnostic data to send to IBM service and support.
To collect diagnostic data, complete the following steps:
1. Is the operating system available?
Yes: Continue with step 2.
Follow the instructions to install and run the system event log collection tool. Then, continue with the next step.
4. Send the data that you collected during this procedure to IBM service and support. This ends the procedure.
Contacting IBM service and support
You can contact IBM service and support by telephone or through the IBM Support Portal.
Page 128
Figure 2. Top view Table 33. Top view locations FRU removal and replacement Index number FRU description procedures Power switch and power switch cable See Removing and replacing the power switch and cable for 8335-GCA or 8335-GTA. Disk drive and fan card See Removing and replacing the disk drive and fan card in the 8335-GCA or 8335-GTA.
Page 129
Table 33. Top view locations (continued) FRU removal and replacement Index number FRU description procedures Disk and fan signal cable See Removing and replacing the disk and fan signal cable in the 8335-GCA or 8335-GTA. Fan power cable See Removing and replacing the fan power cable in the 8335-GCA or 8335-GTA.
Page 130
Table 34. Rear view locations (continued) FRU removal and replacement Index number FRU description procedures PCIe adapter 1 See Removing and replacing a PCIe adapter on the system backplane of PCIe adapter 3 the 8335-GCA or 8335-GTA. PCIe adapter 4 Memory locations The following diagram shows memory riser cards and their corresponding field-replaceable unit (FRU) layouts in the system.
After you identify the part number of the part that you want to order, go to Advanced Part Exchange Warranty Service. Registration is required. If you are not able to identify the part number, go to Contacting IBM service and support. Finding parts and locations...
Page 132
Rack final assembly Figure 5. Rack final assembly Table 36. Rack final assembly part numbers Units per Index number Part number assembly Description 45W8836 Fixed rail kit - contains left and right fixed rails and attaching screws 74Y9063 Cable management arm assembly 45W8836 Fixed rail kit - contains left and right fixed rails and attaching screws...
Page 133
System parts
Figure 6. System parts
Page 134
Table 37. System parts Units per Index number Part number assembly Description Top access cover assembly 00E4485 Graphics processing unit (GPU) riser with GPU 00E4484 GPU riser without GPU Note: Requires two riser fillers. See index 23 in Table 38 on page 119 for the riser filler part number.
Page 135
Additional system parts Figure 7. Additional system parts Table 38. Additional system parts. Index number Part number Units per assembly Description 00E4472 Disk drive and fan card 00E4476 Screw kit Note: The screw kit includes 12 screws for the disk drive and fan card and 16 screws for the system backplane.
Page 136
Table 38. Additional system parts (continued). Index number Part number Units per assembly Description 00E4255 Graphics processing unit (GPU) shield 00E4514 Riser fillers for the GPU riser or the PCIe riser 00E4470 System backplane 00E4476 Screw kit Note: The screw kit includes 12 screws for the disk drive and fan card and 16 screws for the system backplane.
Page 138
Figure 9. Top view Table 40. Top view locations FRU removal and replacement Index number FRU description procedures Power switch and cable See Removing and replacing the power switch and cable in the 8335-GTB. Front USB cable and connector See Removing and replacing the front USB cable and connector in the 8335-GTB.
Page 139
Table 40. Top view locations (continued) FRU removal and replacement Index number FRU description procedures Fan power cable See Removing and replacing the fan power cable in the 8335-GTB. CPU 1 See Removing and replacing a system processor module for the 8335-GTB. CPU 2 Time-of-day battery See Removing and replacing the...
Page 140
Figure 11. Memory locations on memory riser cards The following table provides the memory locations on the memory riser cards. Table 42. Memory locations on memory riser cards FRU removal and Index number Memory riser card FRU description replacement procedures Memory riser 1 DIMM 1 See Removing and...
After you identify the part number of the part that you want to order, go to Advanced Part Exchange Warranty Service. Registration is required. If you are not able to identify the part number, go to Contacting IBM service and support. Rack final assembly Figure 12.
Page 142
Table 43. Rack final assembly part numbers Units per Index number Part number assembly Description 45W8836 Fixed rail kit - contains left and right fixed rails and attaching screws 00E4260 Slide rail kit - contains left and right slide rails and attaching screws 74Y9063 Cable management arm assembly...
Page 143
System parts (air-cooled and water-cooled systems) Figure 13. System parts (air-cooled and water-cooled systems) Table 44. System parts (air-cooled and water-cooled systems) Units per Index number Part number assembly Description Top access cover assembly 2 - 3 PCI adapters. Use the feature type of the adapter to find the FRU number in PCIe adapters for the 8335-GTB.
Page 144
Table 44. System parts (air-cooled and water-cooled systems) (continued) Units per Index number Part number assembly Description 00E4704 Power riser with time-of-day battery slot Note: The power riser part number does not include the time-of-day battery. The time-of-day battery is a CR2450N lithium battery.
Page 145
Additional system parts (air-cooled system) Figure 14. Additional system parts (air-cooled system) Table 45. Additional system parts (air-cooled system) Index number Part number Units per assembly Description Graphics processing unit (GPU) air baffles 01EM024 Rear GPU kit (includes GPU card, air baffle, heat sink, and thermal interface material (TIM)) 01EM025 Front GPU kit (includes GPU card, air baffle, heat sink,...
Page 146
Table 45. Additional system parts (air-cooled system) (continued) Index number Part number Units per assembly Description 00E5185 8 core 3.259 GHz system processor module kit (includes system processor module, processor tray, 4mm hex driver, module replacement tool, and air pump) 00E5187 10 core 2.860 GHz system processor module kit (includes system processor module, processor tray, 4mm hex driver,...
Page 147
Additional system parts (water-cooled system) Figure 15. Additional system parts (water-cooled system) Table 46. Additional system parts (water-cooled system) Index number Part number Units per assembly Description 00E4570 System backplane kit (includes module removal tool, 4 mm hex key, magnetic screwdriver, air pump, and lid removal tool) Note: When replacing the system backplane kit in a water-cooled 8335-GTB, you also need the System...
Table 46. Additional system parts (water-cooled system) (continued) Index number Part number Units per assembly Description 00E5185 8 core 3.259 GHz system processor module kit (includes system processor module, processor tray, 4mm hex driver, module replacement tool, and air pump) 00E5187 10 core 2.860 GHz system processor module kit (includes system processor module, processor tray, 4mm hex driver,...
Page 150
Table 48. Front view locations (continued) FRU removal and replacement Index number FRU description procedures Power switch and cable See Removing and replacing the power switch and cable in the 8348-21C. Figure 17. Top view Table 49. Top view locations FRU removal and replacement Index number FRU description...
Page 151
Table 49. Top view locations (continued) FRU removal and replacement Index number FRU description procedures Time-of-day battery See Removing and replacing the time-of-day battery in the 8348-21C. PCIe adapter 1 See Removing and replacing a PCIe adapter in the 8348-21C. PCIe adapter 2 PCIe adapter 3 PCIe adapter 4...
Page 152
Figure 19. Rear drive tray top view Table 51. Rear drive tray top view locations FRU removal and replacement Index number FRU description procedures HDD 12 See Removing and replacing a rear drive in the 8348-21C. HDD 13 Memory locations The following diagram shows memory DIMMs and their corresponding field-replaceable unit (FRU) layouts in the system.
Page 153
Figure 20. Memory locations on the system backplane
The following table provides the memory locations on the system backplane.
After you identify the part number of the part that you want to order, go to Advanced Part Exchange Warranty Service. Registration is required. If you are not able to identify the part number, go to Contacting IBM service and support.
Page 155
Rack final assembly Figure 21. Rack final assembly Table 53. Rack final assembly part numbers. Units per Index number Part number assembly Description 01AF405 Slide rail kit - contains left and right slide rails and attaching screws Finding parts and locations...
Page 156
System parts
Figure 22. System parts
Page 157
Table 54. System parts. Units per Index number Part number assembly Description Top access cover assembly 01AF251 Power distribution board, cable, and power supply control cable 01AF243 01AF252 Power switch card and cable Power switch bezel 01AF246 Front drive carriers 00LY397 960 GB solid-state drive 00LY423...
Page 158
Additional system parts
Figure 23. Additional system parts
Page 159
Table 55. Additional system parts. Index number Part number Units per assembly Description 78P4489 4 GB, 1600 MHz DDR3 DIMM 78P4490 8 GB, 1600 MHz DDR3 DIMM 78P4491 16 GB, 1600 MHz DDR3 DIMM 78P4492 32 GB, 1600 MHz DDR3 DIMM Note: The DIMM FRU might include a heat spreader.
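For scripted part ordering, the DIMM rows of Table 55 can be held in a simple lookup. This Python sketch uses only the four part numbers listed above; the names `DIMM_PART_NUMBERS` and `dimm_part_number` are hypothetical, and the mapping should be checked against the current parts catalog before it is relied on.

```python
# Capacity in GB -> part number for the 1600 MHz DDR3 DIMMs in Table 55.
DIMM_PART_NUMBERS = {
    4: "78P4489",
    8: "78P4490",
    16: "78P4491",
    32: "78P4492",
}


def dimm_part_number(capacity_gb: int) -> str:
    """Return the DIMM part number for a capacity listed in Table 55."""
    try:
        return DIMM_PART_NUMBERS[capacity_gb]
    except KeyError:
        raise ValueError(
            f"No DIMM of {capacity_gb} GB is listed in Table 55"
        ) from None
```

Raising `ValueError` for an unlisted capacity makes a mistyped size fail loudly instead of returning an empty or wrong part number.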
Page 160
Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead.
All IBM prices shown are IBM's suggested retail prices, are current and are subject to change without notice. Dealer prices may vary. This information is for planning purposes only. The information herein is subject to change before the products described become available.
This product uses standard navigation keys. Interface information The IBM Power Systems servers user interfaces do not have content that flashes 2 - 55 times per second. The IBM Power Systems servers web user interface relies on cascading style sheets to render content properly and to provide a usable experience.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web under “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.
Page 165
IBM-Allee 1, 71139 Ehningen, Germany Tel: +49 800 225 5426 email: halloibm@de.ibm.com Warning: This is a Class A product. In a domestic environment, this product may cause radio interference, in which case the user may be required to take adequate measures.
Page 166
This statement explains the JEITA statement for products greater than 20 A per phase, three-phase. Electromagnetic Interference (EMI) Statement - People's Republic of China Declaration: This is a Class A product. In a domestic environment this product may cause radio interference in which case the user may need to perform practical action.
Page 167
To ensure this, the devices must be installed and operated as described in the manuals. In addition, only cables recommended by IBM may be connected. IBM assumes no responsibility for compliance with the protection requirements if the product is modified without IBM's consent or...
Properly shielded and grounded cables and connectors must be used in order to meet FCC emission limits. Proper cables and connectors are available from IBM-authorized dealers. IBM is not responsible for any radio or television interference caused by unauthorized changes or modifications to this equipment.
Page 169
This product is in conformity with the protection requirements of EU Council Directive 2014/30/EU on the approximation of the laws of the Member States relating to electromagnetic compatibility. IBM cannot accept responsibility for any failure to satisfy the protection requirements resulting from a non-recommended modification of the product, including the fitting of non-IBM option cards.
Page 170
To ensure this, the devices must be installed and operated as described in the manuals. In addition, only cables recommended by IBM may be connected. IBM assumes no responsibility for compliance with the protection requirements if the product is modified without IBM's consent or...
Permissions for the use of these publications are granted subject to the following terms and conditions. Applicability: These terms and conditions are in addition to any terms of use for the IBM website. Personal Use: You may reproduce these publications for your personal, noncommercial use provided that all proprietary notices are preserved.
Page 172