On the evening of 6th January, one of the disks in the IBM storage array used to provide storage for KVM guests at the Forum decided to go faulty. As with other storage arrays, this should not have been a major issue: one of the hot spares should have been brought on-line and the array rebuilt using that drive. Unfortunately, the failing drive locked up the whole array, so that it stopped responding on both the SAN (block requests) and the LAN (management interface). Fortunately we have the ability to power-cycle most server equipment from home, so the array was duly power-cycled and brought back into service, with one of the hot spares kicking into life and the array being rebuilt. A small number of KVM guest servers didn’t recover automatically from this, so they were rebooted first thing on 7th January.
IBM asked for some logs to be sent to them for analysis, but apparently these didn’t reveal the reason for the array lockup.