Hi
Our 4-node cluster had networking issues last night, which resulted in hosts becoming isolated and VMs getting shut down. The hosts are running ESX 4.1 and have the default HA cluster settings defined (medium restart priority and VM shut down). DRS is set to manual and we have lots of DRS rules keeping clustered VMs apart etc., nothing major.
I came in this morning to find that all the VMs in the estate (about 200) were powered off, including vCenter.
Only after we managed to locate and power on the VC database and vCenter did the VMs begin powering back on automatically and moving between hosts (all done by vpxuser). The network issue appears to have been sorted during the night (possibly a switch reboot or something, still trying to find out from the comms team), so why were the VM restarts only attempted now, when the hosts are no longer complaining about being isolated?
I was under the impression that HA works *without* vCenter being available (i.e. VMs will restart on other hosts), but this doesn't appear to have been the case here.
There are a fair few events of "Virtual machine was restarted on host-esx03.fqdn.int since host-esx01.fqdn.int failed" followed immediately by "Failover unsuccessful for this virtual machine", even though the VM *was* moved from esx01 to esx03; it just wasn't powered on.
Seems very odd...
Can anyone shed any light on this?
thanks