Channel: VMware Communities : Discussion List - Availability: HA & FT
Viewing all 845 articles

RHEL OS Cluster + Oracle + VMware 5.0


Hi team,

I need to set up a RHEL OS cluster integrated with Oracle on two VMs running on VMware 5.0.

The VMware website documents how to set up MSCS, but I can't find anything about RHEL clustering.

Can you please help and suggest how to achieve this?


Alarm Insufficient Failover Resources - Troubleshooting


Hello,

 

We have a cluster of 6 Dell R610s running ESXi 5. Twice now I've come in to find an email from vCenter saying: "Insufficient resources to satisfy vSphere HA failover level"

CPU usage is very low on all hosts, and memory use is 60-70%, apart from one host at 80%. Could memory use be spiking and causing this message from time to time?

Where could I look for more details on what caused this? The alert itself is of very little help. I know the time from the event log; are there any particular logs on the hosts that might help?

 

Thanks

Nick

vSphere HA not functioning as expected


Hi all, it's been a while since I have been here and I am hoping someone has some insight into my problem.

 

I have a 3 host cluster:

  • Ent. Plus
  • ESXi 5 821926 - fully patched and up to date
  • Firmware fully patched and up to date
  • HP BL465 G5 - installed from latest HP VIB

 

HA Cluster:

  • HA Enabled
  • 2 additional das.isolationaddress entries set
  • VMs in cluster set to "Power Off"
  • Datastore Heartbeat set for "Select any of the cluster datastores taking into account my preferences"
    • I can't seem to specify datastores manually: if I select them, save, and then return, no datastores are selected. This happens with all three choices: the one above, "Select any", and "Select preferred".

 

Here's what happens: one of the ESXi hosts goes offline and shows as disconnected in vCenter. All of its VMs show as disconnected and are non-functional. The vmkernel.log shows one of the datastores as disconnected. I can usually connect via SSH or the local shell, but not always. HA does not restart any of the VMs on the other hosts until I manually power off the offending host. As soon as I do, the VMs register on other hosts and become available, and when I power the host back on, all is good (until it happens again). There is no single offending host; it happens on all three.

My storage vendor is working on the disconnect issue, and I hope to solve the cause sooner rather than later. What bothers me is: why isn't HA restarting the VMs? I thought datastore heartbeating would prevent exactly this. In reaction to this issue I have set up far too many FT VMs, including demoting several production VMs to a single vCPU. I would like to limit my FT exposure to 4 or so VMs (primarily DCs and such) and count on HA, but right now neither I nor official support has any idea. Do you?

Oh yeah, here's a puzzler: my Management Network is on vmk3. vMotion is on vmk0, iSCSI01 is vmk1, and iSCSI02 is vmk2. Port binding is done. These hosts were originally ESX 4.1 (with Service Console). I upgraded them to 5.x, then backed up the configuration with vicfg-cfgbackup, reinstalled from the HP VIB ISO, and restored with vicfg-cfgbackup. It seems the original installer didn't actually use the HP VIBs, so I wanted to reinstall with the correct ones. I included the HP download source in VUM too, and the hosts are 100% compliant. What the heck is going on? I can post log files, etc. This is actually a DEV environment.

 

Much gratitude.

 

-John

iSCSI shared storage does not appear at all in the heartbeat datastore list, even though a host is still connected to it


Hi All,

 

I have a setup with two shared datastores: one VMFS datastore and one iSCSI datastore. Three hosts (ESXi 5.0) are connected to them. I started unmounting the iSCSI datastore from the hosts, one by one.

-> When I unmounted the iSCSI datastore from the 1st host, the "Datastore heartbeat" dialog box in the cluster status showed the number of hosts connected to the heartbeat datastores as:
-> 3 hosts connected to the VMFS datastore, and
-> 2 hosts connected to the iSCSI datastore (correctly showing all hosts still connected to it).

-> When I unmounted the iSCSI datastore from the 2nd host, the "Datastore heartbeat" dialog box in the cluster status showed:
-> 3 hosts connected to the VMFS datastore, and
-> the iSCSI datastore did not appear in the heartbeat datastore list at all. (Host 3 is still connected to the iSCSI datastore, so as I understand it, the dialog box should show two datastores in total, with 1 host connected to the iSCSI datastore.)

Can you please explain why the iSCSI datastore does not appear in the heartbeat datastore list at all, even though host 3 is still connected to it?

vSphere 5.0 - Stretched Storage Cluster - HA behavior?


Hello everyone

 

 

//Edit: I cut some lines to make the question a bit more general.

 

 

In this case I’m focusing on the scenario where the two datacenters get partitioned.

 

If the LAN and FC links fail simultaneously, DataCore won't be able to deny access to either site, so both sites will have functional storage.

 

How will vSphere HA react to it?

 

For now I will assume:

-          vSphere 5.0

-          HA master in datacenter A

-          Only datacenter A has an active gateway

-          vCenter server resides in datacenter B

-          Yes this is not a supported vMSC solution

 

datacenter A

  • The master in datacenter A will recognize that the hosts/slaves in datacenter B stopped sending heartbeats.
  • Datastore heartbeats from all hosts in datacenter B will expire.
  • The HA master still receives network heartbeats from its slaves in datacenter A.
  • The master in datacenter A will declare all hosts in datacenter B dead.
  • The master will restart the protected virtual machines from datacenter B.

datacenter B

  • The hosts in datacenter B will recognize that the master in datacenter A stopped sending heartbeats.
  • The slaves will elect a new master.
  • Because all hosts in datacenter B receive election traffic, they won't trigger an isolation response.
  • The new master will check the poweron file and the protectedlist (or communicate with the vCenter Server) to get the necessary information about the virtual machines.
  • The master will restart the protected virtual machines from datacenter A.
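The reasoning above can be sketched as a small model. It assumes the commonly described vSphere 5 election rule (the host that sees the most datastores wins, ties broken by the highest managed object ID compared as a string) and models "the master cannot see the other site's datastore heartbeats" as a function you pass in. Host names and MoIDs are made up; this only illustrates the poster's reasoning and is not FDM's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str        # hypothetical host name
    moid: str        # managed object ID, e.g. "host-12"
    datastores: int  # number of datastores the host can see


def elect_master(partition):
    """Most datastores wins; ties go to the highest MoID (compared as strings)."""
    return max(partition, key=lambda h: (h.datastores, h.moid))


def partition_view(partitions, hb_visible, current_master):
    """Return (master, hosts declared dead) for each network partition.

    A partition that still contains the current master keeps it; any other
    partition elects its own. A host outside the partition whose datastore
    heartbeat the master cannot see is declared dead, i.e. its protected VMs
    become restart candidates.
    """
    all_hosts = {h for p in partitions for h in p}
    views = []
    for p in partitions:
        master = current_master if current_master in p else elect_master(p)
        dead = {h for h in all_hosts - set(p) if not hb_visible(master, h)}
        views.append((master, dead))
    return views


if __name__ == "__main__":
    a1, a2 = Host("a1", "host-10", 4), Host("a2", "host-11", 4)
    b1, b2 = Host("b1", "host-20", 4), Host("b2", "host-21", 4)
    site = {a1: "A", a2: "A", b1: "B", b2: "B"}
    # Storage split along site lines: a master only sees its own site's heartbeats.
    same_site = lambda m, h: site[m] == site[h]
    for master, dead in partition_view([[a1, a2], [b1, b2]], same_site, a1):
        print(master.name, "declares dead:", sorted(h.name for h in dead))
```

With the storage split along site lines, each partition ends up with its own master that declares the other site dead, which is the double-restart outcome described in the bullets.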

 

 

Is this even possible, or am I missing something and completely wrong?

If the vCenter Server runs in datacenter A, the newly elected master could at least use the protectedlist and the poweron file, couldn't it?

With vSphere 4.1, this should also be possible if there is at least one primary node in each datacenter?

At the moment I don't see any way to prevent this from happening (without losing the flexibility of a vSphere cluster) other than not using an unsupported stretched storage "cluster". Is that right?

 

Regards

Patrick

Misc.HeartbeatPanicTimeout default value


Does anyone know the default value of the advanced setting "Misc.HeartbeatPanicTimeout"? This blog post says the default is 60, but I just built three new ESXi 5.0 U1 hosts and they are all set to 14. On another cluster, some hosts are at 60 and some at 14. This isn't a value we set manually, at least not directly. I only noticed it because one of our clusters shows two hosts as out of compliance with the host profile, since this value differs between hosts in the cluster. On other clusters the host profile doesn't include this value at all, so there's no compliance issue, but I really don't understand why it differs from host to host.
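Since the practical problem is drift between hosts, one way to spot it is to collect the value from each host and flag the hosts that differ from the majority. The readings and host names below are hypothetical; on ESXi 5.x the value should be readable with `esxcli system settings advanced list -o /Misc/HeartbeatPanicTimeout`, but how you gather it across hosts is up to you. The sketch says nothing about which value is the correct default.

```python
from collections import Counter


def find_drift(values):
    """Return (majority_value, {host: value}) for hosts that differ from the majority."""
    if not values:
        return None, {}
    majority, _count = Counter(values.values()).most_common(1)[0]
    outliers = {host: v for host, v in values.items() if v != majority}
    return majority, outliers


# Hypothetical readings of Misc.HeartbeatPanicTimeout, one per host.
readings = {"esx01": 14, "esx02": 14, "esx03": 60}

if __name__ == "__main__":
    majority, outliers = find_drift(readings)
    print("majority value:", majority)
    for host, value in sorted(outliers.items()):
        print(f"{host}: {value} differs from the majority")
```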

vSphere HA detected that this host is in a different network than the master to which vCenter Server is connected


Hi all

 

I'm trying to install a vSphere 5.1 environment (vCenter 5.1, build 799731). It has never been this hard to get a VMware product running.

 


Problem: 

 

As soon as I create an HA cluster (with two ESXi 5.1 hosts, build 799733), I receive the message:

     "vSphere HA detected that this host is in a different network than the master to which vCenter Server is connected"

 

I have tried everything:

     placing vCenter in other networks

     multihoming vCenter

     placing vCenter in the same network as the ESX hosts

     double-checking IPs, subnets, gateways, name resolution, and vSwitches

Nothing helps. Is it a bug?

vCenter and the two ESXi hosts now reside in the same L2 network:

 

ESX1:     10.254.250.74     255.255.255.0

ESX2:     10.254.250.77     255.255.255.0

VC:          10.254.250.29     255.255.255.0
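A quick sanity check of the posted addressing: the three management addresses all fall in 10.254.250.0/24, so at the IP level they are on the same network. This only checks the subnet arithmetic; which vmkernel interface HA actually treats as the management network is a separate question and is not modeled here.

```python
import ipaddress

# Management addresses and masks as posted in the thread.
hosts = {
    "ESX1": "10.254.250.74/255.255.255.0",
    "ESX2": "10.254.250.77/255.255.255.0",
    "VC": "10.254.250.29/255.255.255.0",
}


def networks(addrs):
    """Map each name to the IPv4 network its address/netmask belongs to."""
    return {name: ipaddress.ip_interface(a).network for name, a in addrs.items()}


def same_l3_network(addrs):
    """True when every address sits in one and the same IP subnet."""
    return len(set(networks(addrs).values())) == 1


if __name__ == "__main__":
    for name, net in networks(hosts).items():
        print(f"{name}: {net}")
    print("same subnet:", same_l3_network(hosts))
```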

 

The message can't be correct.

 

Please, anybody out there... HELP! I'm at the end of my knowledge.

 

 

Regards

 

Adrian

Essentials Plus license --- HA and FT


Hi there,

- I want to use the FT feature so that my VMs are moved to another host while powered on. At the moment the FT feature is greyed out. Does my Essentials Plus license not include it?

- I am using HA and it's working fine; the only problem is that it powers off the VMs (not cleanly) and then moves them to another host.

- When I want to enable DRS, it also asks me for a license.

My understanding before I bought Essentials Plus was that HA includes all the features needed to keep the VMs up and running seamlessly (without powering off). Is there any software missing from my license that would give me the DRS and FT features? What are my options here?

Help please.


What happens to a VM in an HA/DRS cluster?


Hi all,

Can someone please clarify?

My understanding of HA and DRS is:

If I have an HA-only ESX cluster, a VM will restart on the next available host.

If I have a DRS-only ESX cluster, VMs will vMotion from a heavily loaded ESX host to a less loaded one (by "loaded" I mean higher CPU/memory usage).

Question: what happens to a VM in an HA/DRS cluster?

Scenario: the VM is on an ESX host in a cluster with both HA and DRS enabled.

I reboot one of the ESX hosts. Will the VM automatically vMotion to another ESX host, or will it RESTART on the next available ESX host?

The question is: will it restart or not?
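A simplified way to frame the distinction, as a sketch rather than a full description of either feature: a planned reboot normally goes through maintenance mode, where running VMs are vMotioned away (automatically if DRS is fully automated), while an unplanned host failure means the VMs have already stopped, so HA restarts them elsewhere. The sketch ignores edge cases such as rebooting a host with running VMs without entering maintenance mode.

```python
from enum import Enum


class Event(Enum):
    PLANNED_MAINTENANCE = "host put into maintenance mode before the reboot"
    HOST_FAILURE = "host crashes or loses power"


def expected_vm_handling(event, drs_fully_automated):
    """Simplified view of how a VM ends up on another host in an HA/DRS cluster."""
    if event is Event.PLANNED_MAINTENANCE:
        # The VM is still running, so it can be live-migrated without a restart.
        if drs_fully_automated:
            return "vMotion, initiated automatically by DRS"
        return "vMotion, after the admin applies DRS recommendations or migrates manually"
    # The host is already gone, so vMotion is impossible: HA restarts the VM elsewhere.
    return "restarted by HA on a surviving host"


if __name__ == "__main__":
    for ev in Event:
        print(f"{ev.value}: {expected_vm_handling(ev, drs_fully_automated=True)}")
```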

 

- Hope I'm being clear (and not confusing you).

 

-sri

Not able to set the vpxd.das.electionWaitTimeSec parameter in the vSphere HA advanced options


Hi All,

I need help with this ASAP since I'm stuck. I am using vCenter Server 5.1 and want to set vpxd.das.electionWaitTimeSec = 240, because HA configuration on hosts fails with a timeout when they are added to the cluster. But when I try to set it, I get the error "a specified parameter was not correct.bad HA advanced option keys:config.vpxd.das.electionWaitTimeSec:bad value"

I have tried all the VMware docs, with no luck.
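One thing worth checking, hedged because the thread does not confirm it: as far as I know, keys in the vpxd.* namespace are vCenter Server advanced settings rather than per-cluster HA options (das.*), which would fit the "bad HA advanced option keys" error. The heuristic below only encodes that prefix convention as an assumption; it does not query vCenter.

```python
def settings_location(key):
    """Heuristic: guess where an advanced option key is meant to be entered.

    Assumption: vpxd.* keys belong to vCenter Server's advanced settings,
    das.* keys to the cluster's vSphere HA advanced options.
    """
    prefix = "config."
    if key.startswith(prefix):
        key = key[len(prefix):]
    if key.startswith("vpxd."):
        return "vCenter Server advanced settings"
    if key.startswith("das."):
        return "cluster vSphere HA advanced options"
    return "unknown"


if __name__ == "__main__":
    for k in ("config.vpxd.das.electionWaitTimeSec", "das.isolationaddress0"):
        print(f"{k} -> {settings_location(k)}")
```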

 

Thanks

sp

HA agent in cluster has an error


ESX 3.5: 2-node cluster managed by vCenter 2.5

It was working fine until last week, when the error "HA agent on xxx in cluster yyy in zzz has an error" appeared on the second node.

 

I clicked "Reconfigure for VMware HA" and that worked for about 30 seconds then errored again.

 

The detailed events for that host in vCenter report sufficient resources while HA is being enabled, then change to "Insufficient resources to satisfy HA failover level on cluster".

We haven't added any new machines on either host, nor has the configuration changed. Resource distribution is 0-10% for CPU on both hosts, 20-30% for RAM on one host, and 30-40% for RAM on the second host.

 

I checked the vmware_hostname.log file on the problematic host and the only thing that seems wrong is

 

Error FT Mon Nov  5 14:22:14 2012
By: FullTime/Process Monitor on Node: hostname
MESSAGE: Invalid Failure Detection IP Address 10.99.10.152, please fix.

 

followed by

 

Warning SEC Mon Nov  5 14:22:14 2012
By: FT/Agent on Node: msvottsanhost1
MESSAGE: Rejected Message. msgid 98 from (1/3:24716.0)

 

Then it continues with "Node is running" and both hosts are receiving heartbeats from each other.

 

We've tried disabling HA and re-enabling but that didn't work.

 

Under DNS and Routing, both hosts have matching domains, preferred/alternate DNS servers, search domains, and default gateways for the Service Console and VMkernel (all lowercase too).

 

Running out of things to check/try! Any direction would be appreciated.

HA issues


I upgraded my vCenter to 5.1 a couple of days ago. Since the upgrade, vSphere has been telling me that HA initiated a virtual machine failover in the datacenter.

I have checked the HA configuration and it all looks good, with no problems. All the VMs are running fine and none of them have any errors or warnings. Is this something that will be corrected once I upgrade my hosts to 5.1, or is there an underlying problem I should be looking for before I upgrade the hosts?

vim.fault.NoCompatibleHost N3Vim5Fault16NoCompatibleHostE


We had a full power outage at a customer datacenter. Power was restored, but HA did not start any VMs automatically after the hosts came up.

I don't understand why HA failed, and googling some of the repeated errors returned troublingly little.

I took a look at the fdm.log and saw that it continuously failed to find suitable 'hosts' for the VMs for over an hour. The VMs were eventually powered on manually when the system admin arrived at the DC, over an hour after the environment powered up.

Here's a snippet of where it decides there are 65 VMs that need to be powered on, but then immediately starts to fault them:

2012-11-03T15:12:15.573Z [FFD85B90 verbose 'Placement' opID=SWI-72429e8f] [DrmPE::GenerateFailoverRecommendation] 65 vms added to domain config
2012-11-03T15:12:15.573Z [FFD85B90 verbose 'Placement' opID=SWI-72429e8f] [DrmPE::InvokeDrsMultiplePasses] Pass2: respect host preference but not failover hosts
2012-11-03T15:12:15.573Z [FFD85B90 verbose 'Placement' opID=SWI-72429e8f] [DrmPE::InvokeDrsAlgorithmForPlacement] Calling mapVm to place 65 Vms

And then it immediately faults all 65 with the same error:

2012-11-03T15:12:15.576Z [FFD85B90 verbose 'drmLogger' opID=SWI-72429e8f] DrmFault: reason powerOnVm, vm /vmfs/volumes/4ac2492d-8aa9fb29-d449-001517a6c248/SOMEVM/SOMEVM.vmx, host host-231, fault [N3Vim5Fault16NoCompatibleHostE:0x5b08b88]
2012-11-03T15:12:15.576Z [FFD85B90 verbose 'drmLogger' opID=SWI-72429e8f] FaultArgument: none

...and so on through all 65.

Then it tries another pass:

2012-11-03T15:12:15.581Z [FFD85B90 verbose 'Placement' opID=SWI-72429e8f] [DrmPE::InvokeDrsMultiplePasses] Pass3: use all compatible hosts
2012-11-03T15:12:15.581Z [FFD85B90 verbose 'Placement' opID=SWI-72429e8f] [DrmPE::InvokeDrsAlgorithmForPlacement] Calling mapVm to place 65 Vms
2012-11-03T15:12:15.584Z [FFD85B90 verbose 'drmLogger' opID=SWI-72429e8f] DrmFault: reason powerOnVm, vm /vmfs/volumes/4ac2492d-8aa9fb29-d449-001517a6c248/SOMEVM/SOMEVM.vmx, host host-231, fault [N3Vim5Fault16NoCompatibleHostE:0x5b00348]
2012-11-03T15:12:15.584Z [FFD85B90 verbose 'drmLogger' opID=SWI-72429e8f] FaultArgument: none

After that pass, it switches to the vim.fault.NoCompatibleHost error:

2012-11-03T15:12:15.587Z [FFD85B90 verbose 'Placement' opID=SWI-72429e8f] [PlacementManagerImpl::PlacementUpdateCb] No recommendation is generated
2012-11-03T15:12:15.587Z [FFD85B90 verbose 'Placement' opID=SWI-72429e8f] [PlacementManagerImpl::HandleNotPlacedVms] Reset Vm /vmfs/volumes/4ac2492d-8aa9fb29-d449-001517a6c248/SOMEVM/SOMEVM.vmx, vim.fault.NoCompatibleHost

 

Then it just outputs the following for almost two hours:

2012-11-03T15:12:37.824Z [FFE89B90 verbose 'Placement'] [RR::ResetVms] Reset 0 Vms. Records = 65
2012-11-03T15:12:37.824Z [FFE89B90 info 'Placement'] [RR::CreatePlacementRequest] 65 total VM with some excluded: 0 VM disabled; 0 VM being placed; 65 VM waiting resources; 0 VM in time delay;

....

2012-11-03T16:57:37.931Z [70B13B90 info 'Placement'] [RR::CreatePlacementRequest] 65 total VM with some excluded: 0 VM disabled; 0 VM being placed; 65 VM waiting resources; 0 VM in time delay;
2012-11-03T16:58:37.933Z [FFB9C460 verbose 'Placement'] [RR::ResetVms] Reset 0 Vms. Records = 65
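To see at a glance which host(s) the placement engine actually tried, the DrmFault lines can be tallied by reason, host, and fault type. Both DrmFault lines quoted above name host-231, which may be worth checking across the whole log. The regex is written against the lines quoted in this post; treat it as an assumption for other builds.

```python
import re
import sys
from collections import Counter

# Matches fdm.log lines such as:
# ... DrmFault: reason powerOnVm, vm /vmfs/volumes/.../SOMEVM.vmx, host host-231, fault [N3Vim5Fault16NoCompatibleHostE:0x5b08b88]
FAULT_RE = re.compile(r"DrmFault: reason (\w+), vm (\S+), host (\S+), fault \[(\w+):")


def count_faults(lines):
    """Count DrmFault entries per (reason, host, fault type)."""
    counts = Counter()
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            reason, _vm, host, fault = m.groups()
            counts[(reason, host, fault)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_faults.py fdm.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for (reason, host, fault), n in count_faults(f).most_common():
            print(f"{n:5d}  {reason}  {host}  {fault}")
```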

 

At this point one of the sysadmins arrived at the DC and started to power on the VMs manually, they started up no problem.

 

So what gives? Why did HA fail so badly? If there were no compatible hosts, why could the sysadmin just turn the VMs on with no problems?

(At first I thought it might be datastore access, but the admin didn't have to do anything; the datastores were all listed when he connected directly to the ESX host.)

All the hosts were up. The HA cluster config is as follows:

"Enable: Disallow VM power on operations that violate availability constraints"

"Percentage of cluster resources reserved as failover spare capacity: 50% CPU, 50% Memory"

All other options are default.
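For reference, the vSphere 5 availability documentation describes the percentage-based policy as comparing the current failover capacity, (total host resources - resources required by powered-on VMs) / total host resources, against the configured percentage, separately for CPU and memory, where "required" is roughly the VMs' reservations (plus overhead for memory). The sketch only reproduces that arithmetic with made-up numbers; it does not show whether admission control played any part in the failed restarts.

```python
def failover_capacity(total, required):
    """Fraction of cluster capacity not claimed by powered-on VMs."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return (total - required) / total


def power_on_allowed(total_cpu_mhz, used_cpu_mhz, total_mem_mb, used_mem_mb,
                     vm_cpu_mhz, vm_mem_mb, reserve_cpu=0.50, reserve_mem=0.50):
    """Would powering on one more VM keep both capacities at or above the reserve?

    used_* are the summed requirements of VMs already powered on;
    reserve_* mirror the 50% CPU / 50% memory setting above.
    """
    cpu_left = failover_capacity(total_cpu_mhz, used_cpu_mhz + vm_cpu_mhz)
    mem_left = failover_capacity(total_mem_mb, used_mem_mb + vm_mem_mb)
    return cpu_left >= reserve_cpu and mem_left >= reserve_mem


if __name__ == "__main__":
    # Hypothetical cluster: 40 GHz of CPU and 256 GB of RAM in total.
    print(power_on_allowed(40000, 15000, 262144, 100000, 2000, 16384))
```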

Since it was a full power-down situation, there were no running VMs, just ESX hosts (this includes vCenter being down).

 

So my question is.. has anyone else seen this?  Do you know why it happened?

How to enable HA on a cluster without powering on VMs after a host failure


Hi,

I have a special use case for HA clustering.

vCenter 5 Update 1

Cluster 1: 5 x ESXi 5.0 U1

Cluster 2: 5 x ESXi 4.1

In our infrastructure, when we start a service (a group of 3 VMs), we need to start the VMs in a specific order. We can't put the group of VMs in a vApp because they are in different clusters (SQL cluster, Oracle cluster, App cluster, ...).

*** With a vApp you can specify a start order.

And when a host fails in a cluster, we need to shut down all the VMs of the service group in order to start them in the right order.

So I can't activate HA. When a host fails, I have to manually browse the datastore from another host in the cluster and add the VMs to the inventory. Only the admin can perform this task, and the application team has to wait for him.

I would like to activate HA and have the failed VMs kept powered off, so that the application team can restart the services without waiting for the vSphere admin.
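Whatever mechanism ends up powering the VMs on, the "start in the right order" rule is a dependency graph, so a valid start order (and the reverse, for shutting the group down) can be computed with a topological sort. The VM names and dependencies below are hypothetical; this sketches the ordering logic only and is not an HA or vApp feature.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical service group spanning three clusters: each VM maps to the
# set of VMs that must already be running before it is started.
SERVICE_GROUP = {
    "sql01": set(),
    "ora01": {"sql01"},
    "app01": {"sql01", "ora01"},
}


def start_order(dependencies):
    """Return one valid start order; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def stop_order(dependencies):
    """Shut down in reverse: dependents first, then what they depend on."""
    return list(reversed(start_order(dependencies)))


if __name__ == "__main__":
    print("start:", start_order(SERVICE_GROUP))
    print("stop: ", stop_order(SERVICE_GROUP))
```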

 

I hope my message is clear, thanks for your help.

1 ios in progress for...


Last week I was trying to determine the root cause of a storage connection issue that I had and came across the following message in the fdm.log. The message repeated every second until the host was rebooted (this had resolved the storage connection problem):

 

 

2012-11-05T04:46:01.179Z [FFA7DB90 verbose 'Cluster' opID=SWI-cee8d54c] [HBDatastore::Heartbeat] 1 ios in progress for /vmfs/volumes/4daedc7c-6b68ec0d-916c-00219b8fbdcc

 

 

 

There were two errors in the fdm.log that preceded the above message:

 

2012-11-05T01:02:52.502Z [FFBC2B90 warning 'Cluster' opID=SWI-79958859] [VmfsUtils::VerifyFileLock] VmkuserVMFS_VerifyLock returned error Unknown error status: 0xbad0001

 

 

2012-11-05T01:02:52.503Z [FFBC2B90 error 'Cluster' opID=SWI-79958859] [HBDatastore::DoHeartbeat] Lock verify for file /vmfs/volumes/4daedca2-10ab4da4-2e3a-00219b8fbdcc/.vSphere-HA/FDM-F9B29C80-2A7D-4E30-BCC7-BE9209A39F59-7-d638721-/host-43-hb failed: Invalid argument

 

 

 

My question is, what does "1 ios in progress for <volume name>" represent? I imagine since I was having storage issues that it is related to some datastore heartbeating of some sort, but I can't find any reference to what it means.
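Whatever the message means exactly (a plausible but unconfirmed reading is that a heartbeat I/O to that volume had not yet completed), it is easy to measure how long it persisted per volume. The regex matches the line format quoted above and is an assumption for other builds.

```python
import re
from datetime import datetime

# Matches lines like:
# 2012-11-05T04:46:01.179Z [... 'Cluster' ...] [HBDatastore::Heartbeat] 1 ios in progress for /vmfs/volumes/<uuid>
IOS_RE = re.compile(
    r"^(\S+Z) .*\[HBDatastore::Heartbeat\] (\d+) ios in progress for (\S+)"
)


def ios_in_progress_windows(lines):
    """Return {volume: (first_seen, last_seen, max_outstanding)}."""
    windows = {}
    for line in lines:
        m = IOS_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%fZ")
        count, volume = int(m.group(2)), m.group(3)
        first, _last, peak = windows.get(volume, (ts, ts, 0))
        windows[volume] = (first, ts, max(peak, count))
    return windows


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for vol, (first, last, peak) in ios_in_progress_windows(f).items():
            print(f"{vol}: {first} -> {last} ({last - first}), max {peak} outstanding")
```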


FT: Record/Replay functionality is not supported by this virtual machine (Windows Server 2012 VM)


I ran across a bug recently when I created a VM with the latest versions: ESXi 5.1.0-838463, Client 5.1.0-860230, vCenter 5.1.0-880146. If you create a VM with Windows Server 2012 as the guest OS, Record/Replay is disabled for some reason and you cannot put the VM into Fault Tolerance mode. I found a workaround: remove the VM from inventory, then recreate it as a Windows Server 2008 R2 VM, using the original disk of the VM you removed instead of creating a new one. Enable FT, then go into the VM settings and change the guest OS back to Windows Server 2012. Problem fixed. Hopefully VMware will fix this soon.

 

Jimmy S.

HA heartbeats (vsphere 5.0)


I have a couple of questions about how the heartbeat mechanism works with HA (vSphere 5.0).

After going through the documentation, I see there are two types of heartbeating mechanism:

 

(i) network heartbeat (traditional method)

(ii) datastore heartbeat (the master agent uses it to validate the status of slaves when it cannot communicate with them over the management network)
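How the two mechanisms combine, as it is usually described for vSphere 5 FDM, can be reduced to one decision: network heartbeats come first; if they stop, the master pings the slave's management address and checks its datastore heartbeat, and only when all of these fail is the host declared dead. The sketch below is that simplification, not FDM code; it leaves out timers and the slave's own isolation check.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    ALIVE_UNREACHABLE = "isolated or network-partitioned"
    DEAD = "dead (protected VMs become restart candidates)"


def classify_slave(network_heartbeat, answers_ping, datastore_heartbeat):
    """Simplified master-side view of one slave."""
    if network_heartbeat:
        return HostState.LIVE
    # Network heartbeats stopped: look for any other sign of life first.
    if answers_ping or datastore_heartbeat:
        return HostState.ALIVE_UNREACHABLE
    return HostState.DEAD


if __name__ == "__main__":
    print(classify_slave(False, False, True).value)
```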

 

My real question is: how do both actually work?

Network heartbeat

I understand that the master sends heartbeats to all of its slaves, and each slave sends heartbeats to its master. Slaves do not send heartbeats to each other.

What information is actually exchanged in a heartbeat? Does the slave send some kind of message to the master indicating that it is alive? Can that be viewed in the logs?

If a heartbeat is sent every second, will there be a heartbeat entry in the log file every second? What does "heartbeat" actually mean here?

 

Datastore heartbeat

 

It uses a "heartbeat region file". I understand that on an NFS datastore, the datastore heartbeat files (host-<number>-hb) are touched every 5 seconds by the corresponding host, and the HA master validates them using the file's timestamp.
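For the NFS case as described above (a file touched every 5 seconds and judged by its timestamp), a freshness check reduces to comparing the file's modification time with the clock. The 15-second threshold and the function names are assumptions for illustration, and clock skew between the host and wherever this runs is ignored. On VMFS, where the timestamps reportedly never change, this check would tell you nothing.

```python
import os
import time


def is_fresh(mtime, now, max_age_s=15.0):
    """True if a heartbeat file modified at `mtime` is at most `max_age_s` seconds old."""
    return now - mtime <= max_age_s


def heartbeat_file_is_fresh(path, max_age_s=15.0):
    """Check a heartbeat file's modification time against the local clock."""
    return is_fresh(os.path.getmtime(path), time.time(), max_age_s)
```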

 

We actually use VMFS volumes. The datastores chosen for heartbeating have a "host-<number>-hb" file created for each host, on both heartbeat datastores. These files are never updated, right? Please confirm. As far as I can tell, the timestamps of these files have not changed since HA was configured.

So the HA master validates the status of a host by checking whether these files are held open by the corresponding host, right?

 

For example, with 2 nodes in the cluster, the files are:

 

node 1 -  host-1419-hb

node 2  - host-1494-hb

 

So HA validates the status of node 1 by checking that host-1419-hb is held open by node 1, and likewise for node 2, but nothing is ever written to these files by the respective hosts, right?

Is there a way to find out from the command line which file is opened by which host? How does HA determine which host has a file open?

HA: virtual machine grayed out after host failure


Hi,

 

I am evaluating HA.

I have created two ESXi hosts in a cluster with both HA and DRS enabled, and DRS is in fully automated mode. When an ESXi host fails, the virtual machine gets powered off and then powered on again.

But the issue is that this virtual machine, which was on the failed ESXi host, is grayed out and I can't do anything with it.

 

Any help will be greatly appreciated

HA and VM restarts...


Hi

 

Our 4-node cluster had networking issues last night, which resulted in isolated hosts and VMs getting shut down. The hosts are running ESX 4.1 with the default HA cluster settings (medium restart priority and VM shutdown). DRS is set to manual, and we have lots of DRS rules keeping clustered VMs apart, etc., nothing major.

 

I came in this morning to find that all the VMs in the estate (about 200) were powered off, including vCenter.

Only after we managed to locate and power on the vCenter database and vCenter itself did the VMs begin powering back on automatically and moving hosts (all done by vpxuser). The network issue appears to have been resolved during the night (possibly a switch reboot or something; we're still trying to find out from the comms team), so why are the VMs being restarted now, when the hosts are no longer complaining about being isolated?

I was under the impression that HA works *without* vCenter being available (i.e. VMs will restart on other hosts), but that doesn't appear to have been the case here.

There are quite a few "Virtual machine was restarted on host-esx03.fqdn.int since host-esx01.fqdn.int failed" events, each followed immediately by "Failover unsuccessful for this virtual machine", even though the VM *was* moved from esx01 to esx03 (but not powered on).

 

Seems very odd...

 

Can anyone shed any light on this?

 

thanks

ApplicationHA and Fault Tolerance together?


Hi,

Do Symantec ApplicationHA and Fault Tolerance work together?


