In a data center that targets 99.999% uptime, availability is the most important factor in delivering services to clients. On a platform such as VMware vSphere, the impact of downtime is multiplied because many services run on a single physical host. To address this, VMware introduced vSphere High Availability (HA). HA is a simple, cost-effective way to increase availability for any application running in a VM in an enterprise setup, regardless of its guest operating system. It is configured in a few steps in vCenter Server through the vSphere Web Client.
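To put the 99.999% figure in perspective, the short sketch below converts an availability target into a yearly downtime budget. It is plain arithmetic rather than anything vSphere-specific, and the function name is only illustrative.

```python
# Yearly downtime budget for a given availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the allowed downtime per year, in minutes."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.2f} minutes/year")
```

At "five nines" the budget is only a little over five minutes per year, which is why automatic restart of VMs matters so much.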
HA lets you create a cluster of multiple ESXi hosts (2-64 in vSphere 6.0) in vCenter Server to safeguard virtual machines (VMs) and the applications and workloads they run. When one of the hosts in the cluster fails, the VMs that were running on it are automatically restarted on other ESXi hosts within the same vSphere cluster.
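The restart behaviour can be pictured with a small toy model. This is only a sketch under simplifying assumptions: the Host class, the host and VM names, and the "most free memory wins" placement rule are all hypothetical, and it does not reproduce the placement logic vSphere HA actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    memory_gb: int
    vms: dict = field(default_factory=dict)  # VM name -> memory in GB
    alive: bool = True

    def free_memory(self) -> int:
        return self.memory_gb - sum(self.vms.values())


def restart_vms_from_failed_host(hosts, failed_name):
    """Restart every VM of the failed host on a surviving host.

    Returns {vm_name: new_host_name}; a VM that fits nowhere maps to None.
    """
    failed = next(h for h in hosts if h.name == failed_name)
    failed.alive = False
    placements = {}
    # Place the largest VMs first so they are less likely to be stranded.
    for vm, mem in sorted(failed.vms.items(), key=lambda kv: -kv[1]):
        candidates = [h for h in hosts if h.alive and h.free_memory() >= mem]
        if not candidates:
            placements[vm] = None
            continue
        # Hypothetical rule: pick the surviving host with the most free memory.
        target = max(candidates, key=Host.free_memory)
        target.vms[vm] = mem
        placements[vm] = target.name
    failed.vms.clear()
    return placements


cluster = [
    Host("esxi-01", 64, {"web-01": 16, "db-01": 32}),
    Host("esxi-02", 64, {"app-01": 24}),
    Host("esxi-03", 64, {"app-02": 8}),
]
placements = restart_vms_from_failed_host(cluster, "esxi-01")
print(placements)
```

The model also shows why spare capacity matters: if the surviving hosts do not have enough free memory, some VMs cannot be restarted.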
HA also protects against guest OS failures: if the guest OS fails, HA restarts the affected VM. This feature is known as VM Monitoring, sometimes also referred to as VM-HA. It might sound like a complex process, but it is easy to set up in vCenter Server through the vSphere Web Client.
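The idea behind VM Monitoring can be sketched as a heartbeat timeout check. The snippet below is a simplified illustration, not the real implementation: the function name, the timestamps, and the 30-second failure interval are assumptions chosen for the example.

```python
def vms_needing_restart(last_heartbeat, now, failure_interval=30.0):
    """Return the VMs whose last heartbeat is older than failure_interval seconds.

    last_heartbeat maps a VM name to the time (in seconds) it was last heard from.
    """
    return sorted(
        vm for vm, seen_at in last_heartbeat.items()
        if now - seen_at > failure_interval
    )


# Illustrative timestamps in seconds; db-01 has been silent for 45 seconds.
last_seen = {"web-01": 100.0, "db-01": 60.0, "app-01": 95.0}
stale = vms_needing_restart(last_seen, now=105.0)
print(stale)
```

A VM that stops sending heartbeats for longer than the configured interval is treated as failed and becomes a restart candidate.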
Components of High Availability
There are three main components of High Availability (HA):
· Fault Domain Manager (FDM)
· HOSTD Agent
· vCenter Server
Note: To configure High Availability (HA) in a VMware vSphere environment, you must first create a cluster, which requires 2-64 hosts in vSphere 6.0.
The most important component of HA is the Fault Domain Manager, commonly known as FDM. It is responsible for:
· Communicating host resource information, VM states, and HA properties to the other ESXi hosts in the vSphere cluster.
· Handling heartbeat mechanisms, VM placement on hosts, logging, etc.
HA does not depend on DNS; it works with IP addresses alone. Still, it is recommended to register the hosts in a vSphere cluster with their FQDNs for ease of operations and management.
HOSTD is the host agent responsible for powering on VMs after a failure, such as a host or VM failure. FDM talks directly to the HOSTD agent and to vCenter Server so that HA can act more efficiently.
vCenter Server is the single pane of glass for a virtual environment and is responsible for the overall management of your virtual infrastructure. It pushes the FDM agents to the ESXi hosts of the cluster in parallel, which speeds up deployment and configuration.
Although vCenter Server is responsible for HA configuration, management, and VM state information, it is not involved when HA responds to a failure. It is comforting to know that if the host running a virtualized vCenter Server fails, HA handles the failure and restarts vCenter Server on another host, along with all other configured virtual machines from the failed host. If you want to learn in detail how HA works, you can follow Duncan Epping's deep dive on vSphere HA.
Figure: Thanks to VMware
vCenter Server manages the cluster that groups multiple hosts for HA. A cluster acts as a collection of resources in your virtual environment. These resources can be carved up into separate resource pools with vSphere Distributed Resource Scheduler (DRS), or used to increase availability by enabling HA.
HA uses a master/slave model for its agents. A cluster has a single master agent, except during a network partition, when each partition has its own master. Any agent can serve as the master, and all others act as slaves. The master agent is responsible for monitoring health and for restarting VMs that go down. The slave agents forward information to the master agent and restart VMs at its direction. Every HA agent, whether master or slave, also implements the VM/App Monitoring feature, which allows it to restart a VM after an OS failure or restart services after an application failure.
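The effect of a network partition on the master role can be illustrated with a toy election. The sketch below groups hosts by who can still reach whom and picks one master per group. The "lowest host name wins" rule and the host names are placeholders for illustration only, not the rule FDM actually applies.

```python
def elect_masters(hosts, links):
    """Split hosts into network partitions and pick one master per partition.

    hosts: iterable of host names.
    links: pairs of hosts that can still reach each other over the network.
    """
    neighbours = {h: set() for h in hosts}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    roles, seen = {}, set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Every host reachable from `start` belongs to the same partition.
        partition, stack = set(), [start]
        while stack:
            host = stack.pop()
            if host in partition:
                continue
            partition.add(host)
            stack.extend(neighbours[host] - partition)
        seen |= partition
        master = min(partition)  # placeholder election rule
        for host in partition:
            roles[host] = "master" if host == master else "slave"
    return roles


hosts = ["esxi-01", "esxi-02", "esxi-03", "esxi-04"]
healthy = elect_masters(
    hosts,
    [("esxi-01", "esxi-02"), ("esxi-02", "esxi-03"), ("esxi-03", "esxi-04")],
)
split = elect_masters(hosts, [("esxi-01", "esxi-02"), ("esxi-03", "esxi-04")])
print(healthy)
print(split)
```

With a healthy network the four hosts form one group with a single master. When the network splits into two halves, each half ends up with its own master.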
I hope you enjoyed reading this post. Thanks for reading! If you found it useful, please share it on social media.