In this post, we will be looking at how vSAN handles failures, namely APD, PDL and Network Partitioning failures. To keep things simple, I will be using FTT=1 as an example policy.
Before we get into how vSAN handles these failures, let’s first agree on what PDL, APD and Network Partitioning mean.
Permanent Device Loss
A PDL status occurs when a device is known to have failed and it is unlikely that it will return. In other words, vSAN is 100% certain that the device in question is dead!
All Paths Down
An APD status occurs when a device loses connectivity and vSAN is unable to determine whether it will return. In other words, vSAN is not sure what happened to the device. APD failures can be caused by a host restarting, by someone pulling out a disk, or by a drive or HBA becoming disconnected.
Network Partition
A network partition is the situation where some of your hosts can communicate with each other, but cannot communicate with the remaining hosts in the cluster.
How vSAN handles PDL
If the host that is holding the replica loses a disk, for example, your vSAN object will be in a degraded state. Your source host will contact vSAN and request that the replica be rebuilt immediately. vSAN will select a new host and the rebuild operation will start if there are resources available. If there are no resources available, it will wait until the failure has been resolved.
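To make the "if there are resources available" condition concrete, here is a minimal Python sketch of how a rebuild target could be chosen. It is only an illustration and not vSAN's actual placement logic: the Host class, the pick_rebuild_target function and the "most free space wins" tie-breaker are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_gb: int
    holds_component: bool  # already stores a component of this object


def pick_rebuild_target(hosts: List[Host], needed_gb: int) -> Optional[Host]:
    """Return a host that can take the new replica, or None if the rebuild must wait.

    Hypothetical model: a host qualifies only if it does not already hold a
    component of the object and has enough free capacity.
    """
    candidates = [
        h for h in hosts
        if not h.holds_component and h.free_gb >= needed_gb
    ]
    if not candidates:
        return None  # no resources available: wait until the failure is resolved
    # Assumption for illustration only: prefer the host with the most free space.
    return max(candidates, key=lambda h: h.free_gb)
```

In a three-host cluster running FTT=1, every surviving host already holds a component of the object, so this function would return None. That matches the "wait until the failure has been resolved" case described above.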
How vSAN handles APD
If the host that is holding the replica somehow ends up with a disk that is disconnected, vSAN will mark the affected components as absent and will wait for 60 minutes before starting the rebuild operation (this default can be changed through the VSAN.ClomRepairDelay advanced setting). If the drive that was disconnected is reconnected, vSAN will try to determine whether it is more efficient to continue building the new copy or to resynchronise the old one.
How vSAN handles Network Partition
In this situation, the action taken depends on the scenario at hand. Duncan Epping has a great post regarding vSAN Isolation / Partition scenarios and how they are handled, please check it out.
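The difference between the two cases can be summarised in a few lines of Python. This is a simplified sketch rather than vSAN code: the ComponentState enum and the should_start_rebuild function are hypothetical names. The only values taken from the text above are the degraded and absent states and the 60 minute default of VSAN.ClomRepairDelay.

```python
from enum import Enum

# Default value of the VSAN.ClomRepairDelay advanced setting, in minutes.
CLOM_REPAIR_DELAY_MIN = 60


class ComponentState(Enum):
    DEGRADED = "degraded"  # PDL: the device is known to have failed
    ABSENT = "absent"      # APD: the device is unreachable and may come back


def should_start_rebuild(state: ComponentState,
                         minutes_since_failure: int,
                         repair_delay: int = CLOM_REPAIR_DELAY_MIN) -> bool:
    """Decide whether a rebuild should start (hypothetical model).

    Resource availability is deliberately ignored here to keep the
    timing logic visible.
    """
    if state is ComponentState.DEGRADED:
        return True  # PDL: rebuild immediately
    # APD: wait for the repair delay to expire before rebuilding
    return minutes_since_failure >= repair_delay
```

For example, should_start_rebuild(ComponentState.ABSENT, 10) returns False because the timer is still running, while the same call with ComponentState.DEGRADED returns True straight away.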
I hope this post was useful. Be social, share!
My name is Amine El Badaoui and I currently live in Aylesbury, a small town in the south east of England.
I have been working in the IT industry for a few years now and specialise in VMware virtualisation, data centre infrastructure and cloud technologies. Over the years I have obtained numerous industry certifications from Microsoft, NetApp and VMware. I currently work as a VMware Product Engineer @ https://www.rackspace.com/
This blog represents my random technical notes and thoughts. The thoughts expressed here do not reflect those of my current employer in any way.