vSAN PDL, APD and Network Partitioning Failures

In this post, we will be looking at how vSAN handles failures, namely APD, PDL and Network Partitioning failures. To keep things simple, I will be using FTT=1 as an example policy.

Before we get into how vSAN handles these failures, let’s first agree on what PDL, APD and Network Partitioning failures mean.

Permanent Device Loss

A PDL status occurs when a device is known to have failed and is not expected to return. In other words, vSAN is certain that the device in question is dead!

All Paths Down

An APD status occurs when a device loses connectivity and vSAN is unable to determine whether it will return. In other words, vSAN is not sure what happened to the device. APD failures can be caused by a host restarting, someone pulling out a disk, or a drive or HBA becoming disconnected.

Network Partition

A network partition is a situation in which some of your hosts can communicate with each other but cannot communicate with the remaining hosts in the cluster.
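To make the definition concrete, here is a minimal Python sketch that groups hosts into partitions based on which hosts can still reach each other. The host names and the `find_partitions` helper are hypothetical and purely illustrative; this is not how vSAN itself detects a partition.

```python
from collections import deque


def find_partitions(hosts, links):
    """Group hosts into partitions, i.e. sets of hosts that can reach each other.

    hosts: list of host names (hypothetical).
    links: list of (host_a, host_b) pairs that can currently communicate.
    """
    neighbours = {host: set() for host in hosts}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    partitions = []
    for host in hosts:
        if host in seen:
            continue
        # Breadth-first search collects every host reachable from this one.
        group = []
        queue = deque([host])
        seen.add(host)
        while queue:
            current = queue.popleft()
            group.append(current)
            for peer in neighbours[current]:
                if peer not in seen:
                    seen.add(peer)
                    queue.append(peer)
        partitions.append(sorted(group))
    return partitions


# Hypothetical 4-host cluster where the link between the two pairs is down.
hosts = ["esxi-01", "esxi-02", "esxi-03", "esxi-04"]
links = [("esxi-01", "esxi-02"), ("esxi-03", "esxi-04")]
print(find_partitions(hosts, links))
```

More than one group in the result means the cluster is partitioned; a single group containing every host means the network is healthy.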

How vSAN handles PDL

If the host that is holding the replica loses a disk, for example, your vSAN object will be in a degraded state. Your source host will contact vSAN and request that the replica be rebuilt immediately. vSAN will select a new host, and the rebuild operation will start if there are resources available. If there are no resources available, it will wait until the failure has been resolved.

How vSAN handles APD

If the host that is holding the replica somehow ends up with a disk that is disconnected, vSAN will mark the objects as absent and will wait for 60 minutes before starting the rebuild operation (this delay can be changed with the VSAN.ClomRepairDelay advanced setting). If the drive that was disconnected is reconnected, vSAN will try to determine whether it is more efficient to continue replicating to the new copy or to bring the old copy back up to date.
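The difference between the PDL and APD handling described above can be summarized as a small decision function. This is only a sketch of the behaviour described in this post: the `repair_action` function, its parameters and its return values are made up for illustration and are not a vSAN API.

```python
# Default wait for absent components, as described in the post (60 minutes).
REPAIR_DELAY_MINUTES = 60


def repair_action(state, minutes_since_failure, resources_available,
                  repair_delay=REPAIR_DELAY_MINUTES):
    """Return what happens to a failed replica under the FTT=1 example.

    state: "degraded" (PDL, device known dead) or "absent" (APD, may return).
    """
    if state == "degraded":
        # PDL: no point in waiting, rebuild as soon as resources allow.
        return "rebuild" if resources_available else "wait_for_resources"

    if state == "absent":
        # APD: give the device a chance to come back before rebuilding.
        if minutes_since_failure < repair_delay:
            return "wait_for_timer"
        return "rebuild" if resources_available else "wait_for_resources"

    raise ValueError(f"unknown state: {state!r}")


print(repair_action("degraded", 0, True))   # rebuild
print(repair_action("absent", 10, True))    # wait_for_timer
print(repair_action("absent", 60, True))    # rebuild
print(repair_action("absent", 90, False))   # wait_for_resources
```

Passing a different `repair_delay` mimics changing the repair delay setting: a larger value keeps absent components waiting longer before a rebuild is triggered.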

How vSAN handles Network Partition

In this situation, the action taken depends on the scenario at hand. Duncan Epping has a great post regarding vSAN Isolation / Partition scenarios and how they are handled, please check it out here.

I hope this post was useful.
