FTT is a familiar name to anyone working with vSAN, and I consider it the most important policy when designing a vSAN cluster. In this post, I am going to quickly walk you through what this policy is for and how it works.
The Number of Failures to Tolerate (FTT) capability addresses the key customer and design requirement of availability. With FTT, availability is provided by maintaining replica copies of data, which mitigates the risk of a host failure resulting in lost connectivity to data or potential data loss. To start, here is the number of hosts required to achieve a given FTT value:
If you want to tolerate n failures using mirroring, you need 2n + 1 ESXi hosts in the vSAN cluster: 3 hosts for FTT=1, 5 hosts for FTT=2, and 7 hosts for FTT=3.
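The 2n + 1 rule is easy to check in a few lines of Python. This is only a sketch of the arithmetic; the function name `min_hosts_for_mirroring` is made up for illustration and is not part of any vSAN tooling.

```python
def min_hosts_for_mirroring(ftt: int) -> int:
    """Return the minimum host count for a mirrored object: 2n + 1."""
    if ftt < 0:
        raise ValueError("FTT cannot be negative")
    return 2 * ftt + 1


# Print the host requirement for each FTT value discussed in this post.
for ftt in range(0, 4):
    print(f"FTT={ftt}: at least {min_hosts_for_mirroring(ftt)} host(s)")
```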
What if the cluster does not have the number of hosts required to satisfy the FTT value defined in the storage policy and you try to deploy a VM? VM creation fails with the error “Cannot complete file creation operation”.
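FTT also determines how much capacity an object consumes. With mirroring, a 40GB virtual disk consumes the following:
- FTT=0 results in 40GB of used capacity (a single copy, not recommended)
- FTT=1 results in 80GB of used capacity (two copies)
- FTT=2 results in 120GB of used capacity (three copies)
- FTT=3 results in 160GB of used capacity (four copies)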
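The figures in the list above follow from mirroring keeping FTT + 1 full copies of the data. A minimal sketch, assuming a 40GB object and ignoring witness components and other metadata overhead (the function name is hypothetical):

```python
def mirrored_capacity_gb(size_gb: float, ftt: int) -> float:
    """Raw capacity used by a mirrored object: one copy per tolerated failure, plus the original."""
    if ftt < 0:
        raise ValueError("FTT cannot be negative")
    return size_gb * (ftt + 1)


# Reproduce the list above for a 40GB object.
for ftt in range(0, 4):
    print(f"FTT={ftt}: {mirrored_capacity_gb(40, ftt):.0f}GB used")
```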
Limitations of a Two-Host or Three-Host Cluster Configuration:
In a two-host cluster (2 hosts + 1 witness) or a three-host cluster, the only supported FTT value is 1. vSAN saves each of the two required replicas of VM data on separate hosts, with the witness object on a third host. Because there are so few hosts in the cluster, the following limitations exist:
- When a host fails, vSAN cannot rebuild data on another host to protect against another failure, so the data stays non-compliant until the host is back.
- If a host must enter maintenance mode, vSAN cannot evacuate data from the host to maintain policy compliance. While the host is in maintenance mode, data is exposed to potential loss or inaccessibility if an additional failure occurs. For this reason, full data migration is not supported in two-host or three-host clusters.
- In any situation where a two-host or three-host cluster has an inaccessible host or disk group, vSAN objects are at risk of becoming inaccessible should another failure occur.
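The rebuild limitation comes down to a simple count: after a failure, the surviving hosts must still satisfy 2n + 1 for vSAN to restore compliance. The sketch below is a deliberate simplification that looks only at host count. It ignores free capacity, fault domains, and disk-group failures, and the function name is hypothetical.

```python
def can_rebuild_after_failure(total_hosts: int, failed_hosts: int, ftt: int) -> bool:
    """True if the surviving hosts still meet the 2n + 1 rule for mirroring.

    For a two-host cluster, count the witness as the third host.
    """
    surviving = total_hosts - failed_hosts
    return surviving >= 2 * ftt + 1


# Three-host cluster, one host down: only two hosts remain, so no rebuild.
print(can_rebuild_after_failure(total_hosts=3, failed_hosts=1, ftt=1))
# Four-host cluster, one host down: three hosts remain, so a rebuild is possible.
print(can_rebuild_after_failure(total_hosts=4, failed_hosts=1, ftt=1))
```

This is why adding a fourth host is a common recommendation for FTT=1 clusters that need to stay compliant through a failure or maintenance window.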
Erasure Coding (RAID 5):
Erasure coding provides the same levels of redundancy as mirroring but with a reduced capacity requirement. With RAID 5, a minimum of four hosts is required. Compared to mirroring with FTT=1, capacity consumption with RAID 5 erasure coding is reduced by 33 percent while still providing an FTT of 1.
Erasure coding is a method of taking data, breaking it into multiple pieces, and spreading it across multiple devices, while adding parity data. Spreading the data across multiple devices together with parity data allows the data to be recreated if one or more of the data pieces is corrupted or lost. Although several methods of erasure coding exist, vSAN supports RAID 5 and RAID 6 style data placement and parity patterns as a way of surviving failures while being more space efficient than RAID 1 mirroring. With RAID 5, the data is placed in a 3 + 1 pattern across hosts. If a single host fails, data is still available. As with any other storage solution, RAID 5 in vSAN requires less capacity than mirroring, but a performance penalty might exist for workloads that are extremely write intensive or very sensitive to latency.
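To make the 3 + 1 idea concrete, here is a toy Python sketch of single-parity protection using XOR. It is only an illustration of the general technique and not how vSAN actually lays out or computes parity. The block contents and the helper name `xor_blocks` are made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data pieces, each imagined to live on a different host.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity piece lives on a fourth host (the "+ 1" in 3 + 1).
parity = xor_blocks(data)

# Simulate losing the host that holds the second data piece.
survivors = [data[0], data[2], parity]

# XOR of the surviving pieces recreates the lost piece.
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

The same property explains the write penalty: every write to a data piece also requires the parity piece to be read and updated.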
Erasure Coding (RAID 6):
With RAID 6, the number of failures to tolerate is two and a minimum of six hosts is required. Compared to mirroring with FTT=2, capacity consumption with RAID 6 erasure coding is reduced by 50 percent while still providing an FTT of 2.
Note: RAID 5/6 (erasure coding) does not support 3 failures to tolerate.
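The 33 percent and 50 percent savings quoted above can be derived from the data and parity layout. The sketch below assumes a 3 + 1 layout for RAID 5 (as stated earlier) and a 4 + 2 layout for RAID 6, which is an assumption consistent with the six-host minimum. The function names are hypothetical.

```python
from fractions import Fraction

# Assumed (data, parity) layouts per FTT value for erasure coding.
ERASURE_LAYOUTS = {1: (3, 1), 2: (4, 2)}


def capacity_multiplier(ftt: int, erasure_coding: bool) -> Fraction:
    """Raw capacity used per unit of data for a given FTT and protection method."""
    if not erasure_coding:
        return Fraction(ftt + 1)  # mirroring keeps FTT + 1 full copies
    if ftt not in ERASURE_LAYOUTS:
        raise ValueError("RAID 5/6 erasure coding supports only FTT=1 or FTT=2")
    data, parity = ERASURE_LAYOUTS[ftt]
    return Fraction(data + parity, data)


def savings_vs_mirroring(ftt: int) -> Fraction:
    """Fraction of capacity saved by erasure coding compared to mirroring at the same FTT."""
    return 1 - capacity_multiplier(ftt, True) / capacity_multiplier(ftt, False)


for ftt in (1, 2):
    used = 40 * capacity_multiplier(ftt, True)
    print(f"FTT={ftt}: 40GB object uses {float(used):.2f}GB, "
          f"saving {float(savings_vs_mirroring(ftt)):.0%} vs mirroring")
```

Asking for FTT=3 with erasure coding raises a `ValueError` here, mirroring the note above.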
Thanks for Reading