vSAN object checksum – Capt. Virtualization

Eknath - 16 February 2019 / 3 September 2019

vSAN Object Checksum:

vSAN 6.2 introduced a feature called object checksum. It is enabled by default and can be enabled or disabled per virtual machine or per object through VM storage policies. The policy applies an end-to-end software checksum to guard against data integrity problems caused by the underlying storage media. The only reason one might disable it is if the application already provides this functionality itself.

Rule Name: Disable object checksum

The screenshots above are from vSAN 6.2.

If the option is set to No, the object calculates checksum information to ensure the integrity of its data. If this option is set to Yes, the object does not calculate checksum information.

vSAN uses end-to-end checksum to ensure the integrity of data by confirming that each copy of a file is exactly the same as the source file. The system checks the validity of the data during read/write operations, and if an error is detected, vSAN repairs the data or reports the error.

If a checksum mismatch is detected, vSAN automatically repairs the data by overwriting the incorrect data with the correct data. Checksum calculation and error-correction are performed as background operations.

The default setting for all objects in the cluster is No, which means that checksum is enabled.
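Because the rule is phrased as a negative, it is easy to read it backwards. As a small illustration only, the helper below maps the rule value to the effective state; the function name checksum_enabled and the plain "Yes"/"No" strings are assumptions made for this sketch and are not part of any vSAN API.

```python
def checksum_enabled(disable_object_checksum: str = "No") -> bool:
    """Translate the 'Disable object checksum' rule value into the effective state.

    Hypothetical helper for illustration; it only encodes the Yes/No
    semantics described in the text above.
    """
    value = disable_object_checksum.strip().lower()
    if value not in ("yes", "no"):
        raise ValueError("rule value must be 'Yes' or 'No'")
    # "No" means "do not disable", so checksums are calculated.
    return value == "no"


default_state = checksum_enabled()        # default rule value is No
disabled_state = checksum_enabled("Yes")  # rule explicitly set to Yes
```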

How does it work?

Checksum in vSAN is implemented using a CRC32 (cyclic redundancy check) algorithm, chosen for performance because it supports CPU offload to reduce overhead. Alongside checksum verification on read operations, vSAN also has a scrubber mechanism that checks the data on disk to ensure that it does not have any silent corruption. By default, the scrubber is designed to check all of the data once a year, but this can be changed via the advanced setting VSAN.ObjectScrubsPerYear to run more often. For instance, to run this check about once a week, set this value to 52, but be aware that there will be some performance overhead while the operation runs. In addition, there are two levels of scrubbing employed:

1. Component-level scrubbing – every block of each component is checked. If there is a checksum mismatch, the scrubber tries to repair the block by reading other components.

2. Object-level scrubbing – for every block of the object, data of each mirror (or the parity blocks in RAID-5/6) is read and checked. For inconsistent data, all data in the affected stripe is marked as bad.
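To make the scrubber description a little more concrete, here is a minimal sketch of the scrub interval arithmetic and of a component-level scrub over a mirrored object. It is not vSAN's implementation: the names scrub_interval_days and scrub_component, the list-of-blocks layout, and the use of Python's zlib.crc32 as a stand-in checksum are all assumptions made for illustration. Object-level scrubbing is not modelled.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block granularity for this sketch


def scrub_interval_days(scrubs_per_year: int) -> float:
    """Approximate number of days between full scrub passes."""
    if scrubs_per_year < 1:
        raise ValueError("scrubs_per_year must be at least 1")
    return 365 / scrubs_per_year


def checksum(block: bytes) -> int:
    # Stand-in CRC; not claimed to match the algorithm variant or
    # on-disk checksum format that vSAN actually uses.
    return zlib.crc32(block)


def scrub_component(component, checksums, other_mirror):
    """Check every block of one component and repair from another mirror.

    component and other_mirror are lists of byte blocks (hypothetical
    layout); checksums holds the values stored when the data was written.
    Returns (repaired_indexes, unrecoverable_indexes).
    """
    repaired, unrecoverable = [], []
    for i, block in enumerate(component):
        if checksum(block) == checksums[i]:
            continue  # block is healthy
        candidate = other_mirror[i]
        if checksum(candidate) == checksums[i]:
            component[i] = candidate  # overwrite bad data with the good copy
            repaired.append(i)
        else:
            unrecoverable.append(i)
    return repaired, unrecoverable


# Usage: corrupt one block in one mirror, then scrub it.
good = [bytes([n]) * BLOCK_SIZE for n in range(4)]
stored = [checksum(b) for b in good]
mirror_a = list(good)
mirror_b = list(good)
mirror_a[2] = b"\xff" + good[2][1:]  # simulate silent corruption of one byte
result = scrub_component(mirror_a, stored, mirror_b)
weekly = scrub_interval_days(52)  # roughly 7 days between passes
```

With a setting of 52 the interval works out to roughly seven days, which matches the "once a week" example above.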

During read or write operations, the software checksum detects corruption that could be caused by hardware or software components, including memory, drives, and so on. In the case of drives, there are two basic kinds of corruption:

Latent sector errors - which are typically the result of a physical disk drive malfunction.
Silent corruption errors - which can happen without warning (these are typically called silent data corruption).

Undetected or completely silent errors can lead to lost or inaccurate data and significant downtime. There is no other effective means of detecting these types of errors without an end-to-end integrity checking mechanism. During read/write operations, vSAN checks the validity of the data based on the checksum. Every 4KB block has a checksum associated with it, and the checksum is 5 bytes in size. When the data is written, the checksum is verified on the same host where the data originates, so that any corruption in flight over the network is detected. The checksum is persisted with the data, so that if the data is later found to be invalid, vSAN takes the necessary steps to either correct the data or report it to the user to take action. These actions could be as follows:

Retrieve a new copy of the data from another replica of the information, stored within the RAID-1 or RAID-5/6 constructs. This is referred to as recoverable data.
If no valid copy of the data is found, an error is returned. These are referred to as non-recoverable errors.
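The read path described above can be sketched roughly as follows, using two mirrored replicas. This is an illustration under stated assumptions, not vSAN code: write_block, read_block, UnrecoverableReadError, and the dictionary-based replicas are hypothetical, and zlib.crc32 (a 4-byte CRC) stands in for the real checksum, so the 5-byte on-disk format is only reflected in the overhead arithmetic.

```python
import zlib

BLOCK_SIZE = 4096    # block size stated in the article
CHECKSUM_BYTES = 5   # checksum size stated in the article


class UnrecoverableReadError(Exception):
    """Raised when no replica holds a block that matches its checksum."""


def checksum(block: bytes) -> int:
    # Stand-in CRC for illustration only.
    return zlib.crc32(block)


def write_block(replicas, checksums, index, data):
    """Compute the checksum once at the source and persist it with the data."""
    checksums[index] = checksum(data)
    for replica in replicas:
        replica[index] = data


def read_block(replicas, checksums, index):
    """Return verified data, repairing any bad replica from a good one."""
    expected = checksums[index]
    good = next(
        (r[index] for r in replicas if checksum(r[index]) == expected), None
    )
    if good is None:
        raise UnrecoverableReadError(f"block {index}: no valid copy found")
    for replica in replicas:
        if replica[index] != good:
            replica[index] = good  # recoverable: overwrite the bad copy
    return good


def checksum_overhead() -> float:
    """Fraction of extra space used by checksums (5 bytes per 4KB block)."""
    return CHECKSUM_BYTES / BLOCK_SIZE


# Usage: one corrupted replica is recoverable, two are not.
replicas = [{}, {}]
checksums = {}
payload = b"A" * BLOCK_SIZE
write_block(replicas, checksums, 0, payload)

replicas[0][0] = b"B" + payload[1:]        # corrupt one copy
recovered = read_block(replicas, checksums, 0)
repaired_copy = replicas[0][0]

replicas[0][0] = replicas[1][0] = b"C" + payload[1:]  # corrupt both copies
try:
    read_block(replicas, checksums, 0)
    unrecoverable = False
except UnrecoverableReadError:
    unrecoverable = True
```

Taking the figures in the text at face value, 5 bytes per 4KB block is 5/4096, or about 0.12% of additional space for checksums.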

Checksum is fully supported with all of the new features, such as RAID-5/6 erasure coding and deduplication and compression, and with configurations such as vSAN stretched clusters. As mentioned, it is enabled by default with no extra configuration. If you don't want it, just disable it in the VM storage policy.

Thanks for reading.