Convert a multi-node vSAN cluster to a 2-node stretched cluster

Reading Time: 6 mins
Recently I got a request to convert a multi-node vSAN cluster (3 hosts) to a stretched cluster (2 hosts + 1 witness). Since this is a production environment with FTT=1, I planned the work in two phases:

Phase 1:

 

1. Deploy the witness appliance with all necessary licensing, networking, routing and configuration.
2. Once everything is set, ping the vSAN witness from any of the ESXi hosts in the cluster; it should be reachable via the vSAN port group. In my case, the hosts have vSAN enabled on vmkernel port vmk5, and the ping worked as shown below:
  vmkping -I vmk5 192.168.1.8 - where 192.168.1.8 is the vSAN IP assigned to the witness appliance
3. SSH to the witness appliance and try to reach the vSAN cluster hosts. In my case, the witness appliance uses vmk1 for vSAN:
  vmkping -I vmk1 192.168.1.4 - where 192.168.1.4 is the vSAN IP assigned to one of the hosts in the cluster
4. Now everything is set to add the witness appliance to the cluster. It is always recommended to have a cluster dedicated to witness appliances, just to keep them separate from the ESXi clusters.
5. Before any change, it is always good to capture the vSAN cluster health status, resync status, inaccessible objects and proactive test results. Especially when working on host reduction, make sure to check the performance of the hosts: review the performance graphs for the last month or so. If no peak utilisation is seen, you should be good to reduce the cluster; otherwise the remaining hosts may run into a resource crunch and affect VM performance.
6. Pick the host to remove from the cluster. A tip here: pick the host with a bad history, i.e., the one with past hardware failures, or with a unique configuration in terms of processor, model and so on.
7. Proceed to the next phase only if all checks in step 5 pass.
8. SSH to any host in the vSAN cluster and run esxcli vsan cluster get. The output should contain the line below, which confirms that 3 hosts are currently part of the vSAN cluster. Note the Sub-Cluster Member UUIDs.
                           Sub-Cluster Member Count: 3
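The checks in steps 2, 3 and 8 can all be run from the ESXi shell. This is a sketch using the interface names and IP addresses from my environment (vmk5, vmk1, 192.168.1.x); substitute your own:

```shell
# From an ESXi host in the cluster: ping the witness over the vSAN vmkernel port
vmkping -I vmk5 192.168.1.8

# From the witness appliance: ping a cluster host over its vSAN vmkernel port
vmkping -I vmk1 192.168.1.4

# On any cluster host: confirm current membership before making changes.
# Note down the Sub-Cluster Member UUIDs for comparison after the conversion.
esxcli vsan cluster get
```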

Phase 2:

 

1. Initiate maintenance mode (MM) on the host to be removed. Since full data migration is not possible in a 3-node cluster, the host should be moved into MM with "Ensure accessibility", so objects residing on that host will become non-compliant with the storage policy. The VMs keep running because with FTT=1 every object has 3 components: 1 active, 1 replica and 1 witness. With one host of the 3-node cluster in MM, each affected object has 1 inactive component, 1 active component and 1 witness component; with more than 50% of the votes still available, the VMs should run without any issue.
2. Once the host has entered maintenance mode successfully, create a stretched cluster with one host assigned to the preferred fault domain and the other to the secondary fault domain.
3. This should trigger a resync, as the witness components are moved to the witness appliance and all the inactive components are rebuilt.
4. During the whole process there will be a critical vSAN health alert for the active rebuild.
5. Once the resync completes, you can move the host that is in MM to the datacenter root folder.
6. Check the vSAN cluster health, resync status, inaccessible objects and proactive tests.
7. SSH to the witness appliance and run esxcli vsan cluster get. The output should contain the line below, which confirms that 3 hosts are currently part of the vSAN cluster. Compare the Sub-Cluster Member UUIDs with those captured in phase 1 step 8: one UUID will be missing and one new UUID will be present. The missing UUID belongs to the removed host and the new UUID belongs to the witness appliance.
                             
                             Sub-Cluster Member Count: 3
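The maintenance-mode entry in step 1 and the verification in step 7 can also be done from the shell. A minimal sketch, assuming you run the first command on the host being removed (maintenance mode can equally be initiated from the vSphere Client):

```shell
# Enter maintenance mode with the vSAN "Ensure accessibility" evacuation mode
esxcli system maintenanceMode set -e true -m ensureObjectAccessibility

# Watch the resync triggered by the stretched-cluster conversion
# (available on vSAN 6.0 U2 and later)
esxcli vsan debug resync summary get

# On the witness appliance after the conversion: verify membership.
# Expect Sub-Cluster Member Count: 3, with the removed host's UUID gone
# and a new UUID present for the witness appliance.
esxcli vsan cluster get
```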

The phases above can be used to convert a vSAN cluster of any number of nodes to a stretched cluster, with some additional steps:

Assume a 5-node cluster has to be converted to a stretched cluster. Check the performance statistics of all hosts in the cluster and, if they are OK, pick the hosts to be removed. To repeat: always pick the hosts with a bad history or a unique configuration. After picking the hosts to be removed:

1. To check the number of hosts in the vSAN cluster, run esxcli vsan cluster get. The output should contain the line below, which confirms that 5 hosts are currently part of the vSAN cluster:

                               Sub-Cluster Member Count: 5

2. Check the vSAN cluster health, resync status, inaccessible objects and proactive tests. If everything is healthy with no issues, proceed as below.
3. Initiate maintenance mode with full data migration on one of the hosts and wait for the resync to complete.
4. Once the host is in maintenance mode, SSH to it and use the command below to remove it from the vSAN cluster:
                            esxcli vsan cluster leave 
5. Since there is no point in keeping the host in the cluster, move it to the datacenter root folder.
6. Now run esxcli vsan cluster get again; the member count should be 4:
                              Sub-Cluster Member Count: 4
7. To reduce the cluster further, repeat steps 1 - 6. After successfully removing the 2nd host, the member count should be 3:
                          Sub-Cluster Member Count: 3
8. Now the cluster has 3 hosts and it is not possible to remove another host with full data migration, so follow the process described in phase 1 and then phase 2.
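The per-host removal loop in steps 3 - 6 boils down to three commands, run on the host being removed (a sketch; full data migration can also be triggered from the vSphere Client):

```shell
# Evacuate all vSAN data from this host, then wait for the resync to finish
esxcli system maintenanceMode set -e true -m evacuateAllData

# Once safely in maintenance mode, remove this host from the vSAN cluster
esxcli vsan cluster leave

# On any remaining host: the member count should have dropped by one
esxcli vsan cluster get
```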
That's it. Thanks very much for reading the post.
Please let me know if you have any questions or other requirements by commenting below.