Here are some of the important commands that can be used while troubleshooting vSAN. Please note that this is a quick reference for when you are looking for a command; detailed explanations will be given in separate posts.
Show the vSAN cluster details on a host, including the number of hosts that are members of the cluster:
- esxcli vsan cluster get
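`esxcli vsan cluster get` prints the whole cluster state, not just the host count. The sketch below pulls out only the member count. It assumes the output contains a line of the form `Sub-Cluster Member Count: N`, so check this against your ESXi build. The function name is hypothetical.

```shell
# Hypothetical helper: print the number of hosts in the vSAN cluster from
# `esxcli vsan cluster get` output read on stdin.
# Assumes a line of the form "Sub-Cluster Member Count: N".
vsan_member_count() {
    awk -F': *' '/Sub-Cluster Member Count/ { print $2; exit }'
}
```

Usage: `esxcli vsan cluster get | vsan_member_count`.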
Manually join the host to a vSAN cluster:
- esxcli vsan cluster join -u <cluster uuid>
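The `-u` argument is the vSAN cluster UUID. The sketch below extracts that UUID from `esxcli vsan cluster get` on a host that is already a member, so it can be passed to `join` on the host being added. It assumes the value is the one labelled `Sub-Cluster UUID` in that output. The function name is hypothetical.

```shell
# Hypothetical helper: extract the vSAN cluster UUID from
# `esxcli vsan cluster get` output on stdin (run on an existing member).
# Assumes the field is labelled exactly "Sub-Cluster UUID"; the anchored
# match skips the Master/Backup UUID and Member UUIDs lines.
vsan_cluster_uuid() {
    awk -F': *' '$1 ~ /Sub-Cluster UUID$/ { print $2; exit }'
}
```

Usage: run `esxcli vsan cluster get | vsan_cluster_uuid` on a member host, then `esxcli vsan cluster join -u <that uuid>` on the host that should join.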
Remove the host from the vSAN cluster:
- esxcli vsan cluster leave
Check and change the number of resync copies in flight on a host. Raising this value can increase the resync speed:
- vsish -e get /vmkModules/vsan/dom/MaxNumResyncCopyInFlight
- vsish -e set /vmkModules/vsan/dom/MaxNumResyncCopyInFlight 5
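Because this is a host-level tuning value, it helps to record the original before raising it so it can be put back once the resync has finished. The sketch below does that. The helper names and the `VSISH`/`SAVE_FILE` overrides are hypothetical, and the `VSISH` override also lets you dry-run the helpers with a stub. It assumes that `vsish -e get` on this node prints just the number.

```shell
# Hypothetical helpers: raise MaxNumResyncCopyInFlight, then restore it later.
# Assumes `vsish -e get` on this node prints only the numeric value.
VSISH=${VSISH:-vsish}
INFLIGHT_NODE=/vmkModules/vsan/dom/MaxNumResyncCopyInFlight
SAVE_FILE=${SAVE_FILE:-/tmp/MaxNumResyncCopyInFlight.orig}

raise_resync_inflight() {
    new="$1"
    # Refuse anything that is not a plain decimal number.
    case "$new" in ''|*[!0-9]*) echo "usage: raise_resync_inflight N" >&2; return 1 ;; esac
    # Keep the current value so restore_resync_inflight can put it back.
    "$VSISH" -e get "$INFLIGHT_NODE" > "$SAVE_FILE" || return 1
    "$VSISH" -e set "$INFLIGHT_NODE" "$new"
}

restore_resync_inflight() {
    old=""
    # read strips surrounding whitespace; tolerate a missing final newline.
    if [ -f "$SAVE_FILE" ]; then read -r old < "$SAVE_FILE" || true; fi
    [ -n "$old" ] || { echo "no saved value in $SAVE_FILE" >&2; return 1; }
    "$VSISH" -e set "$INFLIGHT_NODE" "$old"
}
```

For example, run `raise_resync_inflight 5` before a large resync and `restore_resync_inflight` once it has finished.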
Check and disable the resync throttle:
- esxcfg-advcfg -g /VSAN/DomCompResyncThrottle
- esxcfg-advcfg -s 0 /VSAN/DomCompResyncThrottle
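`esxcfg-advcfg -g` answers with a sentence rather than a bare number. The sketch below extracts just the value, so a script can record it before setting the throttle to 0 and restore it afterwards. It assumes the output has the form `Value of DomCompResyncThrottle is N`. The function name is hypothetical.

```shell
# Hypothetical helper: print the value from `esxcfg-advcfg -g` output on stdin.
# Assumes the form "Value of <Option> is <value>".
advcfg_value() {
    sed -n 's/^Value of [^ ]* is //p'
}
```

For example, save it with `old=$(esxcfg-advcfg -g /VSAN/DomCompResyncThrottle | advcfg_value)` before disabling the throttle, then put it back later with `esxcfg-advcfg -s "$old" /VSAN/DomCompResyncThrottle`.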
Check congestion on the disk groups of a host in the vSAN cluster (run it on each host you want to check):
- for ssd in $(localcli vsan storage list |grep "Group UUID"|awk '{print $5}'|sort -u);do echo $ssd;vsish -e get /vmkModules/lsom/disks/$ssd/info|grep Congestion;done
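The one-liner above first collects the unique disk-group UUIDs from `localcli vsan storage list`, then queries each one with `vsish`. Here is the same logic spread over a few lines, as a sketch. The function names are hypothetical. `disk_group_uuids` assumes that the UUID is the fifth whitespace-separated field of the `VSAN Disk Group UUID:` line, exactly as the original `awk '{print $5}'` does.

```shell
# Hypothetical helper: unique vSAN disk-group UUIDs from
# `localcli vsan storage list` output on stdin (UUID assumed in field 5).
disk_group_uuids() {
    grep "Group UUID" | awk '{print $5}' | sort -u
}

# Print the congestion counters of every disk group on this host.
show_congestion() {
    for dg in $(localcli vsan storage list | disk_group_uuids); do
        echo "$dg"
        vsish -e get "/vmkModules/lsom/disks/$dg/info" | grep Congestion
    done
}
```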
Check resync status. Every 10 seconds this lists the GiB left to resync per object in ./resyncStats.txt and appends a timestamped total to ./totalHistory.txt; stop it with Ctrl+C:
- while true;do : > ./resyncStats.txt ;cmmds-tool find -t DOM_OBJECT -f json |grep uuid |awk -F \" '{print $4}' |while read i;do pendingResync=$(cmmds-tool find -t DOM_OBJECT -f json -u $i|grep -o "\"bytesToSync\": [0-9]*,"|awk -F " |," '{sum+=$2} END{print sum / 1024 / 1024 / 1024;}');if [ ${#pendingResync} -ne 1 ]; then echo "$i: $pendingResync GiB";fi;done |tee -a ./resyncStats.txt;total=$(cat ./resyncStats.txt |awk '{sum+=$2} END{print sum}');echo "Total: $total GiB" |tee -a ./resyncStats.txt;total=$(cat ./resyncStats.txt |grep Total);totalObj=$(cat ./resyncStats.txt|grep -vE " 0 GiB|Total"|wc -l);echo "`date +%Y-%m-%dT%H:%M:%SZ` $total ($totalObj objects)" >> ./totalHistory.txt; sleep 10;done
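At its core, the loop above adds up every `bytesToSync` field that `cmmds-tool` reports for an object and converts the total to GiB. The sketch below isolates that step. The function name is hypothetical. It assumes the JSON contains `"bytesToSync": N,` exactly as the original `grep -o` pattern does.

```shell
# Hypothetical helper: sum every "bytesToSync": N, field in cmmds-tool JSON
# read on stdin and print the total in GiB with two decimals.
pending_resync_gib() {
    grep -o '"bytesToSync": [0-9]*,' |
        awk -F '[ ,]' '{ sum += $2 } END { printf "%.2f\n", sum / 1024 / 1024 / 1024 }'
}
```

For a single object: `cmmds-tool find -t DOM_OBJECT -f json -u <object uuid> | pending_resync_gib`.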
Delete a specific object:
- /usr/lib/vmware/osfs/bin/objtool delete -u <object uuid> -f -v 10
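Deleting an object cannot be undone. A small guard that refuses anything that does not look like an object UUID can stop a pasted typo or a leftover `<object uuid>` placeholder from reaching `objtool`. The sketch below is one way to do that. The function name and the `OBJTOOL` override are hypothetical. It assumes object UUIDs use the usual 8-4-4-4-12 lowercase hexadecimal layout.

```shell
# Hypothetical wrapper: OBJTOOL can be overridden (e.g. with a stub for a dry run).
OBJTOOL=${OBJTOOL:-/usr/lib/vmware/osfs/bin/objtool}

# Delete a vSAN object only if the argument looks like an 8-4-4-4-12 hex UUID.
delete_vsan_object() {
    uuid="$1"
    if ! printf '%s\n' "$uuid" |
        grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'; then
        echo "refusing: '$uuid' is not an object UUID" >&2
        return 1
    fi
    "$OBJTOOL" delete -u "$uuid" -f -v 10
}
```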
Check and change the /VSAN/goto11 and /Net/TcpipHeapMax advanced settings on a host:
- esxcli system settings advanced list -o /VSAN/goto11
- esxcli system settings advanced list -o /Net/TcpipHeapMax
- esxcli system settings advanced set -o /Net/TcpipHeapMax -i 1536
- esxcli system settings advanced set -o /VSAN/goto11 -i 1
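`esxcli system settings advanced list -o` prints several fields per option (path, type, current and default values, limits, description). The sketch below extracts only the current value, for example to confirm that a `set` took effect. It assumes the current integer value is on a line labelled `Int Value:`. The function name is hypothetical.

```shell
# Hypothetical helper: print the current integer value from
# `esxcli system settings advanced list -o <option>` output on stdin.
# The anchored match skips the "Default Int Value" line.
advanced_int_value() {
    awk -F': *' '$1 ~ /^ *Int Value$/ { print $2; exit }'
}
```

Usage: `esxcli system settings advanced list -o /Net/TcpipHeapMax | advanced_int_value`.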
Get the LSOM log congestion values:
- esxcfg-advcfg -g /LSOM/lsomLogCongestionHighLimitGB
- esxcfg-advcfg -g /LSOM/lsomLogCongestionLowLimitGB
Change the LSOM log congestion limits on a host:
- esxcfg-advcfg -s 24 /LSOM/lsomLogCongestionLowLimitGB
- esxcfg-advcfg -s 32 /LSOM/lsomLogCongestionHighLimitGB
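The low limit should presumably stay below the high limit, so it can be worth checking the pair after changing them. The sketch below performs that check. It assumes that `esxcfg-advcfg -g` prints `Value of <Option> is <value>`. The function names are hypothetical.

```shell
# Hypothetical helper: extract the number from "Value of <Option> is <value>".
lsom_value() {
    sed -n 's/^Value of [^ ]* is //p'
}

# Hypothetical check: succeed only if the low congestion limit is below the
# high one. Arguments: the two `esxcfg-advcfg -g` output lines, low first.
lsom_limits_ok() {
    low=$(printf '%s\n' "$1" | lsom_value)
    high=$(printf '%s\n' "$2" | lsom_value)
    [ -n "$low" ] && [ -n "$high" ] && [ "$low" -lt "$high" ]
}
```

Usage: `lsom_limits_ok "$(esxcfg-advcfg -g /LSOM/lsomLogCongestionLowLimitGB)" "$(esxcfg-advcfg -g /LSOM/lsomLogCongestionHighLimitGB)" && echo OK`.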