With Tanzu Kubernetes Grid, you can deploy Kubernetes clusters across software-defined datacenters (SDDC) and public cloud environments, including vSphere, Microsoft Azure, and Amazon EC2, providing organizations a consistent, upstream-compatible, regional Kubernetes substrate that is ready for end-user workloads and ecosystem integrations.
In this post, I will explain the detailed steps to deploy a TKG cluster (version 1.5.2) on Azure using separate VNETs for the management and workload clusters with NAT gateways. I have tried to put together the components used in this demo as a simple architecture, and I hope this helps you understand how each of them talks to the others.
Prepare the setup
Create VNET, subnets and NAT gateway for management cluster
Log in to the Azure portal > Virtual networks > Create
Note: Create a new resource group or select an existing one. It is recommended to isolate the resources, so I chose to create a new resource group (capv-mgmt-RG).
Click Next: IP Addresses
Provide the IPv4 address space as shown below, then create the subnets with a minimum /28 CIDR each.
Review + Create > Create
In the Azure portal, navigate to NAT gateways > Create
Click Next: Outbound IP
Create a new public IP address as shown below and click OK
Click Next: Subnet
Select the management VNET from the drop-down and select the subnets created earlier.
Note: I do not want to use the NAT gateway for the bootstrap VM, so I left its subnet unchecked.
Review + Create > Create
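If you prefer the CLI over the portal, the same management networking can be sketched with the Azure CLI. This is a minimal sketch that assumes the resource names and CIDRs used later in this post; the public IP and NAT gateway names (capv-mgmt-natgw-ip, capv-mgmt-natgw) are my own placeholders.
# Resource group, VNET and subnets for the management cluster (CIDRs match the mgmt config used later)
az group create --name capv-mgmt-RG --location westus2
az network vnet create --resource-group capv-mgmt-RG --name capv-mgmt-vnet --address-prefixes 192.168.0.0/16
az network vnet subnet create --resource-group capv-mgmt-RG --vnet-name capv-mgmt-vnet --name capv-mgmt-cp-A --address-prefixes 192.168.1.0/24
az network vnet subnet create --resource-group capv-mgmt-RG --vnet-name capv-mgmt-vnet --name capv-mgmt-worker-A --address-prefixes 192.168.2.0/24
# NAT gateway with a Standard public IP, attached to the cluster subnets (not the bootstrap subnet)
az network public-ip create --resource-group capv-mgmt-RG --name capv-mgmt-natgw-ip --sku Standard
az network nat gateway create --resource-group capv-mgmt-RG --name capv-mgmt-natgw --public-ip-addresses capv-mgmt-natgw-ip --location westus2
az network vnet subnet update --resource-group capv-mgmt-RG --vnet-name capv-mgmt-vnet --name capv-mgmt-cp-A --nat-gateway capv-mgmt-natgw
az network vnet subnet update --resource-group capv-mgmt-RG --vnet-name capv-mgmt-vnet --name capv-mgmt-worker-A --nat-gateway capv-mgmt-natgw
The same pattern applies to the workload VNET in the next section, using the capv-workload names and the 172.17.0.0/16 address space.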
Create Resource group, VNET, subnets and NAT gateway for workload cluster
Log in to the Azure portal > Virtual networks > Create
Note: Create a new resource group or select an existing one. It is recommended to isolate the resources, so I chose to create a new resource group (capv-workload-RG).
Click Next: IP Addresses
Provide the IPv4 address space as shown below, then create the subnets with a minimum /28 CIDR each.
Review + Create > Create
In the Azure portal, navigate to NAT gateways > Create
Click Next: Outbound IP
Create a new public IP address as shown below and click OK
Click Next: Subnet
Select the workload VNET from the drop-down and select the subnets created earlier.
Review + Create > Create
Configure VNET Peering
In the Azure portal, navigate to Virtual networks > the management VNET (capv-mgmt-vnet) created earlier > Peerings > Add
This virtual network:
Peering link name: provide a name; in this case I used mgmtvnettoworkloadvnet
Remote virtual network: select the workload VNET (capv-workload-vnet)
Peering link name: provide a name; in this case I used workloadvnettomgmtvnet
In the Azure portal, navigate to Virtual networks > the management VNET (capv-mgmt-vnet) created earlier > Peerings and confirm the peering status shows Connected
In the Azure portal, navigate to Virtual networks > the workload VNET (capv-workload-vnet) created earlier > Peerings and confirm the peering status shows Connected
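The same peering can be created with the Azure CLI; a rough sketch, noting that --remote-vnet must be the full resource ID because the two VNETs live in different resource groups:
# Peer management VNET -> workload VNET
az network vnet peering create --resource-group capv-mgmt-RG --vnet-name capv-mgmt-vnet --name mgmtvnettoworkloadvnet --remote-vnet <capv-workload-vnet resource ID> --allow-vnet-access
# Peer workload VNET -> management VNET
az network vnet peering create --resource-group capv-workload-RG --vnet-name capv-workload-vnet --name workloadvnettomgmtvnet --remote-vnet <capv-mgmt-vnet resource ID> --allow-vnet-access
# Check that the peerings report "Connected"
az network vnet peering list --resource-group capv-mgmt-RG --vnet-name capv-mgmt-vnet --query "[].peeringState"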
Create NSG
Tanzu Kubernetes Grid management and workload clusters on Azure require two Network Security Groups (NSGs) to be defined on the cluster’s VNet and in its VNet resource group:
An NSG named <CLUSTER-NAME>-controlplane-nsg, associated with the cluster's control plane subnet. For this demo, two such NSGs are created:
capv-mgmt-controlplane-nsg: For management cluster control plane subnet
capv-workload-controlplane-nsg: For Workload cluster control plane subnet
An NSG named <CLUSTER-NAME>-node-nsg, associated with the cluster's worker node subnet. For this demo, two such NSGs are created:
capv-mgmt-node-nsg: For management cluster worker node subnet
capv-workload-node-nsg: For workload cluster worker node subnet
NSG for management cluster control plane subnet
In the Azure portal, navigate to Network security groups > Create
Select the management resource group (capv-mgmt-RG) from the drop-down and provide a name such as capv-mgmt-controlplane-nsg
Review + Create > Create
In the Azure portal, navigate to Virtual networks > the management VNET (capv-mgmt-vnet) > Subnets > the management cluster control plane subnet (capv-mgmt-cp-A) > Network security group > capv-mgmt-controlplane-nsg > Save
NSG for management cluster worker node subnet
In the Azure portal, navigate to Network security groups > Create
Select the management resource group (capv-mgmt-RG) from the drop-down and provide a name such as capv-mgmt-node-nsg
Review + Create > Create
In the Azure portal, navigate to Virtual networks > the management VNET (capv-mgmt-vnet) > Subnets > the management cluster worker node subnet (capv-mgmt-worker-A) > Network security group > capv-mgmt-node-nsg > Save
NSG for Workload cluster control plane subnet
In the Azure portal, navigate to Network security groups > Create
Select the workload resource group (capv-workload-RG) from the drop-down and provide a name such as capv-workload-controlplane-nsg
Review + Create > Create
In the Azure portal, navigate to Virtual networks > the workload VNET (capv-workload-vnet) > Subnets > the workload cluster control plane subnet (capv-workload-cp-A) > Network security group > capv-workload-controlplane-nsg > Save
NSG for workload cluster worker node subnet
In the Azure portal, navigate to Network security groups > Create
Select the workload resource group (capv-workload-RG) from the drop-down and provide a name such as capv-workload-node-nsg
Review + Create > Create
In the Azure portal, navigate to Virtual networks > the workload VNET (capv-workload-vnet) > Subnets > the workload cluster worker node subnet (capv-workload-worker-A) > Network security group > capv-workload-node-nsg > Save
Before starting the deployment, make sure the necessary ports are open so that all pieces of the clusters can talk to one another (a CLI sketch follows this list):
Control Plane VMs/Subnet – HTTPS Inbound/Outbound to Internet and SSH and Secure Kubectl (22, 443, and 6443) Inbound/Outbound within the VNet
Worker Node VMs/Subnet – Secure Kubectl (6443) Inbound/Outbound within the VNet
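As a hedged example, the management control plane NSG and an inbound rule for the intra-VNET ports above could also be created and attached with the Azure CLI; the rule name allow-ssh-https-k8s is my own placeholder, and the same pattern applies to the other three NSGs (add a further rule for HTTPS from the Internet if your design requires it):
az network nsg create --resource-group capv-mgmt-RG --name capv-mgmt-controlplane-nsg
# Allow SSH, HTTPS and secure kubectl within the VNET
az network nsg rule create --resource-group capv-mgmt-RG --nsg-name capv-mgmt-controlplane-nsg --name allow-ssh-https-k8s --priority 100 --direction Inbound --access Allow --protocol Tcp --source-address-prefixes VirtualNetwork --destination-port-ranges 22 443 6443
# Associate the NSG with the control plane subnet
az network vnet subnet update --resource-group capv-mgmt-RG --vnet-name capv-mgmt-vnet --name capv-mgmt-cp-A --network-security-group capv-mgmt-controlplane-nsg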
Deploy a bootstrap machine and install Docker, Carvel tools, Tanzu CLI, and kubectl using a script
Log in to the Azure portal > Virtual machines > Create > Azure virtual machine > fill in the values as shown below:
Review + Create > Create
Once the bootstrap VM is deployed successfully, download the Tanzu CLI (VMware Tanzu CLI for Linux) and kubectl (Kubectl cluster cli v1.22.5 for Linux) from VMware Customer Connect
Copy the downloaded files into the bootstrap VM home directory (/home/azureuser)
Connect to the VM and create a file named prepare-setup.sh in the home directory (/home/azureuser) of the bootstrap jumpbox
prepare-setup.sh
#!/bin/bash
echo "######### Installing Docker ############"
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io -y
sudo usermod -aG docker $USER
mkdir $HOME/tanzu
cd $HOME/tanzu
cp $HOME/tanzu-cli-bundle-linux-amd64.tar.gz $HOME/tanzu
cp $HOME/kubectl-linux-v1.22.5+vmware.1.gz $HOME/tanzu
echo "################# Extracting the files ###################"
gunzip tanzu-cli-bundle-linux-amd64.tar.gz
tar -xvf tanzu-cli-bundle-linux-amd64.tar
gunzip kubectl-linux-v1.22.5+vmware.1.gz
cd $HOME/tanzu/cli
echo "################ Installing Tanzu CLI ###################"
sudo install core/v0.11.2/tanzu-core-linux_amd64 /usr/local/bin/tanzu
tanzu init
tanzu version
tanzu plugin sync
tanzu plugin list
# Install kubectl and the Carvel tools bundled with the Tanzu CLI
cd ~/tanzu
chmod ugo+x kubectl-linux-v1.22.5+vmware.1
sudo install kubectl-linux-v1.22.5+vmware.1 /usr/local/bin/kubectl
cd $HOME/tanzu/cli
gunzip ytt-linux-amd64-v0.35.1+vmware.1.gz
gunzip kapp-linux-amd64-v0.42.0+vmware.1.gz
gunzip kbld-linux-amd64-v0.31.0+vmware.1.gz
gunzip imgpkg-linux-amd64-v0.18.0+vmware.1.gz
chmod ugo+x ytt-linux-amd64-v0.35.1+vmware.1
chmod ugo+x imgpkg-linux-amd64-v0.18.0+vmware.1
chmod ugo+x kapp-linux-amd64-v0.42.0+vmware.1
chmod ugo+x kbld-linux-amd64-v0.31.0+vmware.1
sudo mv ./ytt-linux-amd64-v0.35.1+vmware.1 /usr/local/bin/ytt
sudo mv ./kapp-linux-amd64-v0.42.0+vmware.1 /usr/local/bin/kapp
sudo mv ./kbld-linux-amd64-v0.31.0+vmware.1 /usr/local/bin/kbld
sudo mv ./imgpkg-linux-amd64-v0.18.0+vmware.1 /usr/local/bin/imgpkg
echo "################# Verify Tanzu CLI version ###################"
tanzu version
echo "################# Verify Kubectl version ###################"
kubectl version
echo "################# Verify imgpkg version ###################"
imgpkg --version
echo "################# Verify kapp version ###################"
kapp --version
echo "################# Verify kbld version ###################"
kbld --version
echo "################# Rebooting the bootstrap JB ###################"
sudo reboot
Commands to execute
## Make the script (prepare-setup.sh) executable:
chmod +x prepare-setup.sh
## Run the script
./prepare-setup.sh
Note: At the end of the script, the jumpbox (JB) gets rebooted, so once the VM is back up, reconnect to it.
Create service principal in Azure
Log in to the Azure portal > Azure Active Directory > App registrations > New registration > give it a name
Click on the newly created application (service principal) and copy the following information into a notepad; it will be used while creating the management cluster:
Application (client) ID
Directory (tenant) ID
Subscription ID
Navigate to Subscriptions > your subscription > Access control (IAM) > Add role assignment > Contributor > Next > + Select members > search for the application created earlier > Select > Next > Review + assign
Navigate to Azure Active Directory > App registrations > click on the application created earlier > Certificates & secrets > + New client secret > give a description > Add
Copy the value and save it in the notepad; this is the CLIENT_SECRET
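Alternatively, the service principal and its Contributor role assignment can be created in one step with the Azure CLI; a minimal sketch in which the name tkg-sp is a placeholder, and whose output contains the appId, password, and tenant values used below:
az ad sp create-for-rbac --name tkg-sp --role Contributor --scopes /subscriptions/<SUBSCRIPTION-ID>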
Download and install the Azure CLI on the bootstrap machine:
Click here to find the steps to install the Azure CLI on the bootstrap machine.
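On an Ubuntu bootstrap VM, Microsoft's documented one-line installer is usually enough:
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
az version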
Accept the Base Image License:
# Sign in to the Azure CLI with your tkg service principal.
az login --service-principal --username AZURE_CLIENT_ID --password AZURE_CLIENT_SECRET --tenant AZURE_TENANT_ID
# where AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID are the values collected earlier and saved in the notepad.
az vm image terms accept --publisher vmware-inc --offer tkg-capi --plan k8s-1dot22dot5-ubuntu-2004 --subscription <subscription ID collected earlier>
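You can optionally confirm the terms were accepted before proceeding:
az vm image terms show --publisher vmware-inc --offer tkg-capi --plan k8s-1dot22dot5-ubuntu-2004 --subscription <subscription ID collected earlier>
# The output should contain "accepted": true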
Create new key pair:
To connect to the Azure TKG VMs (management cluster or workload VMs), the bootstrap machine must provide the public key part of an SSH key pair. If your bootstrap machine does not already have an SSH key pair, you can use a tool such as ssh-keygen to generate one.
# On your bootstrap machine, run the following ssh-keygen command (the email comment is just a placeholder).
ssh-keygen -t rsa -b 4096 -C "user@example.com"
# At the prompt "Enter file in which to save the key (/root/.ssh/id_rsa):" press Enter to accept the default.
# Enter and repeat a password for the key pair.
# Add the private key to the SSH agent running on your machine, and enter the password you created in the previous step.
ssh-add ~/.ssh/id_rsa
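The cluster config files below expect the SSH public key in base64-encoded form (AZURE_SSH_PUBLIC_KEY_B64). Assuming the default key path, it can be generated like this and pasted into the config:
base64 -w 0 ~/.ssh/id_rsa.pub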
Create Management cluster
Create Management cluster using config file
Create a config file on the bootstrap machine (named mgmt-clusterconfig.yaml) with the content below, providing or replacing values wherever required.
mgmt-clusterconfig.yaml
CLUSTER_NAME: capv-mgmt ## Optional
CLUSTER_PLAN: prod ## Optional
NAMESPACE: default
CNI: antrea
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: azure
#! ---------------------------------------------------------------------
#! Node configuration
#! ---------------------------------------------------------------------
CONTROL_PLANE_MACHINE_COUNT: 3
WORKER_MACHINE_COUNT: 3
AZURE_CONTROL_PLANE_MACHINE_TYPE: "Standard_D2s_v3"
AZURE_NODE_MACHINE_TYPE: "Standard_D2s_v3"
#! ---------------------------------------------------------------------
#! Azure Configuration
#! ---------------------------------------------------------------------
AZURE_ENVIRONMENT: "AzurePublicCloud"
AZURE_TENANT_ID: <redacted> ## Provide TENANT_ID
AZURE_SUBSCRIPTION_ID: <redacted> ## Provide SUBSCRIPTION_ID
AZURE_CLIENT_ID: <redacted> ## Provide CLIENT_ID
AZURE_CLIENT_SECRET: <redacted> ## Provide CLIENT_SECRET
AZURE_LOCATION: westus2
AZURE_SSH_PUBLIC_KEY_B64: <redacted> ## Provide SSH_PUBLIC_KEY in encoded format
AZURE_CONTROL_PLANE_SUBNET_NAME: "capv-mgmt-cp-A" ## To be changed if using a diff name
AZURE_CONTROL_PLANE_SUBNET_CIDR: 192.168.1.0/24 ## To be changed if using a diff CIDR
AZURE_NODE_SUBNET_NAME: "capv-mgmt-worker-A" ## To be changed if using a diff name
AZURE_NODE_SUBNET_CIDR: 192.168.2.0/24 ## To be changed if using a diff CIDR
AZURE_RESOURCE_GROUP: "capv-mgmt-RG" ## To be changed if using a diff RG
AZURE_VNET_RESOURCE_GROUP: "capv-mgmt-RG" ## To be changed if using a diff RG
AZURE_VNET_NAME: "capv-mgmt-vnet" ## To be changed if using a diff name
AZURE_VNET_CIDR: 192.168.0.0/16 ## To be changed if using a diff CIDR
AZURE_ENABLE_PRIVATE_CLUSTER: "true"
AZURE_FRONTEND_PRIVATE_IP: 192.168.1.15 ## To be changed to use a diff IP
# AZURE_ENABLE_ACCELERATED_NETWORKING: ""
#! ---------------------------------------------------------------------
#! Machine Health Check configuration
#! ---------------------------------------------------------------------
ENABLE_MHC:
ENABLE_MHC_CONTROL_PLANE: true
ENABLE_MHC_WORKER_NODE: true
MHC_UNKNOWN_STATUS_TIMEOUT: 5m
MHC_FALSE_STATUS_TIMEOUT: 12m
MACHINE_HEALTH_CHECK_ENABLED: "true"
#! ---------------------------------------------------------------------
#! Common configuration
#! ---------------------------------------------------------------------
ENABLE_AUDIT_LOGGING: true
ENABLE_DEFAULT_STORAGE_CLASS: true
CLUSTER_CIDR: 100.96.0.0/11
SERVICE_CIDR: 100.64.0.0/13
#! ---------------------------------------------------------------------
#! Autoscaler configuration
#! ---------------------------------------------------------------------
ENABLE_AUTOSCALER: false
#! ---------------------------------------------------------------------
#! Antrea CNI configuration
#! ---------------------------------------------------------------------
# ANTREA_NO_SNAT: false
# ANTREA_TRAFFIC_ENCAP_MODE: "encap"
# ANTREA_PROXY: false
# ANTREA_POLICY: true
# ANTREA_TRACEFLOW: false
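With the config file in place, create the management cluster from the bootstrap machine with the Tanzu CLI (assuming the file name used above):
## Management cluster create command
tanzu management-cluster create --file mgmt-clusterconfig.yaml
# Optionally verify once it completes
tanzu management-cluster get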
Create Workload cluster using config file
Create a config file named wc-config.yaml on the bootstrap machine with the content below, providing or replacing values wherever required.
wc-config.yaml
CLUSTER_NAME: capv-workload ## Optional
CLUSTER_PLAN: prod ## Optional
NAMESPACE: default
CNI: antrea
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: azure
#! ---------------------------------------------------------------------
#! Node configuration
#! ---------------------------------------------------------------------
CONTROL_PLANE_MACHINE_COUNT: 3
WORKER_MACHINE_COUNT: 3
AZURE_CONTROL_PLANE_MACHINE_TYPE: "Standard_D2s_v3"
AZURE_NODE_MACHINE_TYPE: "Standard_D2s_v3"
#! ---------------------------------------------------------------------
#! Azure Configuration
#! ---------------------------------------------------------------------
AZURE_ENVIRONMENT: "AzurePublicCloud"
AZURE_TENANT_ID: <redacted> ## Provide TENANT_ID
AZURE_SUBSCRIPTION_ID: <redacted> ## Provide SUBSCRIPTION_ID
AZURE_CLIENT_ID: <redacted> ## Provide CLIENT_ID
AZURE_CLIENT_SECRET: <redacted> ## CLIENT_SECRET
AZURE_LOCATION: westus2 ## Optional
AZURE_SSH_PUBLIC_KEY_B64: <redacted> ## Provide SSH_PUBLIC_KEY in encoded format
AZURE_CONTROL_PLANE_SUBNET_NAME: "capv-workload-cp-A" ## To be changed if using a diff name
AZURE_CONTROL_PLANE_SUBNET_CIDR: 172.17.0.0/24 ## To be changed if using diff CIDR
AZURE_NODE_SUBNET_NAME: "capv-workload-worker-A" ## To be changed if using a diff name
AZURE_NODE_SUBNET_CIDR: 172.17.1.0/24 ## To be changed if using diff CIDR
AZURE_RESOURCE_GROUP: "capv-workload-RG" ## To be changed if using diff RG
AZURE_VNET_RESOURCE_GROUP: "capv-workload-RG" ## To be changed if using diff RG
AZURE_VNET_NAME: "capv-workload-vnet" ## To be changed if using a diff name
AZURE_VNET_CIDR: 172.17.0.0/16 ## To be changed if using diff CIDR
AZURE_ENABLE_PRIVATE_CLUSTER: "true"
AZURE_FRONTEND_PRIVATE_IP: 172.17.0.15 ## To be changed to use diff IP
# AZURE_ENABLE_ACCELERATED_NETWORKING: ""
#! ---------------------------------------------------------------------
#! Machine Health Check configuration
#! ---------------------------------------------------------------------
ENABLE_MHC:
ENABLE_MHC_CONTROL_PLANE: true
ENABLE_MHC_WORKER_NODE: true
MHC_UNKNOWN_STATUS_TIMEOUT: 5m
MHC_FALSE_STATUS_TIMEOUT: 12m
MACHINE_HEALTH_CHECK_ENABLED: "true"
#! ---------------------------------------------------------------------
#! Common configuration
#! ---------------------------------------------------------------------
ENABLE_AUDIT_LOGGING: true
ENABLE_DEFAULT_STORAGE_CLASS: true
CLUSTER_CIDR: 100.96.0.0/11
SERVICE_CIDR: 100.64.0.0/13
#! ---------------------------------------------------------------------
#! Autoscaler configuration
#! ---------------------------------------------------------------------
ENABLE_AUTOSCALER: false
#! ---------------------------------------------------------------------
#! Antrea CNI configuration
#! ---------------------------------------------------------------------
# ANTREA_NO_SNAT: false
# ANTREA_TRAFFIC_ENCAP_MODE: "encap"
# ANTREA_PROXY: false
# ANTREA_POLICY: true
# ANTREA_TRACEFLOW: false
Once the command below is executed and cluster creation starts, keep an eye on the resource group (capv-workload-RG) for a private DNS zone resource (capv-workload.capz.io). Once it is created, create the network links as shown below. This is required only once, during the first workload cluster deployment.
## Workload cluster create command
tanzu cluster create -f wc-config.yaml
Create Network Links for workload cluster
In the Azure portal, navigate to Resource groups > the workload cluster RG (capv-workload-RG) > Overview > Resources > the private DNS zone (capv-workload.capz.io) > Virtual network links > Add
Provide a link name (in this case I named it workloadtomgmt) and select the management VNET (capv-mgmt-vnet) so that the bootstrap machine can resolve the workload cluster's private API endpoint.
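The same virtual network link can be created with the Azure CLI; a sketch assuming the names above (--registration-enabled false disables auto-registration):
az network private-dns link vnet create --resource-group capv-workload-RG --zone-name capv-workload.capz.io --name workloadtomgmt --virtual-network <capv-mgmt-vnet resource ID> --registration-enabled false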
Create a deployment and expose it with a load balancer service
## Get credentials
tanzu cluster kubeconfig get capv-workload --admin
Credentials of cluster 'capv-workload' have been saved
You can now access the cluster by running 'kubectl config use-context capv-workload-admin@capv-workload'
# Change the context
kubectl config use-context capv-workload-admin@capv-workload
Switched to context "capv-workload-admin@capv-workload".
# List the contexts and make sure that the current context (*) is pointing to the capv-workload cluster as shown below:
kubectl config get-contexts
CURRENT   NAME                                CLUSTER         AUTHINFO              NAMESPACE
          capv-mgmt-admin@capv-mgmt           capv-mgmt       capv-mgmt-admin
*         capv-workload-admin@capv-workload   capv-workload   capv-workload-admin
# Get the load balancer IP
kubectl get svc
NAME            TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)          AGE
kubernetes      ClusterIP      100.64.0.1       <none>          443/TCP          4h51m
spring-deploy   LoadBalancer   100.68.113.144   20.99.166.127   8080:31592/TCP   82s
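To expose a deployment only inside the VNET, use an internal load balancer service. The actual contents of service-nginx.yaml are not reproduced in this post; a minimal sketch that matches the apply output below (the internal load balancer annotation and the nginx image are my assumptions) could be written on the bootstrap machine like this:
cat > service-nginx.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: internal-svc-lb
  annotations:
    # Ask the Azure cloud provider for an internal (VNET-scoped) load balancer
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
EOF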
$ kubectl apply -f service-nginx.yaml
deployment.apps/nginx-deployment created
service/internal-svc-lb created

$ kubectl get deploy
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   3/3     3            3           71s

$ kubectl get svc
NAME           TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
internal-app   LoadBalancer   100.71.2.186   172.17.1.7    80:30794/TCP   68s
kubernetes     ClusterIP      100.64.0.1     <none>        443/TCP        97m
To verify the created internal and external load balancers: in the Azure portal, navigate to Load balancers > capv-workload-internal-lb > Frontend IP configuration.
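The frontend IPs can also be checked from the bootstrap machine with the Azure CLI, assuming the load balancers were created in the workload resource group as shown above:
az network lb list --resource-group capv-workload-RG --output table
az network lb frontend-ip list --resource-group capv-workload-RG --lb-name capv-workload-internal-lb --output table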