Deploy TKG (1.5.2) on Azure as Private cluster using existing VNet and NAT gateway

Overview

With Tanzu Kubernetes Grid, you can deploy Kubernetes clusters across software-defined datacenters (SDDC) and public cloud environments, including vSphere, Microsoft Azure, and Amazon EC2, providing organizations a consistent, upstream-compatible, regional Kubernetes substrate that is ready for end-user workloads and ecosystem integrations.

In this post, I explain the detailed steps to deploy private TKG clusters (version 1.5.2) on Azure, using separate VNets for the management and workload clusters, each with a NAT gateway. I have put the components used in this demo together in a simple architecture, and I hope it helps you understand how they talk to each other.

Architecture

Prepare the setup

Create VNET, subnets and NAT gateway for management cluster

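  • Log in to the Azure portal > Virtual networks > Create

Note: Create a new resource group or select an existing one. It is recommended to isolate the resources, so I chose to create a new resource group (capv-mgmt-RG).

  • Click on Next: IP Addresses
  • Provide the IPv4 address space as shown below, then create the subnets with a minimum /28 CIDR each. This demo uses 192.168.0.0/16 for the VNet, 192.168.1.0/24 for the control plane subnet (capv-mgmt-cp-A) and 192.168.2.0/24 for the worker node subnet (capv-mgmt-worker-A).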

  • Review + Create > Create
  • In Azure portal, Navigate to NAT gateways > Create >

  • Click on Next: Outbound IP
  • Create a new public IP address as below and click OK

  • Click on Next: Subnet
  • Select the management VNet from the drop-down and select the subnets created earlier.

Note: I do not want to use the NAT gateway for the bootstrap VM, so I left its subnet unchecked.

  • Review + Create > Create
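If you prefer the command line over the portal, the steps above can be approximated with the Azure CLI. This is only a sketch under a few assumptions, not what I ran for this post: the NAT gateway and public IP names (`<prefix>-natgw`, `<prefix>-natgw-pip`) are placeholders I made up, the location defaults to westus2 as in the config files later in this post, and the bootstrap VM subnet is not covered.

```shell
# Sketch: resource group, VNet, NAT gateway and two cluster subnets.
# Arguments: <prefix> <vnet-cidr> <control-plane-cidr> <worker-cidr> [location]
# The NAT gateway and public IP names are illustrative placeholders.
create_tkg_network() {
  local prefix="$1" vnet_cidr="$2" cp_cidr="$3" worker_cidr="$4"
  local location="${5:-westus2}"
  local rg="${prefix}-RG" vnet="${prefix}-vnet"
  local natgw="${prefix}-natgw" pip="${prefix}-natgw-pip"

  az group create --name "$rg" --location "$location"
  az network vnet create --resource-group "$rg" --name "$vnet" \
    --location "$location" --address-prefixes "$vnet_cidr"

  # A NAT gateway needs a Standard SKU public IP with static allocation.
  az network public-ip create --resource-group "$rg" --name "$pip" \
    --location "$location" --sku Standard --allocation-method Static
  az network nat gateway create --resource-group "$rg" --name "$natgw" \
    --location "$location" --public-ip-addresses "$pip"

  # Create each cluster subnet, then attach the NAT gateway to it.
  az network vnet subnet create --resource-group "$rg" --vnet-name "$vnet" \
    --name "${prefix}-cp-A" --address-prefixes "$cp_cidr"
  az network vnet subnet update --resource-group "$rg" --vnet-name "$vnet" \
    --name "${prefix}-cp-A" --nat-gateway "$natgw"
  az network vnet subnet create --resource-group "$rg" --vnet-name "$vnet" \
    --name "${prefix}-worker-A" --address-prefixes "$worker_cidr"
  az network vnet subnet update --resource-group "$rg" --vnet-name "$vnet" \
    --name "${prefix}-worker-A" --nat-gateway "$natgw"
}

# Example for the management network used in this post:
#   create_tkg_network capv-mgmt 192.168.0.0/16 192.168.1.0/24 192.168.2.0/24
```

The same function should also work for the workload network by passing the capv-workload prefix and the 172.17.x.x ranges.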

Create Resource group, VNET, subnets and NAT gateway for workload cluster

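  • Log in to the Azure portal > Virtual networks > Create

Note: Create a new resource group or select an existing one. It is recommended to isolate the resources, so I chose to create a new resource group (capv-workload-RG).

  • Click on Next: IP Addresses
  • Provide the IPv4 address space as shown below, then create the subnets with a minimum /28 CIDR each. This demo uses 172.17.0.0/16 for the VNet, 172.17.0.0/24 for the control plane subnet (capv-workload-cp-A) and 172.17.1.0/24 for the worker node subnet (capv-workload-worker-A).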

  • Review + Create > Create
  • In Azure portal, Navigate to NAT gateways > Create >

  • Click on Next: Outbound IP
  • Create a new public IP address as below and click OK

  • Click on Next: Subnet
  • Select the workload VNet from the drop-down and select the subnets created earlier.

  • Review + Create > Create

Configure VNET Peering

In the Azure portal, navigate to Virtual networks > management VNet (capv-mgmt-vnet) created earlier > Peerings > Add

  • This virtual network:
    • Peering link name: provide a name; in this case I used mgmtvnettoworkloadvnet
  • Remote virtual network:
    • Peering link name: provide a name; in this case I used workloadvnettomgmtvnet
    • Virtual network: select the workload VNet (capv-workload-vnet)
  • Add

  • To verify, navigate in the Azure portal to Virtual networks > management VNet (capv-mgmt-vnet) created earlier > Peerings and confirm that the peering is listed

  • Do the same under Virtual networks > workload VNet (capv-workload-vnet) created earlier > Peerings
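The peering can also be scripted. The sketch below is my CLI approximation of the portal steps, assuming you are already logged in with `az login`. Because the two VNets sit in different resource groups, the remote VNet is passed by resource ID rather than by name.

```shell
# Sketch: peer two VNets in different resource groups, in both directions.
# Arguments: <rg-a> <vnet-a> <link-a-to-b> <rg-b> <vnet-b> <link-b-to-a>
peer_vnets() {
  local rg_a="$1" vnet_a="$2" link_ab="$3" rg_b="$4" vnet_b="$5" link_ba="$6"
  local id_a id_b

  # Look up the full resource IDs of both VNets.
  id_a="$(az network vnet show --resource-group "$rg_a" --name "$vnet_a" --query id --output tsv)"
  id_b="$(az network vnet show --resource-group "$rg_b" --name "$vnet_b" --query id --output tsv)"

  az network vnet peering create --resource-group "$rg_a" --vnet-name "$vnet_a" \
    --name "$link_ab" --remote-vnet "$id_b" --allow-vnet-access
  az network vnet peering create --resource-group "$rg_b" --vnet-name "$vnet_b" \
    --name "$link_ba" --remote-vnet "$id_a" --allow-vnet-access

  # List the peerings on both sides so their state can be checked.
  az network vnet peering list --resource-group "$rg_a" --vnet-name "$vnet_a" --output table
  az network vnet peering list --resource-group "$rg_b" --vnet-name "$vnet_b" --output table
}

# Example with the names used in this post:
#   peer_vnets capv-mgmt-RG capv-mgmt-vnet mgmtvnettoworkloadvnet \
#              capv-workload-RG capv-workload-vnet workloadvnettomgmtvnet
```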

Create NSG

Tanzu Kubernetes Grid management and workload clusters on Azure require two Network Security Groups (NSGs) to be defined on the cluster’s VNet and in its VNet resource group:

  • An NSG named <CLUSTER-NAME>-controlplane-nsg, associated with the cluster’s control plane subnet. For this demo, two such NSGs are created:
        • capv-mgmt-controlplane-nsg: for the management cluster control plane subnet
        • capv-workload-controlplane-nsg: for the workload cluster control plane subnet
  • An NSG named <CLUSTER-NAME>-node-nsg, associated with the cluster’s worker node subnet. For this demo, two such NSGs are created:
        • capv-mgmt-node-nsg: for the management cluster worker node subnet
        • capv-workload-node-nsg: for the workload cluster worker node subnet
NSG for management cluster control plane subnet
  • In Azure portal, Navigate to Network Security Groups > Create
  • Select the management resource group (capv-mgmt-RG) from the drop-down and name the NSG capv-mgmt-controlplane-nsg

  • Review + Create > Create
  • In Azure portal, Navigate to Virtual Networks > management vnet (capv-mgmt-vnet) > Subnets > management cluster control plane subnet ( capv-mgmt-cp-A ) > Network security group >  capv-mgmt-controlplane-nsg > Save

NSG for management cluster worker node subnet
  • In Azure portal, Navigate to Network Security Groups > Create
  • Select the management resource group (capv-mgmt-RG) from the drop-down and name the NSG capv-mgmt-node-nsg

  • Review + Create > Create
  • In the Azure portal, navigate to Virtual Networks > management VNet (capv-mgmt-vnet) > Subnets > management cluster worker node subnet (capv-mgmt-worker-A) > Network security group > capv-mgmt-node-nsg > Save

NSG for Workload cluster control plane subnet
  • In Azure portal, Navigate to Network Security Groups > Create
  • Select the workload resource group (capv-workload-RG) from the drop-down and name the NSG capv-workload-controlplane-nsg

  • Review + Create > Create
  • In Azure portal, Navigate to Virtual Networks > workload vnet (capv-workload-vnet) > Subnets > workload cluster control plane subnet ( capv-workload-cp-A ) > Network security group >  capv-workload-controlplane-nsg > Save

NSG for workload cluster worker node subnet
  • In Azure portal, Navigate to Network Security Groups > Create
  • Select the workload resource group (capv-workload-RG) from the drop-down and name the NSG capv-workload-node-nsg

  • Review + Create > Create
  • In Azure portal, Navigate to Virtual Networks > workload vnet (capv-workload-vnet) > Subnets > workload cluster worker node subnet ( capv-workload-worker-A ) > Network security group >  capv-workload-node-nsg > Save
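The four NSGs follow the same pattern, so the portal steps above can be sketched as one small Azure CLI function that is called once per cluster. This is an approximation I have written for illustration; it assumes the NSGs live in the same resource group as the VNet, as they do in this demo.

```shell
# Sketch: create the two NSGs TKG expects for one cluster and attach them
# to the control plane and worker node subnets.
# Arguments: <cluster-name> <resource-group> <vnet> <cp-subnet> <worker-subnet>
create_cluster_nsgs() {
  local cluster="$1" rg="$2" vnet="$3" cp_subnet="$4" worker_subnet="$5"

  az network nsg create --resource-group "$rg" --name "${cluster}-controlplane-nsg"
  az network nsg create --resource-group "$rg" --name "${cluster}-node-nsg"

  # Associate each NSG with its subnet.
  az network vnet subnet update --resource-group "$rg" --vnet-name "$vnet" \
    --name "$cp_subnet" --network-security-group "${cluster}-controlplane-nsg"
  az network vnet subnet update --resource-group "$rg" --vnet-name "$vnet" \
    --name "$worker_subnet" --network-security-group "${cluster}-node-nsg"
}

# Examples with the names used in this post:
#   create_cluster_nsgs capv-mgmt capv-mgmt-RG capv-mgmt-vnet capv-mgmt-cp-A capv-mgmt-worker-A
#   create_cluster_nsgs capv-workload capv-workload-RG capv-workload-vnet capv-workload-cp-A capv-workload-worker-A
```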

Before starting with the deployment, it is important to ensure that the necessary pathways are open to all pieces of the clusters and that they are able to talk to one another.

Control plane VMs/subnet: HTTPS inbound/outbound to the Internet, plus SSH, HTTPS and secure kubectl (ports 22, 443 and 6443) inbound/outbound within the VNet.

The inbound and outbound rules referenced here are those of the management cluster control plane subnet; the same rules should be created for the workload cluster control plane subnet as well.

Worker node VMs/subnet: secure kubectl (port 6443) inbound/outbound within the VNet.

The inbound and outbound rules referenced here are those of the management cluster worker node subnet; the same rules should be created for the workload cluster worker node subnet as well.
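As a rough CLI counterpart to the rules described above, here is a sketch that adds one inbound allow rule for traffic within the VNet. The rule names and priorities are placeholders I picked for illustration, not values from my setup. Only the in-VNet inbound rules are covered; the outbound and Internet-facing rules would follow the same pattern with a different `--direction` and address prefixes.

```shell
# Sketch: allow inbound TCP traffic from within the VNet on the given ports.
# Arguments: <resource-group> <nsg-name> <rule-name> <priority> <port> [<port>...]
# Rule names and priorities are hypothetical; choose values that fit your NSG.
allow_vnet_inbound() {
  local rg="$1" nsg="$2" rule="$3" priority="$4"
  shift 4
  az network nsg rule create --resource-group "$rg" --nsg-name "$nsg" \
    --name "$rule" --priority "$priority" --direction Inbound --access Allow \
    --protocol Tcp --source-address-prefixes VirtualNetwork \
    --destination-address-prefixes VirtualNetwork \
    --destination-port-ranges "$@"
}

# Examples for the management cluster NSGs:
#   allow_vnet_inbound capv-mgmt-RG capv-mgmt-controlplane-nsg allow-vnet-ssh-https-api 100 22 443 6443
#   allow_vnet_inbound capv-mgmt-RG capv-mgmt-node-nsg allow-vnet-api 100 6443
```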

Deploy a bootstrap machine and install Docker, Carvel tools, Tanzu CLI and kubectl using a script

  • Login to Azure portal > virtual machines > Create > Azure virtual machine > fill in the values as shown below:

  • Review + Create > Create
  • Once the bootstrap VM is deployed successfully, download the Tanzu CLI (VMware Tanzu CLI for Linux) and kubectl (Kubectl cluster cli v1.22.5 for Linux) from VMware Customer Connect
  • Copy the downloaded files into the bootstrap VM home directory (/home/azureuser)
  • Connect to the VM and create a file named prepare-setup.sh in the home directory (/home/azureuser) of the bootstrap jumpbox
prepare-setup.sh
#!/bin/bash
echo "######### Installing Docker ############"
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io -y
sudo usermod -aG docker $USER
mkdir -p $HOME/tanzu
cd $HOME/tanzu
cp $HOME/tanzu-cli-bundle-linux-amd64.tar.gz $HOME/tanzu
cp $HOME/kubectl-linux-v1.22.5+vmware.1.gz $HOME/tanzu
echo "################# Extracting the files ###################"
gunzip tanzu-cli-bundle-linux-amd64.tar.gz
tar -xvf tanzu-cli-bundle-linux-amd64.tar
gunzip kubectl-linux-v1.22.5+vmware.1.gz
cd $HOME/tanzu/cli
echo "################ Installing Tanzu CLI ###################"
sudo install core/v0.11.2/tanzu-core-linux_amd64 /usr/local/bin/tanzu
tanzu init
tanzu version
tanzu plugin sync
tanzu plugin list
# Install kubectl from the file extracted earlier in the Tanzu directory
cd ~/tanzu
chmod ugo+x kubectl-linux-v1.22.5+vmware.1
sudo install kubectl-linux-v1.22.5+vmware.1 /usr/local/bin/kubectl
cd $HOME/tanzu/cli
gunzip ytt-linux-amd64-v0.35.1+vmware.1.gz
gunzip kapp-linux-amd64-v0.42.0+vmware.1.gz
gunzip kbld-linux-amd64-v0.31.0+vmware.1.gz
gunzip imgpkg-linux-amd64-v0.18.0+vmware.1.gz
chmod ugo+x ytt-linux-amd64-v0.35.1+vmware.1
chmod ugo+x imgpkg-linux-amd64-v0.18.0+vmware.1
chmod ugo+x kapp-linux-amd64-v0.42.0+vmware.1
chmod ugo+x kbld-linux-amd64-v0.31.0+vmware.1
sudo mv ./ytt-linux-amd64-v0.35.1+vmware.1 /usr/local/bin/ytt
sudo mv ./kapp-linux-amd64-v0.42.0+vmware.1 /usr/local/bin/kapp
sudo mv ./kbld-linux-amd64-v0.31.0+vmware.1 /usr/local/bin/kbld
sudo mv ./imgpkg-linux-amd64-v0.18.0+vmware.1 /usr/local/bin/imgpkg
echo "################# Verify Tanzu CLI version ###################"
tanzu version
echo "################# Verify Kubectl version ###################"
kubectl version
echo "################# Verify imgpkg version ###################"
imgpkg --version
echo "################# Verify kapp version ###################"
kapp --version
echo "################# Verify kbld version  ###################"
kbld --version
echo "################# Verify ytt version  ###################"
ytt --version
echo "reboot bootstrap JB ###"
sudo reboot
Commands to execute
## Make the script (prepare-setup.sh) executable : 

chmod +x prepare-setup.sh

## Run the script

./prepare-setup.sh

Note: At the end of the script, the jumpbox is rebooted, so reconnect once the VM is back up.
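After reconnecting, a quick sanity check can confirm that everything the script installed is on the PATH. The helper below is a small sketch of my own, not part of prepare-setup.sh; it only prints the names of tools it cannot find.

```shell
# Sketch: print each tool from the argument list that is not found on PATH.
# Prints nothing when every tool is present.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing="$(missing_tools docker tanzu kubectl ytt kapp kbld imgpkg)"
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are printed on a single line.
  echo "Missing tools:" $missing
else
  echo "All tools found on PATH"
fi
```

Since the script adds the current user to the docker group, `docker ps` should also work without sudo after the reboot.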

Create service principal in Azure

  • Log in to the Azure portal > Azure Active Directory > App registrations > New registration, and give it a name

  • Click on the newly created application (service principal) and copy the information below into notepad; it will be used while creating the management cluster:
    • Application (client) ID
    • Directory (tenant) ID
    • Subscription ID

  • Navigate to Subscriptions > IAM > Add role assignment > Contributor > Next > + Select members > search for application created earlier > Select > Next > Review + assign

  • Navigate to Azure Active Directory > App registrations > click on the application created earlier > Certificates & secrets > + New client secret > give a description > Add
  • Copy the secret value and save it in notepad; this is the CLIENT_SECRET
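For reference, the Azure CLI can create a service principal with the Contributor role on a subscription in one command. This is an alternative sketch rather than the route I took in the portal. It assumes you run it from a session already logged in as a user allowed to create app registrations and role assignments (Azure Cloud Shell, for example), and the service principal name is a placeholder.

```shell
# Sketch: create a service principal with the Contributor role on a subscription.
# Arguments: <service-principal-name> <subscription-id>
# The name is a placeholder of your choice.
create_tkg_service_principal() {
  local name="$1" subscription_id="$2"
  az ad sp create-for-rbac --name "$name" --role Contributor \
    --scopes "/subscriptions/${subscription_id}"
}

# Example:
#   create_tkg_service_principal tkg-sp <subscription ID>
```

The JSON printed by the command contains `appId`, `password` and `tenant`, which correspond to the client ID, client secret and tenant ID used in the rest of this post.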

 

Download and install Azure CLI on the bootstrap machine:

Follow the Azure CLI installation steps in the Microsoft documentation to install it on the bootstrap machine.

Accept the Base Image License:

# Sign in to the Azure CLI with your tkg service principal.

az login --service-principal --username AZURE_CLIENT_ID --password AZURE_CLIENT_SECRET --tenant AZURE_TENANT_ID

# where the AZURE_CLIENT_ID, AZURE_CLIENT_SECRET and AZURE_TENANT_ID values are the ones collected earlier and saved in notepad.

az vm image terms accept --publisher vmware-inc --offer tkg-capi --plan k8s-1dot22dot5-ubuntu-2004 --subscription <subscription ID collected earlier>

Create new key pair:

To connect to the Azure TKG VMs (management cluster or workload cluster VMs), the bootstrap machine must provide the public key part of an SSH key pair. If your bootstrap machine does not already have an SSH key pair, you can use a tool such as ssh-keygen to generate one.

# On your bootstrap machine, run the following ssh-keygen command.

ssh-keygen -t rsa -b 4096 -C "email@example.com"

# At the prompt Enter file in which to save the key (/root/.ssh/id_rsa): press Enter to accept the default.

#Enter and repeat a password for the key pair.

#Add the private key to the SSH agent running on your machine, and enter the password you created in the previous step.

ssh-add ~/.ssh/id_rsa
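The cluster config files later in this post expect the public key in base64-encoded form (AZURE_SSH_PUBLIC_KEY_B64). A minimal way to produce that value on the Ubuntu bootstrap machine is sketched below; it assumes GNU coreutils, where `-w 0` disables line wrapping so the output stays on a single line.

```shell
# Sketch: base64-encode a public key file as one unwrapped line (GNU coreutils).
# Argument: <path-to-public-key>
encode_public_key() {
  base64 -w 0 < "$1"
}

# Example with the default key path from the ssh-keygen step:
#   encode_public_key ~/.ssh/id_rsa.pub
```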

Create Management cluster

Create Management cluster using config file

  • Create a config file on the bootstrap machine (named mgmt-clusterconfig.yaml) with the content below, providing or replacing the values wherever required.
mgmt-clusterconfig.yaml
CLUSTER_NAME: capv-mgmt          ## Optional
CLUSTER_PLAN: prod               ## Optional
NAMESPACE: default
CNI: antrea
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: azure
#! ---------------------------------------------------------------------
#! Node configuration
#! ---------------------------------------------------------------------
CONTROL_PLANE_MACHINE_COUNT: 3
WORKER_MACHINE_COUNT: 3
AZURE_CONTROL_PLANE_MACHINE_TYPE: "Standard_D2s_v3"
AZURE_NODE_MACHINE_TYPE: "Standard_D2s_v3"
#! ---------------------------------------------------------------------
#! Azure Configuration
#! ---------------------------------------------------------------------
AZURE_ENVIRONMENT: "AzurePublicCloud"
AZURE_TENANT_ID: <redacted>                   ## Provide TENANT_ID
AZURE_SUBSCRIPTION_ID: <redacted>             ## Provide SUBSCRIPTION_ID
AZURE_CLIENT_ID: <redacted>                  ## Provide CLIENT_ID
AZURE_CLIENT_SECRET: <redacted>               ## Provide CLIENT_SECRET
AZURE_LOCATION: westus2
AZURE_SSH_PUBLIC_KEY_B64: <redacted>          ## Provide the base64-encoded SSH public key
AZURE_CONTROL_PLANE_SUBNET_NAME: "capv-mgmt-cp-A"  ## To be changed if using a diff name
AZURE_CONTROL_PLANE_SUBNET_CIDR: 192.168.1.0/24    ## To be changed if using a diff CIDR
AZURE_NODE_SUBNET_NAME: "capv-mgmt-worker-A"       ## To be changed if using a diff name
AZURE_NODE_SUBNET_CIDR: 192.168.2.0/24             ## To be changed if using a diff CIDR
AZURE_RESOURCE_GROUP: "capv-mgmt-RG"               ## To be changed if using a diff RG
AZURE_VNET_RESOURCE_GROUP: "capv-mgmt-RG"          ## To be changed if using a diff RG
AZURE_VNET_NAME: "capv-mgmt-vnet"                  ## To be changed if using a diff name
AZURE_VNET_CIDR: 192.168.0.0/16                    ## To be changed if using a diff CIDR
AZURE_ENABLE_PRIVATE_CLUSTER: "true"
AZURE_FRONTEND_PRIVATE_IP: 192.168.1.15            ## To be changed to use diff IP
# AZURE_ENABLE_ACCELERATED_NETWORKING: ""
#! ---------------------------------------------------------------------
#! Machine Health Check configuration
#! ---------------------------------------------------------------------
ENABLE_MHC:
ENABLE_MHC_CONTROL_PLANE: true
ENABLE_MHC_WORKER_NODE: true
MHC_UNKNOWN_STATUS_TIMEOUT: 5m
MHC_FALSE_STATUS_TIMEOUT: 12m
MACHINE_HEALTH_CHECK_ENABLED: "true"
#! ---------------------------------------------------------------------
#! Common configuration
#! ---------------------------------------------------------------------
ENABLE_AUDIT_LOGGING: true
ENABLE_DEFAULT_STORAGE_CLASS: true
CLUSTER_CIDR: 100.96.0.0/11
SERVICE_CIDR: 100.64.0.0/13
#! ---------------------------------------------------------------------
#! Autoscaler configuration
#! ---------------------------------------------------------------------
ENABLE_AUTOSCALER: false
#! ---------------------------------------------------------------------
#! Antrea CNI configuration
#! ---------------------------------------------------------------------
# ANTREA_NO_SNAT: false
# ANTREA_TRAFFIC_ENCAP_MODE: "encap"
# ANTREA_PROXY: false
# ANTREA_POLICY: true
# ANTREA_TRACEFLOW: false
Create Management cluster
## Management cluster creation command:

tanzu management-cluster create -f mgmt-clusterconfig.yaml
  • Once the management cluster is created successfully, its VMs appear in the Azure portal with no public IP.
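The same check can be done from the bootstrap machine. The sketch below assumes that the kubectl context points at the new management cluster and that the Azure CLI is still logged in with the service principal; both are assumptions about your session rather than something this post configures explicitly.

```shell
# Sketch: basic post-deployment checks for the management cluster.
# Argument: <resource-group> that holds the cluster VMs
verify_management_cluster() {
  local rg="$1"
  # Cluster and provider status as seen by the Tanzu CLI.
  tanzu management-cluster get
  # Node list, including the private IP of each node.
  kubectl get nodes -o wide
  # For a private cluster, no VM should list a public IP address here.
  az vm list-ip-addresses --resource-group "$rg" --output table
}

# Example:
#   verify_management_cluster capv-mgmt-RG
```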

Create Network Links

In the Azure portal, navigate to Resource groups > management cluster RG (capv-mgmt-RG) > Overview > Resources > Private DNS zone (capv-mgmt.capz.io) > Virtual network links > Add

  • Provide a link name; in this case I named it mgmtdnstoworkload
  • Virtual network: select the workload VNet (capv-workload-vnet)
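A CLI version of this network link is sketched below, as an approximation of the portal steps. The VNet is in a different resource group than the DNS zone, so it is passed by resource ID, and auto-registration is switched off because the link is only needed for name resolution.

```shell
# Sketch: link a private DNS zone to a VNet in another resource group.
# Arguments: <zone-rg> <zone-name> <link-name> <vnet-rg> <vnet-name>
link_private_dns_zone() {
  local zone_rg="$1" zone="$2" link="$3" vnet_rg="$4" vnet="$5"
  local vnet_id

  # Resolve the VNet resource ID, needed for a cross-resource-group link.
  vnet_id="$(az network vnet show --resource-group "$vnet_rg" --name "$vnet" --query id --output tsv)"

  az network private-dns link vnet create --resource-group "$zone_rg" \
    --zone-name "$zone" --name "$link" --virtual-network "$vnet_id" \
    --registration-enabled false
}

# Example with the names used in this post:
#   link_private_dns_zone capv-mgmt-RG capv-mgmt.capz.io mgmtdnstoworkload capv-workload-RG capv-workload-vnet
```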

Create Workload cluster

Create workload cluster using config file

  • Create a config file named wc-config.yaml on the bootstrap machine with the content below, providing or replacing the values wherever required.
workload config file
CLUSTER_NAME: capv-workload      ## Optional
CLUSTER_PLAN: prod                       ## Optional
NAMESPACE: default
CNI: antrea
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: azure
#! ---------------------------------------------------------------------
#! Node configuration
#! ---------------------------------------------------------------------
CONTROL_PLANE_MACHINE_COUNT: 3
WORKER_MACHINE_COUNT: 3
AZURE_CONTROL_PLANE_MACHINE_TYPE: "Standard_D2s_v3"
AZURE_NODE_MACHINE_TYPE: "Standard_D2s_v3"
#! ---------------------------------------------------------------------
#! Azure Configuration
#! ---------------------------------------------------------------------
AZURE_ENVIRONMENT: "AzurePublicCloud"
AZURE_TENANT_ID: <redacted>                    ## Provide TENANT_ID
AZURE_SUBSCRIPTION_ID: <redacted>             ## Provide SUBSCRIPTION_ID
AZURE_CLIENT_ID: <redacted>                    ## Provide CLIENT_ID
AZURE_CLIENT_SECRET: <redacted>                ## CLIENT_SECRET
AZURE_LOCATION: westus2                        ## Optional
AZURE_SSH_PUBLIC_KEY_B64: <redacted>            ## Provide the base64-encoded SSH public key
AZURE_CONTROL_PLANE_SUBNET_NAME: "capv-workload-cp-A"   ## To be changed if using a diff name
AZURE_CONTROL_PLANE_SUBNET_CIDR: 172.17.0.0/24          ## To be changed if using diff CIDR
AZURE_NODE_SUBNET_NAME: "capv-workload-worker-A"        ## To be changed if using a diff name
AZURE_NODE_SUBNET_CIDR: 172.17.1.0/24                   ## To be changed if using diff CIDR
AZURE_RESOURCE_GROUP: "capv-workload-RG"                ## To be changed if using diff RG
AZURE_VNET_RESOURCE_GROUP: "capv-workload-RG"           ## To be changed if using diff RG
AZURE_VNET_NAME: "capv-workload-vnet"                   ## To be changed if using a diff name
AZURE_VNET_CIDR: 172.17.0.0/16                          ## To be changed if using diff CIDR
AZURE_ENABLE_PRIVATE_CLUSTER: "true"
AZURE_FRONTEND_PRIVATE_IP: 172.17.0.15                  ## To be changed to use diff IP
# AZURE_ENABLE_ACCELERATED_NETWORKING: ""
#! ---------------------------------------------------------------------
#! Machine Health Check configuration
#! ---------------------------------------------------------------------
ENABLE_MHC:
ENABLE_MHC_CONTROL_PLANE: true
ENABLE_MHC_WORKER_NODE: true
MHC_UNKNOWN_STATUS_TIMEOUT: 5m
MHC_FALSE_STATUS_TIMEOUT: 12m
MACHINE_HEALTH_CHECK_ENABLED: "true"
#! ---------------------------------------------------------------------
#! Common configuration
#! ---------------------------------------------------------------------
ENABLE_AUDIT_LOGGING: true
ENABLE_DEFAULT_STORAGE_CLASS: true
CLUSTER_CIDR: 100.96.0.0/11
SERVICE_CIDR: 100.64.0.0/13
#! ---------------------------------------------------------------------
#! Autoscaler configuration
#! ---------------------------------------------------------------------
ENABLE_AUTOSCALER: false
#! ---------------------------------------------------------------------
#! Antrea CNI configuration
#! ---------------------------------------------------------------------
# ANTREA_NO_SNAT: false
# ANTREA_TRAFFIC_ENCAP_MODE: "encap"
# ANTREA_PROXY: false
# ANTREA_POLICY: true
# ANTREA_TRACEFLOW: false
  • Once the command below is executed and cluster creation starts, keep an eye on the resource group (capv-workload-RG) for the private DNS zone resource (capv-workload.capz.io). As soon as it is created, create the network link described in the next section. This is required only once, during the first workload cluster deployment.
## Workload cluster create command

tanzu cluster create -f wc-config.yaml

Create Network Links for workload cluster

In the Azure portal, navigate to Resource groups > workload cluster RG (capv-workload-RG) > Overview > Resources > Private DNS zone (capv-workload.capz.io) > Virtual network links > Add

  • Provide a link name; in this case I named it workloadtomgmt
  • Virtual network: select the management VNet (capv-mgmt-vnet)
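Watching the resource group by hand can be replaced with a small polling loop run from a second terminal while the cluster is being created. This is a sketch of my own; the attempt count and delay are arbitrary defaults, and it assumes the Azure CLI is logged in on the machine where it runs.

```shell
# Sketch: wait until a private DNS zone exists in a resource group.
# Arguments: <zone-rg> <zone-name> [attempts] [delay-seconds]
# Returns 0 once the zone is found, 1 if it never appears.
wait_for_private_dns_zone() {
  local zone_rg="$1" zone="$2" attempts="${3:-60}" delay="${4:-30}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # "zone show" fails until the zone has been created.
    if az network private-dns zone show --resource-group "$zone_rg" \
        --name "$zone" --output none 2>/dev/null; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Timed out waiting for private DNS zone $zone" >&2
  return 1
}

# Example: wait for the workload zone, then add the link in the portal or CLI.
#   wait_for_private_dns_zone capv-workload-RG capv-workload.capz.io
```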

Validate

Deploy an application with external load balancer

Create a deployment and expose it with an external load balancer service
## Get credentials
tanzu cluster kubeconfig get capv-workload --admin
Credentials of cluster 'capv-workload' have been saved
You can now access the cluster by running 'kubectl config use-context capv-workload-admin@capv-workload'

# Change the context
kubectl config use-context capv-workload-admin@capv-workload
Switched to context "capv-workload-admin@capv-workload".

# List the contexts and make sure that current context (*) is pointing to capv-workload cluster as shown below: 

kubectl config get-contexts
CURRENT   NAME                                CLUSTER         AUTHINFO              NAMESPACE
          capv-mgmt-admin@capv-mgmt           capv-mgmt       capv-mgmt-admin
*         capv-workload-admin@capv-workload   capv-workload   capv-workload-admin

## Create a deployment
kubectl create deployment spring-deploy --port=8080 --image=eknath009/tbs-spring-image:3 --replicas=2

# Get the pods
kubectl get pods -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP           NODE                       NOMINATED NODE   READINESS GATES
spring-deploy-844c6c7688-mqq4l   1/1     Running   0          25s   100.96.1.8   capv-workload-md-0-67vnv   <none>           <none>
spring-deploy-844c6c7688-qxs8t   1/1     Running   0          25s   100.96.1.9   capv-workload-md-0-67vnv   <none>           <none>

# Expose the deployment
kubectl expose deployment spring-deploy --port=8080 --type=LoadBalancer
service/spring-deploy exposed

# Get the load balancer IP
kubectl get svc
NAME            TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)          AGE
kubernetes      ClusterIP      100.64.0.1       <none>          443/TCP          4h51m
spring-deploy   LoadBalancer   100.68.113.144   20.99.166.127   8080:31592/TCP   82s
  • Access the load balancer IP on port 8080 in a web browser (http://20.99.166.127:8080 in this example):
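If no browser is at hand, the external IP can be read from the service and tested with curl. The helper below is a sketch; the function name is mine, and it assumes the current kubectl context is the workload cluster.

```shell
# Sketch: print the load balancer IP assigned to a Service.
# Arguments: <service-name> [namespace]
service_lb_ip() {
  kubectl get service "$1" --namespace "${2:-default}" \
    --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Example, using the port exposed above:
#   curl --silent --show-error --max-time 10 "http://$(service_lb_ip spring-deploy):8080/"
```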

Deploy an application with Internal load balancer

service-nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: internal-svc-lb
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: nginx
$ kubectl apply -f service-nginx.yaml
deployment.apps/nginx-deployment created
service/internal-svc-lb created

$ kubectl get deploy
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   3/3     3            3           71s

$ kubectl get svc
NAME              TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
internal-svc-lb   LoadBalancer   100.71.2.186   172.17.1.7    80:30794/TCP   68s
kubernetes        ClusterIP      100.64.0.1     <none>        443/TCP        97m

  • To verify the internal and external load balancers that were created, go to the Azure portal > Load balancers > capv-workload-internal-lb > Frontend IP configuration:
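The same information is available from the Azure CLI. The sketch below lists the load balancers in a resource group and the frontend IP configuration of one of them; it assumes the CLI is logged in and that the load balancers live in the workload resource group.

```shell
# Sketch: list load balancers and the frontend IPs of one of them.
# Arguments: <resource-group> <load-balancer-name>
show_lb_frontends() {
  local rg="$1" lb="$2"
  az network lb list --resource-group "$rg" --output table
  az network lb frontend-ip list --resource-group "$rg" --lb-name "$lb" --output table
}

# Example:
#   show_lb_frontends capv-workload-RG capv-workload-internal-lb
```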

 

Thanks for reading.