One thing that makes Kubernetes great is its ecosystem of tools and utilities available to satisfy the many requirements of running and exposing services. Beyond the core Kubernetes capabilities of running containers, abstracting them behind services, and publishing them using ingresses, the open source ingress-nginx, cert-manager, and external-dns projects provide ingress functionality, TLS certificates, and DNS record management, respectively, for the services you run in Kubernetes. In other words, these four open source projects, Kubernetes, ingress-nginx, cert-manager, and external-dns, provide a complete solution for securely making your services available. This article details how to set up these projects to work together, using a Google Kubernetes Engine (GKE) cluster with workload identity and Google Cloud DNS for its concrete examples.

This guide creates a production-ready Kubernetes solution, following best practices for security, scalability, and reliability. As such, its architecture is more complex than required for a simple, exploratory environment. If you are looking for a “toy” environment to play around with, I would recommend the documentation sites for ingress-nginx, cert-manager, and external-dns, or various other online tutorials.

Grab a cup of coffee and get comfortable: we’re going to dig deep into the details.

The Kubernetes cluster

Since we plan on using GKE workload identity, we’ll go through the complete process of creating projects and the GKE cluster. If you already have a Google Cloud Platform (GCP) project and GKE cluster configuration you are happy with, you can skip this section. If you want to learn how to architect a production-ready GKE solution, read on.

You will need a Google Cloud Platform account with an active billing account to create the required resources. Creating these resources will cost money. You will also need the gcloud and kubectl command-line utilities installed on your system.

GCP projects

We will create three separate GCP projects, one for the Kubernetes cluster, one for DNS, and one for the key management service (KMS) securing Kubernetes secrets. In the first command below, set the base variable to something unique, as GCP projects must have globally unique identifiers.

$ base=blog-k8s
$ env=production
$ user=$(gcloud config get-value account)
$ labels="env=$env,purpose=$base,user=${user%@*}"
$ for ext in cluster dns kms; do \
    gcloud projects create "$base-$ext" --labels="$labels" \
      --name="Kubernetes $ext" --no-enable-cloud-apis; \
  done
The project ID must be 30 characters or less, and the longest suffix we append, -cluster, adds eight characters, so do not make your base longer than 22 characters.
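Since shell variables drive every subsequent command, a quick sanity check on the length of base can save cleanup later. This is a small sketch, not part of the original setup; the default value is only there to make the snippet self-contained.

```shell
# Sanity check: project IDs must be at most 30 characters, and the longest
# suffix appended in this guide is "-cluster" (8 characters), leaving 22
# characters for $base.
base=${base:-blog-k8s}   # default only so this snippet runs on its own
max_base_len=22
if [ "${#base}" -gt "$max_base_len" ]; then
  echo "base '$base' is ${#base} characters, max is $max_base_len" >&2
  exit 1
fi
echo "base '$base' OK (${#base} of $max_base_len characters)"
```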

With the projects created, we need to link the projects to an existing billing account. You can set the shell variable billing to your GCP billing account ID, or run the following command to use the first active billing account in your GCP account.

$ billing=$(gcloud beta billing accounts list \
    --format='value(name)' --filter=open=true | head -n 1)

After setting the billing variable, you can link each project with the billing account.

$ for ext in cluster dns kms; do \
    gcloud beta billing projects link "$base-$ext" --billing-account="$billing"; \
  done

Finally, enable the necessary GCP APIs in each project.

$ gcloud services enable --project="$base-cluster" {container,iamcredentials}.googleapis.com
$ gcloud services enable --project="$base-dns" dns.googleapis.com
$ gcloud services enable --project="$base-kms" cloudkms.googleapis.com

Service accounts

We will need several GCP service accounts to provide our Kubernetes cluster nodes and workloads with the credentials they need to access GCP APIs. All of these service accounts will be created in the cluster project. The node service account needs to write logs and metrics, and to view metrics, in the cluster project. Because we want to use wildcard certificates, cert-manager must be configured to use DNS-01 challenges to verify ownership, so the cert-manager service account will need the ability to create DNS records in the DNS project. The external-dns service account will likewise need the ability to manage DNS records in the DNS project.

First, we will create each service account and capture its email address in a shell variable.

$ node_sa_name="Kubernetes $base node"
$ gcloud iam service-accounts create "sa-node-$base" \
    --display-name="$node_sa_name" --project="$base-cluster"
$ node_sa_email=$(gcloud iam service-accounts list --project="$base-cluster" \
    --format='value(email)' --filter="displayName:$node_sa_name")
$ cert_sa_name="Kubernetes $base cert-manager"
$ gcloud iam service-accounts create "sa-cert-$base" \
    --display-name="$cert_sa_name" --project="$base-cluster"
$ cert_sa_email=$(gcloud iam service-accounts list --project="$base-cluster" \
    --format='value(email)' --filter="displayName:$cert_sa_name")
$ edns_sa_name="Kubernetes $base external-dns"
$ gcloud iam service-accounts create "sa-edns-$base" \
    --display-name="$edns_sa_name" --project="$base-cluster"
$ edns_sa_email=$(gcloud iam service-accounts list --project="$base-cluster" \
    --format='value(email)' --filter="displayName:$edns_sa_name")

After creating the service accounts, we bind the needed IAM roles from the appropriate project to each service account. First, we bind the logging and metrics roles to the node service account.

$ for role in monitoring.metricWriter monitoring.viewer logging.logWriter; do \
    gcloud projects add-iam-policy-binding "$base-cluster" \
      --member="serviceAccount:$node_sa_email" --role="roles/$role"; \
  done

Next we bind the DNS admin role in the DNS project to the cert-manager and external-dns service accounts.

$ for sa_email in "$cert_sa_email" "$edns_sa_email"; do \
    gcloud projects add-iam-policy-binding "$base-dns" \
      --member="serviceAccount:$sa_email" --role=roles/dns.admin; \
  done

Last, we bind the cert-manager and external-dns GCP service accounts to their respective Kubernetes workloads, i.e., we create the GCP side of the link between each GCP service account and its Kubernetes service account.

$ gcloud iam service-accounts add-iam-policy-binding "$cert_sa_email" \
    --member="serviceAccount:$base-cluster.svc.id.goog[cert-manager/cert-manager]" \
    --role=roles/iam.workloadIdentityUser --project=$base-cluster
$ gcloud iam service-accounts add-iam-policy-binding "$edns_sa_email" \
    --member="serviceAccount:$base-cluster.svc.id.goog[external-dns/external-dns]" \
    --role=roles/iam.workloadIdentityUser --project=$base-cluster

DNS

There are two steps to creating a working DNS zone: creating the zone itself and registering the domain and its nameservers. We create the zone in Google Cloud DNS with the following commands, setting the value of domain to the fully-qualified name of your DNS zone.

You should either already own this domain or be able to purchase it. This typically costs money. To avoid the time and money required to register a new domain, you can use a subdomain of a domain you already have registered. For example, if you own my.com, you can use k8s.my.com without having to register a new domain.

$ domain=k8s.atm.io.
$ zon=${domain%.}; zone=${zon//./-}
$ gcloud dns managed-zones create "$zone" --dns-name="$domain" \
    --description="$base $domain DNS" --dnssec-state=on --visibility=public \
    --labels="$labels" --project="$base-dns"

What you need to do next depends on your domain and domain registrar. If you have not registered the domain you are using, you will need to do that. Once you have a registered domain, you need to configure DNS for the domain. This command will output the nameservers for the domain.

$ gcloud dns managed-zones describe "$zone" \
    --format='value(nameServers)' --project="$base-dns"

If your domain is registered with a registrar, you will provide the nameservers to the registrar for the domain. If you are using a subdomain of a domain you have registered, you will create NS records for your subdomain in the DNS zone for the domain. Check with the registrar/DNS provider for your domain to see how to add the NS records.
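If the parent domain’s zone also happens to be hosted in Google Cloud DNS, the delegation might look roughly like the sketch below. The parent zone name, its project, and the nameserver list are all assumptions here; substitute the values from your environment, and use the nameservers printed by the managed-zones describe command for your new zone.

```shell
# Hypothetical sketch: delegate k8s.my.com from a parent zone for my.com
# that is also hosted in Google Cloud DNS. All names below are placeholders.
parent_zone=my-com          # assumed Cloud DNS zone name for my.com
parent_project=my-project   # assumed project hosting the parent zone
gcloud dns record-sets transaction start \
    --zone="$parent_zone" --project="$parent_project"
# The nameservers below are examples; use the ones reported for your zone.
gcloud dns record-sets transaction add \
    --name="$domain" --ttl=300 --type=NS \
    --zone="$parent_zone" --project="$parent_project" \
    "ns-cloud-a1.googledomains.com." "ns-cloud-a2.googledomains.com." \
    "ns-cloud-a3.googledomains.com." "ns-cloud-a4.googledomains.com."
gcloud dns record-sets transaction execute \
    --zone="$parent_zone" --project="$parent_project"
```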

The command below will output the DS record, i.e., DNSSEC configuration, you can provide to your registrar.

$ ksk=$(gcloud dns dns-keys list --zone=$zone --project="$base-dns" \
    --filter=type=keySigning --format='value(id)' | head -n 1)
$ gcloud dns dns-keys describe "$ksk" --zone="$zone" \
    --format='value(ds_record())' --project="$base-dns"

Again, check with the registrar/DNS provider for your domain to see how to add this record.

KMS

We will use Google Cloud KMS to encrypt Kubernetes secrets in the Kubernetes etcd database. First, we create a key ring and the key encryption key in the KMS project.

$ region=us-central1
$ gcloud kms keyrings create "ring-$base" --location="$region" --project="$base-kms"
$ gcloud kms keys create "key-$base" --keyring="ring-$base" --purpose=encryption \
    --labels="$labels" --location="$region" --project="$base-kms"

We next give the GKE service account in the cluster project access to the newly created key.

$ project_id=$(gcloud projects describe "$base-cluster" --format='value(projectNumber)')
$ gke_sa=service-$project_id@container-engine-robot.iam.gserviceaccount.com
$ gcloud kms keys add-iam-policy-binding "key-$base" --keyring="ring-$base" \
    --member="serviceAccount:$gke_sa" --role=roles/cloudkms.cryptoKeyEncrypterDecrypter \
    --location="$region" --project="$base-kms"

VPC

We will create a dedicated virtual private cloud (VPC) network for the Kubernetes cluster, isolating it from both the Internet and any other resources in the project.

$ gcloud compute networks create "net-$base" \
    --description="Kubernetes network $base" \
    --subnet-mode=custom --project="$base-cluster"

We next create the subnet for our cluster nodes, pods, and services.

$ gcloud compute networks subnets create "subnet-$base" \
    --network="net-$base" --range=10.0.0.0/22 \
    --description="Kubernetes subnet $base" \
    --enable-private-ip-google-access --purpose=PRIVATE \
    --region="$region" --project="$base-cluster" \
    --secondary-range="svc=10.0.16.0/20,pod=10.12.0.0/14"

Since we are going to create a GKE cluster with private nodes, we must create a NAT for our subnet so our workloads can access the Internet. Even if your workloads do not need Internet access, the cluster nodes will need it to download the container images for ingress-nginx, cert-manager, and external-dns. Before creating the NAT, we create its router with the following command.

$ gcloud compute routers create "router-$base" --network="net-$base" \
    --description="NAT router" --region="$region" --project="$base-cluster"

We complete the VPC setup by creating the NAT.

$ gcloud compute routers nats create "nat-$base" --router="router-$base" \
    --auto-allocate-nat-external-ips --region="$region" \
    --nat-custom-subnet-ip-ranges="subnet-$base,subnet-$base:svc,subnet-$base:pod" \
    --project="$base-cluster"

GKE Cluster

With all the supporting infrastructure in place, we can now create the Kubernetes cluster. We will create a regional, private GKE cluster leveraging master authorized networks, auto scaling, auto repair, auto upgrade with the regular release channel, workload identity, network policies, and shielded nodes running the Container-Optimized OS and using the containerd runtime. We do not enable the HttpLoadBalancing add-on because we are using ingress-nginx.

$ key_id=projects/$base-kms/locations/$region/keyRings/ring-$base/cryptoKeys/key-$base
$ mcidr=172.19.13.32/28
$ gcloud beta container clusters create "gke-$base" \
    --enable-autorepair --enable-autoupgrade \
    --metadata disable-legacy-endpoints=true \
    --labels="$labels" --node-labels="$labels" \
    --tags="kubernetes-worker,$base,$env,${user%@*}" \
    --enable-autoscaling --service-account="$node_sa_email" \
    --workload-metadata-from-node=GKE_METADATA_SERVER \
    --shielded-integrity-monitoring --shielded-secure-boot \
    --addons=HorizontalPodAutoscaling,NetworkPolicy,NodeLocalDNS \
    --database-encryption-key="$key_id" --no-enable-basic-auth \
    --enable-ip-alias --no-enable-legacy-authorization \
    --enable-network-policy --enable-shielded-nodes \
    --enable-stackdriver-kubernetes \
    --identity-namespace="$base-cluster.svc.id.goog" \
    --image-type=COS_CONTAINERD --no-issue-client-certificate \
    --machine-type=e2-standard-2 --max-nodes=3 --min-nodes=1 \
    --network="net-$base" --subnetwork="subnet-$base" \
    --release-channel=regular --enable-master-authorized-networks \
    --master-authorized-networks="$(curl -s https://icanhazip.com/)/32" \
    --enable-private-nodes --master-ipv4-cidr="$mcidr" \
    --maintenance-window-start=2000-01-01T22:00:00Z \
    --maintenance-window-end=2000-01-02T05:00:00Z \
    --maintenance-window-recurrence='FREQ=WEEKLY;BYDAY=SA,SU' \
    --region="$region" --project="$base-cluster"

The above command will take several minutes to complete. It configures a single master authorized network containing only the external IP address of the machine running the command, $(curl -s https://icanhazip.com/)/32. If you need to access the Kubernetes API, e.g., using kubectl, from other systems, you can add networks by passing additional comma-delimited CIDR blocks to the --master-authorized-networks option. See the gcloud container clusters create reference documentation for more information on this and other options.
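For example, a later update adding an office network might look like the following sketch. The 203.0.113.0/24 block is a documentation placeholder, and note that the flag replaces the entire list rather than appending to it, so every CIDR block you still need must be included.

```shell
# Hypothetical example: allow both the current workstation IP and an
# office network (203.0.113.0/24 is a placeholder). This flag replaces
# the whole authorized-networks list; it does not append to it.
gcloud container clusters update "gke-$base" \
    --enable-master-authorized-networks \
    --master-authorized-networks="$(curl -s https://icanhazip.com/)/32,203.0.113.0/24" \
    --region="$region" --project="$base-cluster"
```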

Next, get the Kubernetes cluster credentials and configure your local clients, including kubectl, to use them.

$ gcloud container clusters get-credentials "gke-$base" \
    --region="$region" --project="$base-cluster"

By default, GCP will not give you administrative access to the cluster you just created. Run the following command to give your GCP user administrative access to the GKE cluster.

$ kubectl create clusterrolebinding "cluster-admin-${user%@*}" \
    --clusterrole=cluster-admin --user="$user"

ingress-nginx

We will follow the standard installation procedure for ingress-nginx on GKE, with a couple of tweaks. Before deploying ingress-nginx, we will create a GCP external IP address. This allows the ingress-nginx controller service’s load balancer, and hence our services, to keep a stable IP address across upgrades, migrations, etc.

$ gcloud compute addresses create "ip-nginx-$base" \
    --description="ingress-nginx service load balancer IP" \
    --network-tier=PREMIUM --region="$region" --project="$base-cluster"
$ ip=$(gcloud compute addresses describe "ip-nginx-$base" \
    --format='value(address)' --region="$region" --project="$base-cluster")

Recent versions of ingress-nginx include a validating webhook endpoint. This webhook registers itself with the Kubernetes API to validate all ingress resource specifications before they are used to create or update an ingress. This endpoint listens on port 8443 and must be accessible from the Kubernetes API server. Since the default firewall rules do not allow access from the API server to the nodes on port 8443, we add a firewall rule allowing it.

$ gcloud compute firewall-rules create "fw-nginx-$base" \
    --allow=tcp:8443 --description="ingress-nginx webhook" \
    --direction=INGRESS --network="net-$base" --priority=1000 \
    --source-ranges="$mcidr" --target-service-accounts="$node_sa_email" \
    --project="$base-cluster"

To create the ingress-nginx resources, we can use its standard YAML specifications, with one modification: explicitly setting the IP address of the ingress-nginx-controller load balancer service to the address we created above. We’ll use the yq command-line utility and some awk to add the loadBalancerIP property and then use kubectl to create the resources.

$ in_yaml=ingress-nginx.yaml
$ curl -sLo "$in_yaml" https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.34.1/deploy/static/provider/cloud/deploy.yaml
$ yq w \
    -d$(awk '/^kind:/ { kind = $2; d++ } \
      /^  name: ingress-nginx-controller$/ { if (kind == "Service") { print d-1; exit } }' "$in_yaml") \
    "$in_yaml" 'spec.loadBalancerIP' "$ip" | kubectl apply -f -
$ rm "$in_yaml"

cert-manager

Similar to ingress-nginx, we can use the standard installation instructions for cert-manager, with one addition: adding the workload identity annotation to the cert-manager service account. This is the final step in linking the Kubernetes service account that cert-manager runs as with the GCP service account we bound to the DNS admin role in the DNS project above, i.e., creating the GKE side of the link between the GCP service account and the Kubernetes service account.

$ cm_yaml=cert-manager.yaml
$ curl -sLo "$cm_yaml" \
    https://github.com/jetstack/cert-manager/releases/download/v0.16.1/cert-manager.yaml
$ yq w \
    -d$(awk '/^kind:/ { kind = $2; d++ } \
      /^  name: cert-manager$/ { if (kind == "ServiceAccount") { print d-1; exit } }' "$cm_yaml") \
    "$cm_yaml" 'metadata.annotations."iam.gke.io/gcp-service-account"' "$cert_sa_email" | \
    kubectl apply -f -
$ rm "$cm_yaml"

external-dns

external-dns does not provide a standard set of resource specifications for deploying it to a Kubernetes cluster using kubectl. Fortunately, it does provide example resource specifications for deploying external-dns with Google Cloud DNS and ingress-nginx that we can modify for our purposes. Starting from that example, we added an external-dns namespace, added the workload identity annotation to the service account, and updated the container args for our GCP project and domain, publishing the result on GitHub. Below, we pull that starter file and substitute our GCP DNS project, domain, and external-dns GCP service account email address.

$ curl -sL https://raw.githubusercontent.com/atomist-blogs/iac-gke/main/k8s/external-dns.yaml | \
    sed -e "/gcp-service-account:/s/:.*/: $edns_sa_email/" \
      -e "/domain-filter=/s/=.*/=${domain%.}/" \
      -e "/google-project=/s/=.*/=$base-dns/" \
      -e "/txt-owner-id=/s/=.*/=$base/" | kubectl apply -f -

Discussion

This may seem like a lot of work just to run some services, but such is the nature of making a reliable, secure service available. The complexity largely arises from two sources: inherent complexity and automation. Managing SSL/TLS certificates has always been a hassle, and dynamic cloud infrastructure introduces more complexity in managing DNS records. Let’s Encrypt and cloud DNS services like AWS Route 53 and Google Cloud DNS have greatly improved the situation by putting APIs in front of the functionality, but there is still complexity.

Automating aspects of development and operations is and always will be more complex than just doing them manually… the first time. What has been shown time and time again is that while automating is more work initially, it pays off in the long term. If you only manage one Kubernetes cluster, then perhaps clicking through a web console to create and manage the resources will work for you. This is not typical. Typically you will be managing multiple clusters with varying requirements, and automating their management will be crucial. Don’t be afraid of all the commands and code above; the secret is that they demonstrate a repeatable, scriptable solution that you can stamp out again and again (more on that below).

You may wonder why, if using GKE, we deploy ingress-nginx instead of using the native HTTP load balancer functionality of GKE. The answer is cost. Using ingress-nginx requires only a single load balancer, whereas the native GKE load balancer solution creates a load balancer for every ingress resource. With any sizable number of deployments, the cost of those load balancers adds up quickly. Taking this route does introduce additional complexity: we have to deploy our own ingress controller and something to manage TLS certificates, since we cannot use Google-managed certificates without GKE load balancing. The advantage, other than cost, is a more portable Kubernetes solution, relying less on GKE-specific functionality and more on open source projects that can be used in any Kubernetes cluster. The cost of the resources created in this tutorial will vary depending on usage, but will be at least $150 per month: $72 for the GKE cluster, $56 for the three e2-standard-2 instances, $18 for the load balancer, $3 for the external IP address, and $1 for KMS, with possible additional costs for network egress and DNS.

Now that you have all the infrastructure set up, what’s next? There are many tutorials available detailing deploying applications to Kubernetes, configuring ingress resources for use with ingress-nginx, creating TLS certificates with cert-manager, and using external-dns. The focus of this post is on getting the infrastructure set up; everything beyond that should be standard Kubernetes fare, and there are a ton of resources available for that.
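To see how the pieces fit together, consider a minimal sketch of exposing a service in this cluster. The hello service name and host are placeholders, and it assumes a ClusterIssuer named letsencrypt has been created per the cert-manager documentation (creating the issuer is outside the scope of this post):

```shell
# Hypothetical example: expose an existing Service "hello" on a host in our
# zone. The service name, host, and the "letsencrypt" ClusterIssuer are
# assumptions, not resources created earlier in this guide.
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: hello
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  tls:
    - hosts:
        - hello.${domain%.}
      secretName: hello-tls
  rules:
    - host: hello.${domain%.}
      http:
        paths:
          - path: /
            backend:
              serviceName: hello
              servicePort: 80
EOF
```

With an ingress like this, external-dns sees the host and creates the DNS record pointing at the ingress-nginx load balancer IP, while cert-manager’s ingress-shim sees the cluster-issuer annotation and issues a certificate into the hello-tls secret.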

As stated above, the intent of these instructions is to provide a production-ready cluster. Nonetheless, if you wish to destroy this beautiful thing we have created, the easiest way to do it is to delete the projects:

$ for ext in cluster dns kms; do \
    gcloud projects delete "$base-$ext"; \
  done

Deleting a project disables all resources in the project, so charges will stop accruing, and schedules everything for permanent removal in about 30 days.

Phew!

That was a lot to take in but now you have a production-ready Kubernetes cluster capable of securely exposing your services. I hope you better understand Kubernetes, GKE, ingress-nginx, cert-manager, external-dns, workload identity, and how they all work together to help you make your workloads securely available on the Internet. If after going through all of that you think it is crazy to create all those resources manually, I agree. Here is an infrastructure-as-code repository leveraging Pulumi so you can manage all these resources with a single command: atomist-blogs/iac-gke. Once you have your resources up and running in Kubernetes, be sure to monitor them with Atomist. If you have any questions or suggestions for improvement, don't hesitate to contact me. I'm @dd in both the Kubernetes and GCP Slack workspaces, or you can create an issue in atomist-blogs/iac-gke. Thanks for reading this post to the end!