Using external-dns to Migrate Services between Kubernetes Clusters

Dec 16, 2021

Introduction

We live in a world of microservices that are intertwined (that is kind of the point), usually within a complex network structure. If you utilise AWS and have your service deployed in EKS, you are most likely using AWS Load Balancers to expose said service, making it accessible to the entire world. Your domain name lives in Route53, and you want to map it to the ELB provisioned by an Ingress Controller in Kubernetes. In a non-k8s world, you would manage this from a tool or pipeline, likely utilising the AWS CLI to configure the domain in Route53, or worse, you would configure it manually.

What if you have hundreds of services and DNS records that need to be created and managed? What if the Load Balancer Endpoint changes? How do you keep track of hundreds of DNS records? Kubernetes allows us to utilise external-dns.

ExternalDNS allows you to control DNS records dynamically via Kubernetes resources in a DNS provider-agnostic way.

We can also utilise external-dns to migrate between clusters, which we will be covering in this post.

WorldRemit use case

We are constantly evaluating new tools and technologies at WorldRemit. Often, we are required to migrate services between the "old" and "new" EKS clusters.

We have always used external-dns for our services in EKS, so it made sense to attempt to utilise it again to "move" services to a new cluster.

⚠️

Note that we use AWS, but this should work on any cloud provider (with slight differences).

Each of our services has a unique hostname and is exposed privately and/or publicly in Route53.

We are utilising Kong as the Ingress Controller, but this should work with any Ingress Controller exposed via an AWS ELB.

Prerequisites

  • Kubernetes Cluster configured in AWS. We will assume both Clusters are in AWS for this example.
  • AWS Route53 zone created and configured. For this post, we will assume you are familiar with Route53 and have done this already.
  • An Ingress Controller configured in the Kubernetes Cluster. The good thing about external-dns is that it doesn't care about what this is, as long as it utilises Ingress or Service k8s objects. If you use an external service, see https://github.com/kubernetes-sigs/external-dns#deploying-to-a-cluster for more info.

Setup

There are multiple approaches that we have enabled in our environment:

  • Blue-Green
  • Canary

There are pros and cons to both approaches. The choice of the approach will depend on the teams and their requirements.

For context, all of the Infrastructure at WorldRemit is configured using Kustomize and applied to Kubernetes via a custom Terraform provider and Jenkins.

Blue-Green uses a Simple Record, which is what our services already have configured, so there is no need to change the record type (a huge bonus).

Canary uses a Weighted Record, and sadly, Route53 doesn't let you switch a record from one routing policy to another in place. You have to DELETE the existing record and create a new one, which can cause downtime if a client caches the negative response during the window before the new record is created. To partially mitigate this, we have reduced the external-dns sync interval from the default 60 seconds to a few seconds.

For the main external-dns setup, follow the instructions for AWS (https://github.com/kubernetes-sigs/external-dns/blob/master/docs/tutorials/aws.md).

Below are the specific args we want to configure to make sure our setup works as intended for this use case, passing the values in as env variables (an example of how these variables might be wired up follows the arg walkthrough below):

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
spec:
  template:
    spec:
      containers:
        - name: external-dns
          args:
            - --source=service
            - --source=ingress
            - --provider=aws
            - --zone-id-filter=$(HOSTED_ZONE_ID_FILTER)
            - --policy=sync
            - --aws-zone-type=$(HOSTED_ZONE_TYPE)
            - --registry=txt
            - --txt-owner-id=$(CLUSTER_NAME)
            - --aws-batch-change-size=2
            - --interval=5s

Let's go through all of the args:

  • --source; the k8s objects we want to watch.
  • --provider; self-explanatory.
  • --zone-id-filter; we want to make sure that only the zone we want to change is modified.
  • --policy; modifies how DNS records are synchronised between sources and providers. The default is "sync", but explicit is always better than implicit.
  • --aws-zone-type; private or public zone type.
  • --registry; create a TXT record alongside the ALIAS (A) Record. The TXT Record signifies that the corresponding ALIAS Record is managed by external-dns. This makes external-dns safer for running in environments where there are other records managed via other means.
  • --txt-owner-id; this is the main component we will take advantage of for our use case. Set this to a unique name for your cluster.
  • --aws-batch-change-size; a temporary fix for the "Blocking Changes" issue mentioned below in Issues.
  • --interval; we are lowering the interval so that Route53 is updated as soon as possible. The lower value is required to reduce downtime when switching between record types. The default is 1 minute. When changing this, keep the Route53 API request limits in mind.
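
For completeness, here is one way the environment variables referenced by those args could be wired into the same Deployment. This is only a sketch: the variable names match the args above, but the values (hosted zone ID, zone type, cluster name) are placeholders you would set per environment, for example via a Kustomize overlay or a ConfigMap.

      containers:
        - name: external-dns
          env:
            - name: HOSTED_ZONE_ID_FILTER
              value: "Z0123456789ABCDEFGHIJ" # placeholder: the only zone external-dns is allowed to modify
            - name: HOSTED_ZONE_TYPE
              value: "private" # or "public"
            - name: CLUSTER_NAME
              value: "old-cluster" # becomes the txt-owner-id for this cluster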

We will now make the required service-specific configurations for external-dns. Luckily, these are just annotations.

We will assume that you are already using an Ingress object in your "old" cluster and that the service is available in both locations. How you do this is up to you.
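
As an illustration only, a minimal Ingress in the "old" cluster might look something like the sketch below; the name, namespace, ingress class, and hostname are placeholders, and your Kong/ELB specifics will differ.

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-service
  namespace: my-namespace
  annotations:
    # external-dns will pick up the hostname from the rule below and/or this annotation
    external-dns.alpha.kubernetes.io/hostname: my-service.example.com
spec:
  ingressClassName: kong
  rules:
    - host: my-service.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80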

Blue-Green

To use a Blue-Green release method to switch over to the new cluster, you will have to configure your CICD to change the TXT record associated with the hostname extracted from the Ingress object. The TXT Record should have a value similar to:

"heritage=external-dns,external-dns/owner=old-cluster,external-dns/resource=ingress/namespace-name/ingress-name"

We want to modify the owner field to say new-cluster instead of old-cluster.

Once we do this, the external-dns running in the new cluster will see that it now owns the record, but that the value of the associated A record no longer matches its own Ingress. It will update the A record for the Ingress hostname to point at the new cluster's load balancer, switching the traffic over to the new cluster.

How you do this update is up to you. Ideally, you would have it hooked into your CICD as part of the application's deployment, running the required checks to make sure you are modifying the correct record and that the record actually needs changing.
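
As a rough sketch, assuming AWS CLI v2, the ownership flip could be expressed as a change batch and applied with aws route53 change-resource-record-sets --cli-input-yaml file://flip-owner.yaml. Everything below (zone ID, hostname, TTL, file name) is a placeholder; the TXT value must match what external-dns originally wrote, with only the owner field changed.

---
# flip-owner.yaml (illustrative): hand ownership of the record to new-cluster
HostedZoneId: Z0123456789ABCDEFGHIJ
ChangeBatch:
  Comment: "Hand my-service.example.com over to new-cluster"
  Changes:
    - Action: UPSERT
      ResourceRecordSet:
        Name: my-service.example.com
        Type: TXT
        TTL: 300 # keep the same TTL as the existing TXT record
        ResourceRecords:
          - Value: '"heritage=external-dns,external-dns/owner=new-cluster,external-dns/resource=ingress/namespace-name/ingress-name"'

The same change could just as easily come from Terraform or whatever tooling your pipeline already uses; the important part is that only the external-dns/owner= value changes.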

Canary

When utilising Canary, you have to add a few additional annotations, if you don't have them already.

Make sure the Ingress associated with your service in the current "old" cluster has the following annotations:

external-dns.alpha.kubernetes.io/hostname: <service-route53-hostname>
external-dns.alpha.kubernetes.io/alias: "true"
external-dns.alpha.kubernetes.io/set-identifier: <service>-<old-cluster>
external-dns.alpha.kubernetes.io/aws-weight: "50"

Consider the following when specifying the annotations:

  • hostname; self-explanatory.
  • alias; set to "true" to explicitly create ALIAS records targeting the Ingress Load Balancer.
  • set-identifier; a unique name that differentiates between multiple resource record sets that have the same combination of name and type. This is required for Weighted Records, because there will be two sets of A/TXT Records for this domain. For simplicity, suffix your service name with the cluster name (e.g. old-cluster) and assign that as the value.
  • aws-weight; sets the proportion of DNS queries Route53 should answer using this Record Set. Until the second Ingress is created in the new cluster, all queries will route to this record set, so there is no change to the service's routing yet.

The config in the cluster you wish to migrate to should include the same annotations, but with the set-identifier and aws-weight modified:

external-dns.alpha.kubernetes.io/hostname: <service-route53-hostname>
external-dns.alpha.kubernetes.io/alias: "true"
external-dns.alpha.kubernetes.io/set-identifier: <service>-<new-cluster>
external-dns.alpha.kubernetes.io/aws-weight: "5"
⚠️

Note that if the record already exists with a different routing policy (e.g. an existing Simple record), you will have to "manually" DELETE the previous record first. I say "manually" in quotes because you can hook this up with your CICD and make the experience seamless for the Developers.
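
If you script that deletion, bear in mind that Route53 requires a DELETE to specify the record exactly as it currently exists, including the alias target of an ALIAS record. Below is a sketch of what that change batch might look like, applied the same way as the Blue-Green example above; all values are placeholders and must match the current record.

---
# delete-simple-record.yaml (illustrative): remove the old Simple ALIAS record before the Weighted ones take over
HostedZoneId: Z0123456789ABCDEFGHIJ
ChangeBatch:
  Comment: "Remove Simple record for my-service.example.com"
  Changes:
    - Action: DELETE
      ResourceRecordSet:
        Name: my-service.example.com
        Type: A
        AliasTarget:
          HostedZoneId: ZELBZONEIDEXAMPLE # the ELB's own hosted zone ID, not your Route53 zone
          DNSName: my-old-elb-1234567890.eu-west-1.elb.amazonaws.com
          EvaluateTargetHealth: true # must match the existing record

Depending on your external-dns version, the matching Simple TXT registry record may need the same treatment.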

Tweaking the Weighted Routing

Choose a weight between 0 and 255 for each record, bearing in mind that Route53 routes traffic in proportion to each record's share of the combined weighting (up to 510 altogether across the old and new clusters): a record receives weight / (sum of both weights) of the queries. For example, keeping the old cluster at a weight of "50" and varying the new cluster's weight:

  • Assign "5" in new-cluster for roughly 9% of traffic to be routed to new-cluster
  • Assign "25" in new-cluster for roughly 33% of traffic to be routed to new-cluster
  • Assign "50" in new-cluster for 50% of traffic to be routed to new-cluster
  • Assign "100" in new-cluster for roughly 66% of traffic to be routed to new-cluster
  • Assign "200" in new-cluster for 80% of traffic to be routed to new-cluster
  • Assign "255" in new-cluster for roughly 84% of traffic to be routed to new-cluster

When comfortable, decrease the weighting in the old cluster and/or increase the weighting in the new cluster. Do this until the changes are satisfactory and the migrated service has 100% weighting in the new cluster.

⚠️

Setting a weight of 0 stops requests from being routed to the associated cluster.

Issues

As always, no solution is perfect. There are some issues with Route53 and external-dns. The main one we ran into is the "Blocking Changes" problem referenced in the args above: Route53 applies each change batch atomically, so one conflicting or failing record change can block the whole batch, and therefore unrelated records, from being updated; lowering --aws-batch-change-size limits the blast radius of any single failure.

Conclusion

Migrating services between clusters is always complex. You want to make sure that there is little to no downtime and that your customers don't have a negative experience.

external-dns and Route53 can fulfil such requirements if utilised correctly. Hopefully, this post helps you along the journey.
