azure, kubernetes, linux, spinnaker

Kayenta is the subcomponent of Spinnaker that handles automated canary analysis during a deployment. It reads from your metric sources and compares the stats from an existing deployed service against a new version of the service to see if there are anomalies or problems, indicating the rollout should be aborted if the new service fails to meet specified tolerances.

I’m a huge fan of Spinnaker, but sometimes you already have a full CI/CD system in place and you really don’t want to replace all of that with Spinnaker. You really just want the canary part of Spinnaker. Luckily, you can totally use Kayenta as a standalone service. They even have some light documentation on it!

In my specific case, I also want to use Azure Storage as the place where I store the data for Kayenta - canary configuration, that sort of thing. It’s totally possible to do that, but, at least at the time of this writing, the hal config canary Halyard command does not have Azure listed and the docs don’t cover it.

So there are a couple of things that come together here, and maybe all of it’s interesting to you or maybe only one piece. In any case, here’s what we’re going to build:

Standalone Kayenta diagram

  • A Kubernetes ingress to allow access to Kayenta from your CI/CD pipeline.
  • A deployment of the Kayenta microservice.
  • Kayenta configured to use an Azure Storage Account to hold its configuration and such.

Things I’m not going to cover:

  • How exactly your CI/CD canary stage needs to work.
  • How long a canary stage should last.
  • How exactly you should configure Kayenta (other than the Azure part).
  • Which statistics you should monitor for your services to determine if they “pass” or “fail.”
  • Securing the Kayenta ingress so only authenticated/authorized access is allowed.

This stuff is hard and it gets pretty deep pretty quickly. I can’t cover it all in one go. I don’t honestly have answers to all of it anyway, since a lot of it depends on how your build pipeline is set up, how your app is set up, and what your app does. There’s no “one-size-fits-all.”

Let’s do it.


First, provision an Azure Storage account. Make sure you enable HTTP access because right now Kayenta requires HTTP and not HTTPS.

You also need to provision a container in the Azure Storage account to hold the Kayenta contents.

# I love me some PowerShell, so examples/scripts will be PowerShell.
# Swap in your preferred names as needed.
$ResourceGroup = "myresourcegroup"
$StorageAccountName = "kayentastorage"
$StorageContainerName = "kayenta"
$Location = "westus2"

# Create the storage account with HTTP enabled.
az storage account create `
  --name $StorageAccountName `
  --resource-group $ResourceGroup `
  --location $Location `
  --https-only false `
  --sku Standard_GRS

# Get the storage key so you can create a container.
$StorageKey = az storage account keys list `
  --account-name $StorageAccountName `
  --query '[0].value' `
  -o tsv

# Create the container that will hold Kayenta stuff.
az storage container create `
  --name $StorageContainerName `
  --account-name $StorageAccountName `
  --account-key $StorageKey

Let’s make a namespace in Kubernetes for Kayenta so we can put everything we’re deploying in there.

# We'll use the namespace a lot, so a variable
# for that in our scripting will help.
$Namespace = "kayenta"
kubectl create namespace $Namespace

Kayenta needs Redis. We can use the Helm chart to deploy a simple Redis instance. Redis must not be in clustered mode, and there’s no option for providing credentials.

helm repo add bitnami

# The name of the deployment will dictate the name of the
# Redis master service that gets deployed. In this example,
# 'kayenta-redis' as the deployment name will create a
# 'kayenta-redis-master' service. We'll need that later for
# Kayenta configuration.
helm install kayenta-redis bitnami/redis `
  -n $Namespace `
  --set cluster.enabled=false `
  --set usePassword=false `
  --set master.persistence.enabled=false

Now let’s get Kayenta configured. This is a full, commented version of a Kayenta configuration file. There’s also a little doc on Kayenta configuration that might help. What we’re going to do here is put the kayenta.yml configuration into a Kubernetes ConfigMap so it can be used in our service.

Here’s a ConfigMap YAML file based on the fully commented version, but with the extra stuff taken out. This is also where you’ll configure the location of Prometheus (or whatever) where Kayenta will read stats. For this example, I’m using Prometheus with some basic placeholder config.

apiVersion: v1
kind: ConfigMap
  name: kayenta
  namespace: kayenta
  kayenta.yml: |-
      port: 8090

    # This should match the name of the master service from when
    # you deployed the Redis Helm chart earlier.
      connection: redis://kayenta-redis-master:6379

        enabled: false

        enabled: false

    # This is the big one! Here's where you configure your Azure Storage
    # account and container details.
        enabled: true
          - name: canary-storage
            storageAccountName: kayentastorage
            # azure.storageKey is provided via environment AZURE_STORAGEKEY
            # so it can be stored in a secret. You'll see that in a bit.
            # Don't check in credentials!
            accountAccessKey: ${azure.storageKey}
            container: kayenta
            rootFolder: kayenta
              - OBJECT_STORE

        enabled: false

        enabled: false

        enabled: false

        enabled: false

    # Configure your Prometheus here. Or if you're using something else, disable
    # Prometheus and configure your own metrics store. The important part is you
    # MUST have a metrics store configured!
        enabled: true
        - name: canary-prometheus
            baseUrl: http://prometheus:9090
            - METRICS_STORE

        enabled: true

        enabled: false

        enabled: false

        enabled: true

        enabled: false

        enabled: false

        enabled: false

        enabled: false

        enabled: false

    # Enable the SCAPE endpoint that has the same user experience that the Canary StageExecution in Deck/Orca has.
    # By default this is disabled - in standalone we enable it!
        enabled: true

          series: SERVER_ERROR
          attempts: 10
          backoffPeriodMultiplierMs: 1000

        writeDatesAsTimestamps: false
        writeDurationsAsTimestamps: false

    management.endpoints.web.exposure.include: '*'
    management.endpoint.health.show-details: always

          queueName: kayenta.keiko.queue
          deadLetterQueueName: kayenta.keiko.queue.deadLetters

      applicationName: ${}
        enabled: true

      enabled: true
      title: Kayenta API
        - /admin.*
        - /canary.*
        - /canaryConfig.*
        - /canaryJudgeResult.*
        - /credentials.*
        - /fetch.*
        - /health
        - /judges.*
        - /metadata.*
        - /metricSetList.*
        - /metricSetPairList.*
        - /metricServices.*
        - /pipeline.*
        - /standalone.*

Save that and deploy it to the cluster.

kubectl apply -f kayenta-configmap.yml

You’ll notice in the config we just put down that we did not include the Azure Storage account key. Assuming we want to commit that YAML to a source control system at some point, we definitely don’t want credentials in there. Instead, let’s use a Kubernetes secret for the Azure Storage account key.

# Remember earlier we got the storage account key for creating
# the container? We're going to use that again.
kubectl create secret generic azure-storage `
  -n $Namespace `
  --from-literal=storage-key="$StorageKey"
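
If the jump from an AZURE_STORAGEKEY environment variable to the ${azure.storageKey} placeholder looks like magic, it's Spring Boot's property binding doing the work. Here's a rough Python illustration of the documented naming rule (dots become underscores, dashes are dropped, everything is upper-cased). The helper name is made up for this sketch and it ignores edge cases like list indexes.

```python
def spring_env_name(property_name: str) -> str:
    """Approximate Spring Boot's property-to-environment-variable naming rule.

    Dots become underscores, dashes are removed, and the result is upper-cased.
    This hypothetical helper does not handle list indexes such as my.list[0].
    """
    return property_name.replace(".", "_").replace("-", "").upper()


if __name__ == "__main__":
    # The placeholder used in the ConfigMap maps onto the variable set in the pod.
    print(spring_env_name("azure.storageKey"))
```

That's why the secret can live in the environment while the config file only carries the placeholder.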

It’s deployment time! Let’s get a Kayenta container into the cluster! Obviously you can tweak all the tolerations and affinities and node selectors and all that to your heart’s content. I’m keeping the example simple.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kayenta
  namespace: kayenta
  # The label key here is an assumption. Any key works as long as the
  # selector, the pod template, and the Service all agree on it.
  labels:
    app.kubernetes.io/name: kayenta
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kayenta
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kayenta
    spec:
      containers:
        - name: kayenta
          # Find the list of tags here:
          # This is just the tag I've been using for a while. I use one of the images NOT tagged
          # with Spinnaker because the Spinnaker releases are far slower.
          image: ""
          env:
            # If you need to troubleshoot, you can set the logging level by adding
            # -Dlogging.level.root=TRACE
            # Without the log at DEBUG level, very little logging comes out at all and
            # it's really hard to see if something goes wrong. If you don't want that
            # much logging, go ahead and remove the log level option here.
            - name: JAVA_OPTS
              value: "-XX:+UnlockExperimentalVMOptions -Dlogging.level.root=DEBUG"
            # We can store secrets outside config and provide them via the environment.
            # Insert them into the config file using ${dot.delimited} versions of the
            # variables, like ${azure.storageKey} which we saw in the ConfigMap.
            - name: AZURE_STORAGEKEY
              valueFrom:
                secretKeyRef:
                  name: azure-storage
                  key: storage-key
          ports:
            - name: http
              containerPort: 8090
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /health
              port: http
          readinessProbe:
            httpGet:
              path: /health
              port: http
          volumeMounts:
            - name: config-volume
              mountPath: /opt/kayenta/config
      volumes:
        - name: config-volume
          configMap:
            name: kayenta

And let’s save and apply.

kubectl apply -f kayenta-deployment.yml

If you have everything wired up right, the Kayenta instance should start. But we want to see something happen, right? Without kubectl port-forward?

Let’s put a LoadBalancer service in here so we can access it. I’m going to show the simplest Kubernetes LoadBalancer here, but in your situation you might have, say, an nginx ingress in play or something else. You’ll have to adjust as needed.

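apiVersion: v1
kind: Service
metadata:
  name: kayenta
  namespace: kayenta
  # The label key is an assumption. It must match the pod labels
  # used by the Kayenta Deployment.
  labels:
    app.kubernetes.io/name: kayenta
spec:
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app.kubernetes.io/name: kayenta
  type: LoadBalancer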

Let’s see it do something. You should be able to get the public IP address for that LoadBalancer service by doing:

kubectl get service/kayenta -n $Namespace

You’ll see something like this:

NAME         TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)    AGE
kayenta      LoadBalancer   80/TCP     54s

Take note of that external IP and you can visit the Swagger docs in a browser:

If it’s all wired up, you should get some Swagger docs!

The first operation you should try is under credentials-controller - GET /credentials. This will tell you what metrics and object stores Kayenta thinks it’s talking to. The result should look something like this:

    "name": "canary-prometheus",
    "supportedTypes": [
    "endpoint": {
      "baseUrl": "http://prometheus"
    "type": "prometheus",
    "locations": [],
    "recommendedLocations": []
    "name": "canary-storage",
    "supportedTypes": [
    "rootFolder": "kayenta",
    "type": "azure",
    "locations": [],
    "recommendedLocations": []

If you are missing the canary-storage account pointing to azure - that means Kayenta can’t access the storage account or it’s otherwise misconfigured. I found the biggest gotcha here was that it’s HTTP-only and that’s not the default for a storage account if you create it through the Azure portal. You have to turn that on.
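
If you want to check this from a script instead of eyeballing Swagger, here's a minimal Python sketch. It assumes the /credentials response is a JSON array of account objects with name and type fields, like the sample above. The function name is hypothetical and the HTTP call itself is left out so the check can be tried against canned data.

```python
import json


def find_missing_accounts(credentials_json: str, expected: dict) -> list:
    """Return expected account names that are absent or have the wrong type.

    `expected` maps an account name to the "type" value it should report.
    """
    accounts = {a.get("name"): a.get("type") for a in json.loads(credentials_json)}
    return sorted(name for name, kind in expected.items() if accounts.get(name) != kind)


if __name__ == "__main__":
    # Canned response where the Azure storage account failed to register.
    sample = json.dumps([
        {"name": "canary-prometheus", "type": "prometheus", "supportedTypes": ["METRICS_STORE"]},
    ])
    expected = {"canary-prometheus": "prometheus", "canary-storage": "azure"}
    print(find_missing_accounts(sample, expected))
```

An empty list means both accounts showed up. Anything else is the list of accounts to go troubleshoot.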


What do you do if you can’t figure out why Kayenta isn’t connecting to stuff?

Up in the Kubernetes deployment, you’ll see the logging is set up at the DEBUG level. The logging is pretty good at this level. You can use kubectl logs to get the logs from the Kayenta pods or, better, use stern for that. Those logs are going to be your secret. You’ll see errors that pretty clearly indicate whether there’s a DNS problem or a bad password or something similar.

If you still aren’t getting enough info, turn the log level up to TRACE. It can get noisy, but you’ll only need it for troubleshooting.

Next Steps

There’s a lot you can do from here.

Canary configuration: Actually configuring a canary is hard. For me, it took deploying a full Spinnaker instance and doing some canary stuff to figure it out. There’s a bit more doc on it now, but it’s definitely tricky. Here’s a pretty basic configuration where we just look for errors by ASP.NET microservice controller. No, I cannot help or support you in configuring a canary. I’ll give you this example with no warranties, expressed or implied.

  "canaryConfig": {
    "applications": [
    "classifier": {
      "groupWeights": {
        "StatusCodes": 100
      "scoreThresholds": {
        "marginal": 75,
        "pass": 75
    "configVersion": "1",
    "description": "App Canary Configuration",
    "judge": {
      "judgeConfigurations": {
      "name": "NetflixACAJudge-v1.0"
    "metrics": [
        "analysisConfigurations": {
          "canary": {
            "direction": "increase",
            "nanStrategy": "replace"
        "groups": [
        "name": "Errors By Controller",
        "query": {
          "customInlineTemplate": "PromQL:sum(increase(http_requests_received_total{app='my-app',azure_pipelines_version='${location}',code=~'5\\\\d\\\\d|4\\\\d\\\\d'}[120m])) by (action)",
          "scopeName": "default",
          "serviceType": "prometheus",
          "type": "prometheus"
        "scopeName": "default"
    "name": "app-config",
    "templates": {
  "executionRequest": {
    "scopes": {
      "default": {
        "controlScope": {
          "end": "2020-11-20T23:01:09.3NZ",
          "location": "baseline",
          "scope": "control",
          "start": "2020-11-20T21:01:09.3NZ",
          "step": 2
        "experimentScope": {
          "end": "2020-11-20T23:01:09.3NZ",
          "location": "canary",
          "scope": "experiment",
          "start": "2020-11-20T21:01:09.3NZ",
          "step": 2
    "siteLocal": {
    "thresholds": {
      "marginal": 75,
      "pass": 95

Integrate with your CI/CD pipeline: Your deployment is going to need to know how to track the currently deployed vs. new/canary deployment. Statistics are going to need to be tracked that way, too. (That’s the same as if you were using Spinnaker.) I’ve been using the KubernetesManifest@0 task in Azure DevOps, setting trafficSplitMethod: smi and making use of the canary control there. A shell script polls Kayenta to see how the analysis is going.

How you do this for your template is very subjective. Pipelines at this level are really complex. I’d recommend working with Postman or some other HTTP debugging tool to get things working before trying to automate it.
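
To give a feel for the polling part, here's a minimal Python sketch of the loop. It is not the shell script I use. The fetch_status callable stands in for whatever does the HTTP GET against Kayenta and parses the JSON, and the "complete" field name is an assumption about the status response, so check it against what your Kayenta instance actually returns.

```python
import time
from typing import Callable


def wait_for_canary(fetch_status: Callable[[], dict],
                    poll_seconds: float = 30.0,
                    max_polls: int = 240,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the analysis reports completion, then return the last status.

    `fetch_status` performs the HTTP call and returns parsed JSON.
    The "complete" field name is an assumption about the response shape.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("complete"):
            return status
        sleep(poll_seconds)
    raise TimeoutError("canary analysis did not complete in time")


if __name__ == "__main__":
    # Canned responses stand in for real HTTP calls in this demo.
    responses = iter([{"complete": False}, {"complete": True, "note": "done"}])
    final = wait_for_canary(lambda: next(responses), sleep=lambda _: None)
    print(final["note"])
```

Passing the fetcher and the sleep function in makes the loop easy to try out without a cluster, which helps when the real pipeline takes hours per run.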

Secure it!: You probably don’t want public anonymous access to the Kayenta API. I locked mine down with oauth2-proxy and Istio but you could do it with nginx ingress and oauth2-proxy or some other mechanism.

Put a UI on it!: As you can see, configuring Kayenta canaries without a UI is actually pretty hard. Nike has a UI for standalone Kayenta called “Referee”. At the time of this writing there’s no Docker container for it so it’s not as easy to deploy as you might like. However, there is a Dockerfile gist that might be helpful. I have not personally got this working, but it’s on my list of things to do.

Huge props to my buddy Chris who figured a lot of this out, especially the canary configuration and Azure DevOps integration pieces.

halloween, maker, costumes

Due to the COVID-19 pandemic, we didn’t end up doing our usual hand-out-candy-and-count-kids thing. However, we did make costumes. How could we not? Something had to stay normal.

My daughter Phoenix, who is now nine, is obsessed with Hamilton. I think she listens to it at least once daily. Given that, she insisted that we do Hamilton costumes. I was to be A. Ham, Jenn as Eliza, and Phoenix as their daughter also named Eliza.

I was able to put Phoe’s costume together in two or three days. We used a pretty standard McCall’s pattern with decent instructions and not much complexity.

For mine… I had to do some pretty custom work. I started with these patterns:

It took me a couple of months to get things right. They didn’t really have 6’2” fat guys in the Revolutionary War so there was a lot of adjustment, especially to the coat, to get things to fit. I made muslin versions of everything probably twice, maybe three times for the coat to get the fit right.

I had a really challenging time figuring out how the shoulders on the coat went together. The instructions on the pattern are fairly vague and not what I’m used to with more commercial patterns. This tutorial article makes a similar coat and helped a lot in figuring out how things worked. It’s worth checking out.

Modifications I had to make:

  • Coat:
    • Arms lengthened.
    • Arm holes bigger around.
    • Arms bigger around.
    • Body lengthened.
    • Lapels trimmed to be square at the top (more like the stage production).
    • Didn’t put on the shoulder buttons or the back ribbons the pattern called for.
    • Set buttonholes 2” long and 2.5” apart (roughly - they don’t specify this in the pattern but other research turned these measurements up and it worked out).
    • Six buttonholes around each cuff instead of three (more like the stage production).
  • Pants:
    • Cut off slightly below the knee.
    • Taper to fit tightly around the bottom below the knee.
    • Add a “flap” on each side at the bottom of each leg to look like there are buttons holding it together.
    • Finish the bottom of each leg with a band.
  • Vest: Lengthen the body.

I didn’t have to modify the shirt. The shirt is already intentionally big and baggy because that’s how shirts were back then, so there was a lot of play.

The pants were more like… I didn’t have a decent pattern that actually looked like Revolutionary War pants so I took some decent costume pants and just modded them up. They didn’t have button fly pants back then and my pants have that, but I also wasn’t interested in drop-front pants or whatever other pants I’d have ended up with. I do need to get around in these things.

I didn’t keep a cost tally this time and it’s probably good I didn’t. There are well over 50 buttons on this thing and buttons are not cheap. I bought good wool for the whole thing at like $25/yard (average) and there are a good six-to-eight yards in here. I went through a whole bolt of 60” muslin between my costume and the rest of our costumes. I can’t possibly have come out under $300.

But they turned out great!

Here’s my costume close up on a dress form:

Front view of Hamilton

Three-quarters view of Hamilton

And the costume in action:

Travis as Alexander Hamilton

Here’s the whole family! I think they turned out nicely.

The whole Hamilton family!

Work! Work!

We're looking for a mind at work!

azure, docker, powershell

I’ve been doing some work with creating and migrating Azure Container Registry instances around lately so I thought I’d share a few helpful scripts. Obvious disclaimers - YMMV, works on my machine, I’m not responsible if you delete something you shouldn’t have, etc.


I need to create container registries that have customer managed key support enabled. Unfortunately, there are a lot of steps to this and there are some things that aren’t obvious, like:

  • You need to use the “Premium” SKU for this to work.
  • The Key Vault and the thing being encrypted using customer managed keys (e.g., the container registry) need to be in the same subscription and geographic region. They only say this in the docs about VM disk encryption but it seems to be applicable to all CMK usage.

Normally I’d think about doing this with something like Terraform but as of this writing, Terraform doesn’t have support for ACR + CMK so… script it is.


This is more a “pruning” operation than deleting, but “prune” isn’t an approved PowerShell verb and I do love me some PowerShell.

In a CI/CD environment, generally you want to keep:

  • The current successfully deployed image.
  • The previous successfully deployed image.
  • The image you want to deploy next (canary style).

…and, actually, that’s about it. CI/CD is fail-forward, so there’s not really a roll-back-three-versions case. You’d roll back the code and build a new container.

Point being, there’s not really a retention policy that handles this in ACR right now. While this script also doesn’t totally handle it the way I’d like, what it can do is keep the most recent X tags of an image and prune all the old ones. I also added a way to regex match a container repository by name so you can be more precise about targeting what you want to prune.
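
The selection logic is the interesting part, so here's a small Python sketch of it. This is not the PowerShell script itself. It assumes you've already listed each repository's tags with a last-updated timestamp. The function name and the data shape are made up for illustration, and the actual delete calls are left out.

```python
import re
from datetime import datetime


def tags_to_prune(repositories: dict, keep: int, name_pattern: str = ".*") -> dict:
    """Pick the tags to delete: everything except the `keep` newest per repository.

    `repositories` maps a repository name to a list of (tag, last_updated) tuples.
    Only repositories whose name matches `name_pattern` are considered.
    """
    if keep < 0:
        raise ValueError("keep must be zero or greater")
    matcher = re.compile(name_pattern)
    result = {}
    for repo, tags in repositories.items():
        if not matcher.search(repo):
            continue
        newest_first = sorted(tags, key=lambda item: item[1], reverse=True)
        stale = [tag for tag, _ in newest_first[keep:]]
        if stale:
            result[repo] = stale
    return result


if __name__ == "__main__":
    repos = {
        "myapp": [("1.0", datetime(2020, 1, 1)), ("1.1", datetime(2020, 2, 1)),
                  ("1.2", datetime(2020, 3, 1))],
        "other": [("a", datetime(2020, 1, 1))],
    }
    print(tags_to_prune(repos, keep=2, name_pattern="^my"))
```

Note this keeps by recency only. It has no idea which image is currently deployed, which is the gap I mentioned.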


This is sort of a bulk copy operation for ACR. For reasons I won’t get into, I needed to copy all the images off an ACR, delete/re-create the ACR, and copy them all back. While the az CLI supports importing one image/tag at a time, there’s not really a bulk copy. There’s a ‘transfer artifacts’ mechanism but it’s sort of complex to set up and the az CLI is already here, so…

This script gets all the repositories and all the tags from each repository and does az acr import on all of them. It’s not fast, but it gets the job done.

kubernetes

Here’s what I want:

  • Istio 1.6.4 in Kubernetes acting as the ingress.
  • oauth2-proxy wrapped around one application, not the whole cluster.
  • OpenID Connect support for Azure AD - both interactive OIDC and support for client_credentials OAuth flow.
  • Istio token validation in front of the app.
  • No replacing the Istio sidecar. I want things running as stock as possible so I’m not too far off the beaten path when it’s upgrade time.

I’ve set this up in the past without too much challenge using nginx ingress but I don’t want Istio bypassed here. Unfortunately, setting up oauth2-proxy with an Istio (Envoy) ingress is a lot more complex than sticking a couple of annotations in there.

Luckily, I found this blog article by Justin Gauthier who’d done a lot of the leg-work to figure things out. The differences between that blog article and what I want are:

  • That article uses an older version of Istio so some of the object definitions don’t apply to my Istio 1.6.4 setup.
  • That article wraps everything in the cluster (via the Istio ingress) with oauth2-proxy and I only want one service wrapped.

With all that in mind, let’s get going.


There are some things you need to set up before you can get this going.

DNS Entries

Pick a subdomain on which you’ll have the service and the oauth2-proxy. For our purposes, let’s pick as the subdomain. You want a single subdomain so you can share cookies and so it’s easier to set up DNS and certificates.

We’ll put the app and oauth2-proxy under that.

  • The application/service being secured will be at
  • The oauth2-proxy will be at

In your DNS system you need to assign the wildcard DNS * to the IP address that your Istio ingress is using. If someone visits they should be able to get to your service in the cluster via the Istio ingress gateway.

Azure AD Application

For an application to allow OpenID Connect / OAuth through Azure AD, you need to register the application with Azure AD. The application should be for the service you’re securing.

In that application you need to:

  • On the “Overview” tab, make a note of…
    • The “Application (client) ID” - you’ll need it later. For this example, let’s say it’s APPLICATION-ID-GUID.
    • The “Directory (tenant) ID” - you’ll need it later. For this example, let’s say it’s TENANT-ID-GUID
  • On the “Authentication” tab:
    • Under “Web / Redirect URIs,” set the redirect URI to /oauth2/callback relative to your app, like
    • Under “Implicit grant,” check the box to allow access tokens to be issued.
  • On the “Expose an API” tab, create a scope. It doesn’t matter really what it’s called, but if no scopes are present then client_credentials won’t work. I called mine user_impersonation but you could call yours fluffy and it wouldn’t matter. The scope URI will end up looking like api://APPLICATION-ID-GUID/user_impersonation where that GUID is the ID for your application.
  • On the “API permissions” tab:
    • Grant permission to that user_impersonation scope you just created.
    • Grant permission to Microsoft.Graph - User.Read so oauth2-proxy can validate credentials.
    • Click the “Grant admin consent” button at the top or client_credentials won’t work. There’s no way to grant consent in the middle of that flow.
  • On the “Certificates & secrets” page, under “Client secrets,” create a client secret and take note of it. You’ll need it later. For this example, we’ll say the client secret is myapp-client-secret but yours is going to be a long string of random characters.
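
To show how those values come together for the client_credentials flow, here's a minimal Python sketch that builds the token request without sending it. It assumes the Azure AD v2.0 token endpoint, where a client_credentials request asks for the application's /.default scope. The function name is made up and the IDs are the placeholder values from this example.

```python
from urllib.parse import parse_qs, urlencode


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> tuple:
    """Return (url, form_body) for an Azure AD v2.0 client_credentials token request.

    The request is only constructed here; POST the body as
    application/x-www-form-urlencoded to actually get a token.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # client_credentials uses the app's /.default scope.
        "scope": f"api://{client_id}/.default",
    })
    return url, body


if __name__ == "__main__":
    url, body = build_token_request("TENANT-ID-GUID", "APPLICATION-ID-GUID", "myapp-client-secret")
    print(url)
    print(parse_qs(body)["scope"][0])
```

If the request comes back with an error about consent or scopes, the “Expose an API” and “Grant admin consent” steps above are the first things to recheck.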

Finally, somewhat related - take note of the email domain associated with your users in Azure Active Directory. For our example, we’ll say everyone has an email address. We’ll use that when configuring oauth2-proxy for who can log in.


Set up cert-manager in the cluster. I found the DNS01 solver worked best for me with Istio in the mix because it was easy to get Azure DNS hooked up.

The example here assumes that you have it set up so you can drop a Certificate into a Kubernetes namespace and cert-manager will take over, request a certificate, and populate the appropriate Kubernetes secret that can be used by the Istio ingress gateway for TLS.

Setting up cert-manager isn’t hard, but there’s already a lot of documentation on it so I’m not going to repeat all of it.

If you can’t use cert-manager in your environment then you’ll have to adjust for that when you see the steps where the TLS bits are getting set up later.

The Setup

OK, you have the prerequisites set up, let’s get to it.

Istio Service Entry

If you have traffic going through an egress in Istio, you will need to set up a ServiceEntry to allow access to the various Azure AD endpoints from oauth2-proxy. I have all outbound traffic requiring egress so this was something I had to do.

kind: ServiceEntry
  name: azure-istio-egress
  namespace: istio-system
  - '*'
  - '*'
  - '*'
  location: MESH_EXTERNAL
  - name: https
    number: 443
    protocol: HTTPS
  resolution: NONE

I use a lot of other Azure services, so I have some pretty permissive outbound allowances. You can try to reduce this to just the minimum of what you need by doing a little trial and error. I know I ran into:

  • - Azure AD graph API
  • - Common JWKS endpoint
  • - Token issuer, also used for token validation
  • *, * - Some UI redirection happens to allow OIDC login here with a Microsoft account

I’ll admit after I got through a bunch of different minor things, I just started whitelisting egress allowances. It wasn’t that important for me to be exact for this.

I did deploy this to the istio-system namespace. It seems that it doesn’t matter where a ServiceEntry gets deployed, once it’s out there it works for any service in the cluster. I ended up just deploying all of these to the istio-system namespace so it’s easier to track.

TLS Certificate

OpenID Connect via Azure AD requires a TLS connection for your app. cert-manager takes care of converting a Certificate object to a Kubernetes Secret for us.

It’s important to note that we’re going to use the standard istio-ingressgateway to handle our inbound traffic, and that’s in the istio-system namespace. You can’t read Kubernetes secrets across namespaces, so the Certificate needs to be deployed to the istio-system namespace.

This is one of the places where you’ll see why it’s good to have picked a common subdomain for the oauth2-proxy and the app - wildcard certificate.

kind: Certificate
  name: tls-myapp-production
  namespace: istio-system
  commonName: '*'
  - '*'
    kind: ClusterIssuer
    name: letsencrypt-production
  secretName: tls-myapp-production

Application Namespace

Create your application namespace and enable Istio sidecar injection. This is where your app/service, oauth2-proxy, and Redis will go.

kubectl create namespace myapp
kubectl label namespace myapp istio-injection=enabled


You need to enable Redis as a session store for oauth2-proxy if you want the Istio token validation in place. I gather this isn’t required if you don’t want Istio doing any token validation, but I did, so here we go.

I used the Helm chart v10.5.7 for Redis. There are… a lot of ways you can set up Redis. I set up the demo version here in a very simple, non-clustered manner. Depending on how you set up Redis, you may need to adjust your oauth2-proxy configuration.

Here’s the values.yaml I used for deploying Redis:

cluster:
  enabled: false
usePassword: true
password: "my-redis-password"
master:
  persistence:
    enabled: false

The Application

When you deploy your application, you’ll need to set up:

  • The Kubernetes Deployment and Service
  • The Istio VirtualService and Gateway

The Deployment doesn’t have anything special, it just exposes a port that can be routed to by a Service. Here’s a simple Deployment.

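apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: myapp
  # The label key is an assumption. Use the same key in the
  # selector, the pod template, and the Service.
  labels:
    app.kubernetes.io/name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: myapp
  template:
    metadata:
      labels:
        app.kubernetes.io/name: myapp
    spec:
      containers:
      - image: ""
        imagePullPolicy: IfNotPresent
        name: myapp
        ports:
        - containerPort: 80
          name: http
          protocol: TCP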

We have a Kubernetes Service for that Deployment:

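apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: myapp
  # The label key is an assumption. It must match the pod labels
  # set by the Deployment.
  labels:
    app.kubernetes.io/name: myapp
spec:
  ports:
  # Exposes container port 80 on service port 8000.
  # This is pretty arbitrary, but you need to know
  # the Service port for the VirtualService later.
  - name: http
    port: 8000
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: myapp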

The Istio VirtualService is another layer on top of the Service that helps in traffic control. Here’s where we start tying the ingress gateway to the Service.

kind: VirtualService
  labels: myapp
  name: myapp
  namespace: myapp
  # Name of the Gateway we're going to deploy in a minute.
  - myapp
  # The full host name of the app.
  - route:
    - destination:
        # This is the Kubernetes Service info we just deployed.
        host: myapp
          number: 8000

Finally, we have an Istio Gateway that ties the ingress to our VirtualService.

kind: Gateway
  labels: myapp
  name: myapp
  namespace: myapp
    istio: ingressgateway
  - hosts:
    # Same host as the one in the VirtualService, the full
    # name for the service.
      # The name here must be unique across all of the ports named
      # in the Istio ingress. It doesn't matter what it is as long
      # as it's unique. I like using a modified version of the
      # host name.
      name: https-myapp-cluster-example-com
      number: 443
      protocol: HTTPS
      # This is the name of the secret that cert-manager placed
      # in the istio-system namespace. It should match the
      # secretName in the Certificate.
      credentialName: tls-myapp-production
      mode: SIMPLE

At this point, if you have everything set up right, you should be able to hit and get to it anonymously. There’s no oauth2-proxy in place, but the ingress is all wired up to use TLS with that wildcard certificate cert-manager got you and the DNS was set up, too.

If you can’t get to the service, one of the things isn’t lining up:

  • You forgot to enable Istio sidecar injection on the app namespace or did it after you deployed. Restart the deployments to get the sidecars added.
  • DNS hasn’t propagated.
  • The secret with the TLS certificate isn’t in the istio-system namespace - it must be in istio-system for the ingress to find it.
  • The Gateway isn’t lining up - credentialName is wrong, host name is wrong, port name isn’t unique.
  • The VirtualService isn’t lining up - host name is wrong, Gateway name doesn’t match, Service name or port is wrong.
  • The Service isn’t lining up - the selector doesn’t select any pods, the destination port on the pods is wrong.
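
For the selector item in particular, the rule is simple enough to sketch in Python: a Service's equality-based selector picks a pod only when every key/value pair in the selector is present in the pod's labels. The function name and label values here are made up, and this is an illustration of the rule, not how Kubernetes implements it.

```python
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    """True when every key/value pair in the selector is present on the pod.

    An empty selector is treated as matching nothing in this sketch.
    """
    if not selector:
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


if __name__ == "__main__":
    pod_labels = {"app": "myapp", "version": "v1"}
    print(selector_matches({"app": "myapp"}, pod_labels))
    print(selector_matches({"app": "my-app"}, pod_labels))
```

A one-character difference between the Service selector and the pod labels is enough to leave the Service with no endpoints, so compare them side by side.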

If it feels like you’re Odysseus trying to shoot an arrow through 12 axes, yeah, it’s a lot like that. This isn’t even all the axes.


For this I used the Helm chart v3.2.2 for oauth2-proxy. I created the cookie secret for it like this:

docker run -ti --rm python:3-alpine python -c 'import secrets,base64; print(base64.b64encode(secrets.token_bytes(16)).decode());'
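Here’s a rough sketch of the same idea in plain Python, in case you don’t want to spin up a container. The `make_cookie_secret` name is made up for illustration, and the size check assumes oauth2-proxy wants a secret that decodes to 16, 24, or 32 bytes. Treat it as a starting point, not the one true way to do it.

```python
import base64
import secrets

# Assumption: oauth2-proxy uses the cookie secret as an AES key, so the
# decoded value should be 16, 24, or 32 bytes long.
VALID_SIZES = (16, 24, 32)


def make_cookie_secret(size: int = 16) -> str:
    """Return a URL-safe base64 string wrapping `size` random bytes."""
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}")
    raw = secrets.token_bytes(size)
    # decode() gives a plain string, so there is no b'...' wrapper to trim.
    return base64.urlsafe_b64encode(raw).decode("ascii")


if __name__ == "__main__":
    print(make_cookie_secret())
```

Whatever it prints is what goes in the cookieSecret value.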

You’re also going to need the client ID from your Azure AD application as well as the client secret. You should have grabbed those during the prerequisites earlier.

The values:

  # The client ID of your AAD application.
  # The client secret you generated for the AAD application.
  clientSecret: "myapp-client-secret"
  # The cookie secret you just generated with the Python container.
  cookieSecret: "the-big-base64-thing-you-made"
  # Here's where the interesting stuff happens:
  configFile: |-
    auth_logging = true
    azure_tenant = "TENANT-ID-GUID"
    cookie_httponly = true
    cookie_refresh = "1h"
    cookie_secure = true
    email_domains = ""
    oidc_issuer_url = ""
    pass_access_token = true
    pass_authorization_header = true
    provider = "azure"
    redis_connection_url = "redis://redis-master.myapp.svc.cluster.local:6379"
    redis_password = "my-redis-password"
    request_logging = true
    session_store_type = "redis"
    set_authorization_header = true
    silence_ping_logging = true
    skip_provider_button = true
    skip_auth_strip_headers = false
    skip_jwt_bearer_tokens = true
    standard_logging = true
    upstreams = [ "static://" ]

Important things to note in the configuration file here:

  • The client ID, client secret, and Azure tenant ID information are all from that Azure AD application you registered as a prerequisite.
  • The logging settings, like silence_ping_logging or auth_logging are totally up to you. These don’t matter to the functionality but make it easier to troubleshoot.
  • The redis_connection_url is going to depend on how you deployed Redis. You want to connect to the Kubernetes Service that points to the master, at least in this demo setup. There are a lot of Redis config options for oauth2-proxy that you can tweak. Also, storing passwords in config like this isn’t secure so, like, do something better. But it’s also a lot more to explain how to set up and mount secrets and all that here, so just pretend we did the right thing.
  • The pass_access_token, pass_authorization_header, set_authorization_header, and skip_jwt_bearer_tokens values are super key here. The first three must be set that way for OIDC or OAuth to work; the last one must be set for client_credentials to work.
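Since those four settings are the ones that bite, a quick sanity check before deploying can save some head scratching. This is a minimal sketch that only understands flat `key = value` lines like the ones in the config above; `parse_simple_config` and `missing_flags` are hypothetical helpers, not anything oauth2-proxy ships.

```python
# The four settings called out above as needing to be "true".
REQUIRED_TRUE = (
    "pass_access_token",
    "pass_authorization_header",
    "set_authorization_header",
    "skip_jwt_bearer_tokens",
)


def parse_simple_config(text: str) -> dict:
    """Parse flat `key = value` lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def missing_flags(text: str) -> list:
    """Return the required settings that are absent or not set to true."""
    values = parse_simple_config(text)
    return [name for name in REQUIRED_TRUE if values.get(name) != "true"]


if __name__ == "__main__":
    sample = """
    pass_access_token = true
    pass_authorization_header = true
    set_authorization_header = false
    """
    # Prints the two settings that still need attention.
    print(missing_flags(sample))
```

An empty list means all four are set the way this walkthrough expects.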

Note on client_credentials: If you want to use client_credentials with your app, you need to set up an authenticated emails file in oauth2-proxy. In that emails file, you need to include the service principal ID for the application that’s authenticating. Azure AD issues a token for applications with that service principal ID as the subject, and there’s no email.

The service principal ID can be retrieved if you have your application ID:

az ad sp show --id APPLICATION-ID-GUID --query objectId --out tsv

You’ll also need your app to request a scope when you submit a client_credentials request - use api://APPLICATION-ID-GUID/.default as the scope. (That .default scope won’t exist unless you have some scope defined, which is why you defined one earlier.)
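To make the scope part concrete, here’s a sketch of what a client_credentials token request could look like. It assumes the Azure AD v2.0 token endpoint and only builds the request; nothing is sent. The function name and all of the argument values are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id, client_id, client_secret, api_app_id):
    """Build (but do not send) a client_credentials token request."""
    # Assumption: the Azure AD v2.0 token endpoint for the tenant.
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # The .default scope on the application being secured.
        "scope": f"api://{api_app_id}/.default",
    }
    return Request(
        url,
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_token_request(
        "TENANT-ID-GUID", "CLIENT-ID", "CLIENT-SECRET", "APPLICATION-ID-GUID"
    )
    print(req.get_method(), req.get_full_url())
    print(req.data.decode("ascii"))
```

Sending it with `urllib.request.urlopen(req)` should return JSON with an access token, which the calling app then passes as a bearer token in the Authorization header.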

Getting back to it… Once oauth2-proxy is set up, you need to add the Istio wrappers on it.

First, let’s add that VirtualService:

kind: VirtualService
  labels: oauth2-proxy
  name: oauth2-proxy
  namespace: myapp
  # We'll deploy this gateway in a moment.
  - oauth2-proxy
  # Full host name of the oauth2-proxy.
  - route:
    - destination:
        # This should line up with the Service that the
        # oauth2-proxy Helm chart deployed.
        host: oauth2-proxy
          number: 80

Now the Gateway:

kind: Gateway
  labels: oauth2-proxy
  name: oauth2-proxy
  namespace: myapp
    istio: ingressgateway
  - hosts:
    # Same host as the one in the VirtualService, the full
    # name for oauth2-proxy.
      # Again, this must be unique across all ports named in
      # the Istio ingress.
      name: https-oauth-cluster-example-com
      number: 443
      protocol: HTTPS
      # Same secret as the application - it's a wildcard cert!
      credentialName: tls-myapp-production
      mode: SIMPLE

OK, now you should be able to get something if you hit the oauth2-proxy host name. You’re not passing through it for authentication yet, so you will likely see something along the lines of an error saying “The reply URL specified in the request does not match the reply URLs configured for the application.” The point is, it shouldn’t be some arbitrary 500 or 404. oauth2-proxy should kick in.

Istio Token Validation - RequestAuthentication

We want Istio to do some token validation in front of our application, so we can deploy a RequestAuthentication object.

kind: RequestAuthentication
  labels: myapp
  name: myapp
  namespace: myapp
  - issuer:
      # Match labels should not select the oauth2-proxy, just
      # the application being secured. myapp

The Magic - Envoy Filter for Authentication

The real magic is this last step, an Istio EnvoyFilter to pass authentication requests for your app through oauth2-proxy. This is the biggest takeaway I got from Justin’s blog article and it’s really the key to the whole thing.

Envoy filter format is in flux. The object defined here is really dependent on the version of Envoy that Istio is using. This was a huge pain. I ended up finding the docs for the Envoy ExtAuthz filter and feeling my way through the exercise, but you should be aware these things do change.

Here’s the Envoy filter:

kind: EnvoyFilter
  labels: myapp
  name: myapp
  namespace: istio-system
  - applyTo: HTTP_FILTER
      context: GATEWAY
            name: envoy.http_connection_manager
              # In Istio 1.6.4 this is the first filter. The examples showing insertion
              # after some other authorization filter or not showing where to insert
              # the filter at all didn't work for me. Istio just failed to insert the
              # filter (silently) and moved on.
              name: istio.metadata_exchange
          # The filter should catch traffic to the service/application.
      operation: INSERT_AFTER
        name: envoy.filters.http.ext_authz
                - exact: accept
                - exact: authorization
                - exact: cookie
                - exact: from
                - exact: proxy-authorization
                - exact: user-agent
                - exact: x-forwarded-access-token
                - exact: x-forwarded-email
                - exact: x-forwarded-for
                - exact: x-forwarded-host
                - exact: x-forwarded-proto
                - exact: x-forwarded-user
                - prefix: x-auth-request
                - prefix: x-forwarded
                - exact: authorization
                - exact: location
                - exact: proxy-authenticate
                - exact: set-cookie
                - exact: www-authenticate
                - prefix: x-auth-request
                - prefix: x-forwarded
                - exact: authorization
                - exact: location
                - exact: proxy-authenticate
                - exact: set-cookie
                - exact: www-authenticate
                - prefix: x-auth-request
                - prefix: x-forwarded
              # URIs here should be to the oauth2-proxy service inside your
              # cluster, in the namespace where it was deployed. The port
              # in that 'cluster' line should also match up.
              cluster: outbound|80||oauth2-proxy.myapp.svc.cluster.local
              timeout: 1.5s
              uri: http://oauth2-proxy.myapp.svc.cluster.local

That’s it, you should be good to go!

Note I didn’t really mess around with trying to lock the headers down too much. This is the set I found from the blog article by Justin Gauthier and every time I tried to tweak too much, something would stop working in subtle ways.
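If the exact and prefix entries in the filter look opaque, this is a simplified model of how that kind of allow list decides which headers get through. It is not Envoy’s code, it assumes header names compare in lowercase, and the `EXACT` and `PREFIXES` values are just one of the lists from the filter copied over for illustration.

```python
# One of the header lists from the filter above, split by match type.
EXACT = {
    "authorization",
    "location",
    "proxy-authenticate",
    "set-cookie",
    "www-authenticate",
}
PREFIXES = ("x-auth-request", "x-forwarded")


def header_allowed(name: str, exact=EXACT, prefixes=PREFIXES) -> bool:
    """True if the header matches an exact entry or starts with a prefix entry."""
    lowered = name.lower()
    return lowered in exact or any(lowered.startswith(p) for p in prefixes)


def filter_headers(headers: dict) -> dict:
    """Keep only the headers the allow list would let through."""
    return {k: v for k, v in headers.items() if header_allowed(k)}


if __name__ == "__main__":
    incoming = {
        "X-Auth-Request-Email": "someone@example.com",
        "Set-Cookie": "session=abc",
        "Server": "whatever",
    }
    # Server is dropped; the other two match a prefix and an exact entry.
    print(filter_headers(incoming))
```

That’s also why trimming the lists is touchy: drop one prefix and a whole family of headers quietly stops flowing.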

Try It Out

With all of this in place, you should be able to hit your application’s host name and the Envoy filter will redirect you through oauth2-proxy to Azure Active Directory. Signing in should get you redirected back to your application, this time authenticated.


There are a lot of great tips about troubleshooting and diving into Envoy on the Istio site. This forum post is also pretty good.

Here are a couple of spot tips that I found to be of particular interest.

Finding the Envoy Version

As noted in the EnvoyFilter section, filter formats change based on the version of Envoy that Istio is using. You can find out what version of Envoy you’re running in your Istio cluster by using:

$podname = kubectl get pod -l app=prometheus -n istio-system -o jsonpath='{$.items[0].metadata.name}'
kubectl exec -it $podname -c istio-proxy -n istio-system -- pilot-agent request GET server_info

You’ll get a lot of JSON explaining info about the Envoy sidecar, but the important bit is:

 "version": "80ad06b26b3f97606143871e16268eb036ca7dcd/1.14.3-dev/Clean/RELEASE/BoringSSL"

In this case, it’s 1.14.3.
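If you want to pull that out in a script, the version string splits cleanly on slashes. A small sketch, assuming the string always follows the commit/version/... layout shown above; `envoy_version` is a made-up helper name.

```python
def envoy_version(version_field: str) -> str:
    """Extract the numeric Envoy version from the server_info version string."""
    parts = version_field.split("/")
    if len(parts) < 2:
        raise ValueError(f"unexpected version format: {version_field!r}")
    # The second segment looks like "1.14.3-dev"; drop any "-suffix".
    return parts[1].split("-", 1)[0]


if __name__ == "__main__":
    raw = "80ad06b26b3f97606143871e16268eb036ca7dcd/1.14.3-dev/Clean/RELEASE/BoringSSL"
    print(envoy_version(raw))
```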

Look at What Envoy is Doing

It’s hard to figure out where the Envoy configuration gets hooked up. The istioctl proxy-status command can help you.

istioctl proxy-status will yield a list like this:

NAME                                                         CDS        LDS        EDS        RDS          PILOT                       VERSION
myapp-768b999cb5-v649q.myapp                                 SYNCED     SYNCED     SYNCED     SYNCED       istiod-5cf5bd4577-frngc     1.6.4
istio-egressgateway-85b568659f-x7cwb.istio-system            SYNCED     SYNCED     SYNCED     NOT SENT     istiod-5cf5bd4577-frngc     1.6.4
istio-ingressgateway-85c67886c6-stdsf.istio-system           SYNCED     SYNCED     SYNCED     SYNCED       istiod-5cf5bd4577-frngc     1.6.4
oauth2-proxy-5655cc447d-5ftbq.myapp                          SYNCED     SYNCED     SYNCED     SYNCED       istiod-5cf5bd4577-frngc     1.6.4
redis-5f7c5b99db-tp5l7.myapp                                 SYNCED     SYNCED     SYNCED     SYNCED       istiod-5cf5bd4577-frngc     1.6.4

Once you’ve deployed, you’ll see a myapp listener as well as the Istio ingress. You can dump their config by doing something like

istioctl proxy-config listeners myapp-768b999cb5-v649q.myapp -o json

Sub in the name of the listener as needed. It will generate a huge raft of JSON, so you might need to dump it to a file so you can scroll around in it and find what you want.

  • The application listener will show you info about the sidecar attached to the app.
  • The ingress gateway listener will show you info about ingress traffic (including showing your Envoy filter).

When All Else Fails, Restart the Ingress

When all else fails, restart the ingress pod. kubectl rollout restart deploy/istio-ingressgateway -n istio-system can get you pretty far. When it seems like everything should be working but you’re getting errors like “network connection reset” and it doesn’t make sense… just try kicking the ingress pods. Sometimes the configuration needs to be freshly rebuilt and deployed and that’s how you do it.

I don’t know why this happens, but if you’ve deployed and undeployed some Envoy filters a couple of times… sometimes something just stops working. Restarting the ingress is the only way I’ve found to fix it… but it works!

Other Options

oauth2-proxy isn’t the only way to get this done.

I did see this authservice plugin, which appears to be an Envoy extension to provide oauth2-proxy services right in Envoy itself. Unfortunately, it doesn’t support the latest Istio versions; it requires you manually replace the Istio sidecar with this custom version; and it doesn’t seem to support client_credentials, which is a primary use case for me.

There’s an OAuth2 filter for Envoy currently in active development (alpha) but I didn’t see that it supported OIDC. I could be wrong there. I’d love to see someone get this working inside Istio.

For older Istio there was an App Identity and Access Adapter but Mixer adapters/plugins have been deprecated in favor of WASM extensions for Envoy.

Are there others? Let me know in the comments!


For Christmas last year Jenn got me a SainSmart 3018 CNC router and I’ve really been getting into it. It’s a steep learning curve, a bit more than 3D printing, but my 3D printing knowledge has helped a lot in knowing what sort of things I should look for.

I’ve been getting into it enough that I wanted to upgrade the spindle in it, and my parents got me a Makita RT0701C trim router. This is a fairly common upgrade path - replacing the stock spindle with a Makita or DeWalt router - and you’ll see it in larger setups like the Shapeoko XXL.

IMPORTANT UPDATE: After posting this article I found that, while I was successful in getting the router mounted and generally working in 2D carving (like making letters on a sign), when doing 3D carving I lost Z height a lot. After a lot of trial and error I determined that the SainSmart 3018 PRO does not have strong enough stepper motors to drive the weight of the Makita router. Later models like the 3018PROVer do have the strength, which is why you see so many folks successful with this. For me, I ended up reverting my 3018 back to the stock spindle and upgrading to a Sienci LongMill 30 x 30 which is where I’ve got my Makita router now.

There are two challenges to overcome when you upgrade the spindle to a router like this.

First, you have to figure out how to mount the router to the CNC frame. I solved this by creating a 3D printed combination holder and dust shoe, which you can get on Thingiverse.

Second, you have to change how you turn the power on and off when carving. The stock spindle is powered right off the control board. When you send your gcode to the router, one of the codes turns the spindle on, which sends power to the spindle and it gets moving. A larger trim router like this is plugged in separately and instead of the control board turning it on and off, it’s generally accepted that you have to turn the router on manually with its power switch before you start cutting. Since nothing will be attached to the actual control board power, the “turn on the spindle” command will be effectively ignored.

I… don’t like that. I’m fine if I have to adjust the speed manually on the router, but I would really like the control board to turn the router on and off as needed for the cut. Lots of people, myself included, solve this using a relay. This shows you how to wire it up.

DISCLAIMER: You’re going to be working with electricity. Be safe. Make good connections. Don’t get your fingers in there. I’m not responsible for you burning your house down by making bad wire splices or injuring yourself from touching live electrical stuff. Respect the electricity. This isn’t much more difficult than wiring up a new lightswitch at home, but… just be careful.

Parts you’ll need:

  • One extension cord. It doesn’t have to be very long, you’re going to cut it to get the two end plugs. (Amazon)
  • One solid state relay. It should allow an input voltage of 12V DC and an output voltage matching your router (mine is a 120V AC router). I bought a relay that allows 3 - 32V DC input and 24 - 380V AC output so it’ll “just work.” (Amazon)
  • Your original spindle power cable. You’re going to cut it because you want the wires and the plastic connector that attaches to the control board. You could also make a new one, but I don’t anticipate plugging my old spindle in again.
  • Extra wire in case you want the connection between the control board and the relay to be longer.
  • A battery with some leads. The battery should be enough to trigger the relay. I chose a 9V battery which falls in that 3 - 32V DC range. You’ll use this for testing the relay wiring.
  • Something to plug in to test the relay wiring. I used a light bulb.
  • Solder and soldering iron.
  • Electrical tape.
  • Wire cutters.

Relay circuit parts
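The relay choice boils down to two range checks: does the control voltage fall in the relay’s input range, and does the router’s voltage fall in its output range. A tiny sketch using the numbers from the parts list; the class and function names are invented, and this only looks at voltage, not current ratings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoltageRange:
    low: float
    high: float

    def covers(self, volts: float) -> bool:
        """True if volts falls inside the range, ends included."""
        return self.low <= volts <= self.high


def relay_fits(control_in, load_out, control_volts, load_volts) -> bool:
    """Check both sides of the relay against the voltages it will see."""
    return control_in.covers(control_volts) and load_out.covers(load_volts)


if __name__ == "__main__":
    relay_input = VoltageRange(3, 32)    # DC control side of the relay
    relay_output = VoltageRange(24, 380)  # AC load side of the relay
    # 12V from the control board, 120V AC router.
    print(relay_fits(relay_input, relay_output, 12, 120))
    # 9V test battery on the control side.
    print(relay_fits(relay_input, relay_output, 9, 120))
```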

First thing we’re going to do is just make sure the relay is working. This is also helpful to understand how the control board will be turning the router on and off; and it gets your test set up.

Attach one wire to the positive input terminal of the relay and another wire to the negative terminal. Connect your battery to the wires - positive to positive, negative to negative. You should see the light go on to indicate the relay has been triggered. (If you’re using a mechanical relay, you should hear a click.) When the control board “starts the spindle” it’s going to send 12V in and trigger the relay just like the battery is doing now.

Triggering the relay

Disconnect the battery. We’re done with this part of the testing.

Cut the extension cord so you can get some wires connected to the plugs. I cut about 12 inches from each end of the cord. That left me with:

  • A male plug with about 12 inches of cord
  • A female plug with about 12 inches of cord
  • A long strand that came from the middle of the cord

You can leave more cord connected to the plugs if you want. Just make sure you leave enough that you can make a good splice and have some slack to plug in. We don’t need that strand from the middle of the cord. You can save it and do something else with it or you can throw it away.

The extension cord will have an outer insulation/wrap and three wires inside it. Each wire also has insulation around it. Likely they’ll be color coded - green is ground, black is “hot” or “active,” and white is neutral. The black and white wires are what effectively makes the circuit powering your router, so we’re going to insert the relay in the middle of one of those to act like a switch. I chose to put the relay in the middle of the white wire.

If you don’t know how to make a good wire splice I would recommend watching this quick YouTube video on how to do a lineman’s splice. You’re working with some real electricity here and a bad splice can cause all sorts of problems like burning your house down.

Splice the two green wires together so the ground is continuous. My router is a two-prong non-grounded plug so it doesn’t use ground, but having this finished is valuable for later, I think. Wrap that splice in electrical tape to make sure it’s insulated from the other wires.

Now splice the two black wires together so the “hot” path is continuous. Again, wrap that in electrical tape so it’s nice and insulated.

Finally, attach one white wire to each of the “output” terminals on the relay. Make sure there’s a good connection and that they’re screwed down nice and tight.

You should end up with something that looks like this:

Wires spliced

Test time! Now it’s time to make sure your wire splices are good, that things are wired up correctly, and so on. This is also where you’ll want to be extra careful because if you didn’t wire stuff up right, it could be bad news.

Plug in your test load (I used a light bulb) to the female plug. Then plug in the male plug to an electrical outlet (ideally with a surge protector and/or GFCI circuit breaker for your protection). At this point, even plugged in, the test load (light) should be off. Finally… connect the battery to the input terminals of the relay just like we did in the earlier test. The relay should activate and the test load should turn on! If you remove the battery, it should turn back off.

Test your splices

Disconnect the battery and unplug the relay from the wall and disconnect the wires you were using to test with the battery. The last step is to get the power connector from the control board to the relay working.

If you’re going to make a brand new cable that runs from the control board to the relay, now’s the time. I didn’t do that and I’m not walking through that process.

If you are reusing the original spindle power cable like I did… Snip the metal clips off the ends of the red and black wires that used to power your old spindle. Strip a small amount of the ends of the wire and connect red to positive, black to negative on the relay. It’ll end up looking like this:

Connect the control power cable

That’s it. That’s the whole circuit. Plug this into the wall, plug your router into this, connect the control cable to the control board on your CNC, and then flip the router switch on. If you use a gcode sender to send M3 that will turn the spindle on. You should see the light on the relay turn on and the router itself should turn on. If you use the gcode sender to send M5 that will turn the router back off.
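For a hands-off test, a three-line gcode program can switch the relay on, wait, and switch it off. This sketch just generates those lines. It assumes Grbl-style firmware, where G4 P is a dwell in seconds and M3 wants a non-zero S value; the S1000 default is a placeholder, so check your own spindle settings.

```python
def spindle_test_program(on_seconds: float = 2.0, speed: int = 1000) -> list:
    """Return gcode lines that turn the spindle on, wait, then turn it off."""
    if on_seconds <= 0 or speed <= 0:
        raise ValueError("on_seconds and speed must be positive")
    return [
        f"M3 S{speed}",        # spindle on (triggers the relay)
        f"G4 P{on_seconds:g}",  # dwell; assumed to be seconds
        "M5",                   # spindle off
    ]


if __name__ == "__main__":
    for line in spindle_test_program():
        print(line)
```

Paste the output into your gcode sender’s console with the bit clear of the work. The relay light should come on for the dwell and then go out.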

I recommend putting this in a box or covering it. You don’t want the connections on the relay to get accidentally shorted. I made a quick 3D printed box for mine; you can do something similar or figure something else out. It all depends on the size of the relay and cord you bought, so it’s not one-size-fits-all. If you want to buy a box, search for “project boxes.”

All done, here’s what my setup looks like now:

The finished setup

The black box in the middle mounted to the wall contains the relay. It plugs into the power strip along the left. The red and black cables go to the control board. My Makita router plugs into the relay. (I have the cord routed up and hanging so it’s out of the way.)

I hope that helps folks get back some of their control with the upgraded router!

Note: You might be wondering how you can now automate speed control of the spindle, not just on/off. That’s not as straightforward and there are tons of forums involving rewiring routers with variable electronic speed controls (VESC) and all sorts of other cool-but-non-trivial things. I didn’t solve this problem since setting the speed dial before the cut isn’t a huge deal; and I generally don’t change speeds a lot.