When working with Terraform, HashiCorp recommends keeping your workspaces small and focused on the resources that make up a single component of a larger infrastructure stack. Doing so has many benefits, but this best practice can introduce dependencies between workspaces, which in turn introduces a new challenge: how do you ensure that these interdependent component workspaces are automatically created (or destroyed) in the right order?
I wrote this blog post to present a pattern to reduce operational overhead from managing multi-workspace deployments. When combined with ephemeral workspaces, a feature coming soon to Terraform Cloud, this pattern can also help reduce costs by allowing you to destroy an entire stack of workspaces in a logical order. I’m a HashiCorp Solutions Engineer who uses this pattern frequently, and it’s used by Terraform creator and HashiCorp Co-Founder Mitchell Hashimoto, and many others.
Notes: This method is not an official HashiCorp-recommended best practice and is not intended to be the solution to all use cases. Rather, it’s just one example to explore and build upon. This blog post also includes a simpler solution to the challenge above, which should work for a large number of use cases. While this blog post was written with Terraform Cloud in mind, the same concepts and configuration will also work in Terraform Enterprise as well, depending on your version. Finally, the suggestions presented here all assume a basic understanding of Terraform Cloud, and you can try out the code examples yourself in this GitHub repository.
A workspace in Terraform Cloud contains everything Terraform needs to manage a given collection of infrastructure, including variables, state, configuration, and credentials. For example, you may have one workspace that defines your virtual network and another for compute. These workspaces depend on each other; you cannot create your compute until you have a virtual network. The outputs from one workspace become the inputs to another.
Terraform Cloud has a feature called run triggers. This allows you to kick off an apply on your compute workspace after your virtual network has been created. You can also define a one-to-many relationship here, where the successful apply on one upstream workspace triggers multiple downstream workspaces.
This alone solves some dependency challenges and is great for simple workflows. If it works for your use case, you should use it.
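If you haven’t seen one in code form, wiring up a run trigger with the TFE provider is a one-resource affair. A minimal sketch (the two tfe_workspace references are hypothetical):

# Trigger runs on the compute workspace whenever the network
# workspace completes a successful apply
resource "tfe_run_trigger" "compute_after_network" {
  workspace_id  = tfe_workspace.compute.id # downstream workspace
  sourceable_id = tfe_workspace.network.id # upstream workspace
}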
However, it doesn’t address all situations, which is why Mitchell Hashimoto created the multispace provider to handle the kind of cascading creation/destruction workflows that can’t be done with simple run triggers. His example use case involves creating a Kubernetes stack: first you create the underlying virtual machines, then the core Kubernetes services, DNS, and ingress. Each of these is its own separate workspace.
This initial implementation has since been refined and incorporated into the official Terraform Cloud/Enterprise provider (also called the “TFE provider”) in the form of the tfe_workspace_run resource.
For clarity, this post uses the following terminology when referring to the roles a workspace could have in multi-workspace deployments (an individual workspace may have one or more of these roles):

- Workspace runner: a workspace that triggers runs on other workspaces using the tfe_workspace_run resource.
- Workspace creator: a workspace that creates other workspaces using the tfe_workspace resource. This is usually also a workspace runner in my use cases, but it may not be a requirement for yours.

Here’s an example of how run triggers and tfe_workspace_run differ: run triggers will always kick off a plan on the downstream workspace once the upstream workspace has completed a successful apply. Sometimes this results in plans on the downstream workspace that are unnecessary (in the case of a do-nothing plan) or that fail (when a downstream workspace has a dependency on multiple upstream workspaces but some upstream workspaces haven’t yet completed their apply phases).
With tfe_workspace_run you can specify when to apply and under what circumstances. For example, with depends_on, a workspace runner could wait until several upstream workspaces have applied before kicking off the downstream workspace. If that is the only benefit relevant to you, run triggers are probably good enough for your use case; you’re probably fine with a do-nothing or failed plan every now and then.
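For illustration, a downstream run that waits on several upstream runs might look like this (a sketch; the network, dns, and app names are hypothetical):

resource "tfe_workspace_run" "app" {
  workspace_id = tfe_workspace.app.id

  # Wait until both upstream runs have completed before applying
  depends_on = [
    tfe_workspace_run.network,
    tfe_workspace_run.dns,
  ]

  apply {
    manual_confirm = false
  }
}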
You can use the tfe_workspace_run resource in two operational modes:
Fire-and-forget: The resource simply queues a run on the downstream workspace and considers it good enough if the run was successfully queued. This mode is very similar to how run triggers work.
Wait: The resource queues a run on the downstream workspace and waits for it to successfully apply. After a successful plan, the resource can wait for a human approval on the apply or initiate the apply itself. Optionally, the resource can retry if the apply fails.
Run triggers do one thing: they trigger apply runs on downstream workspaces, and they do it only after the upstream has completed successfully. They do not handle destruction use cases. For example, you should destroy your compute before destroying your virtual network, and run triggers do not give you a means to model that side of the dependency.
This is where the real power of tfe_workspace_run comes in. The resource allows you to kick off a destroy on a downstream workspace and, if you’re using depends_on, you can ensure that nothing in the upstream workspace is destroyed until the downstream workspace has successfully finished its destroy.
While you can configure both the apply and destroy behavior for the downstream workspace, you don’t need to use both. There are cases where you only want to apply a downstream workspace. There are also times where you only want to destroy a downstream workspace, but you will trigger an apply yourself.
The tfe_workspace_run resource documentation on the Terraform Registry includes a few example code snippets to use as a starting point. At its most basic, the resource looks like this:
resource "tfe_workspace_run" "ws_run_parent" {
workspace_id = "ws-fSX576JZGENVaeMi"
apply {
# tfe_workspace_run is responsible for approving the apply
# part of the run
# this is the only required argument in the apply{} and
# destroy{} blocks
manual_confirm = false
# if the run fails, try again, up to 3 times, waiting between
# 1 and 30 seconds
# this is the default behaviour, presented here for clarity
wait_for_run = true
retry_attempts = 3
retry_backoff_min = 1
retry_backoff_max = 30
}
destroy {
manual_confirm = false
}
}
This example shows what’s meant by a workspace runner. For the specified workspace, our tfe_workspace_run resource will trigger an apply, wait for that to complete, then consider the tfe_workspace_run successfully created. On destroy, the tfe_workspace_run will trigger a destroy, wait for that to complete, then consider the tfe_workspace_run successfully destroyed.
Because we are using the TFE provider, the workspace runner requires a TFE_TOKEN with sufficient permissions to kick off plan/apply/destroy runs on child workspaces. (You may wish to use the Terraform Cloud secrets engine in HashiCorp Vault to generate these, but that is out of scope for this blog post.)
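For illustration, a minimal provider configuration on the workspace runner might look something like this (a sketch; in practice you would likely supply TFE_TOKEN as a sensitive environment variable on the workspace rather than as a Terraform variable):

# Hypothetical variable, marked sensitive so the token stays out of logs
variable "tfe_token" {
  type      = string
  sensitive = true
}

provider "tfe" {
  # If token is left unset, the provider falls back to the
  # TFE_TOKEN environment variable
  token = var.tfe_token
}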
Beyond the basic examples, this post will present a few patterns with example configuration for how to use this resource.
The tfe_workspace_run resource is most useful when creating new workspaces. This example uses the TFE provider to create a workspace, set up all the necessary permissions, and configure the dynamic credentials. All of that must be done before the workspace can be applied.
As a reminder, the term “workspace creator” refers to any workspace responsible for creating other workspaces and related resources. In most cases when using a workspace creator, it will also be a workspace runner for the workspaces it creates (i.e. it is responsible for triggering apply and/or destroy runs on those workspaces).
resource "tfe_workspace_run" "downstream" {
workspace_id = tfe_workspace.downstream.id
# depends_on = creds and other workspace dependencies go here
apply {
# Fire and Forget
wait_for_run = false
# auto-apply
manual_confirm = false
}
destroy {
# Wait for destroy before doing anything else
wait_for_run = true
# auto-apply
manual_confirm = false
}
}
From the perspective of the workspace runner, it doesn’t need to care if the downstream workspace was successfully applied, just that an apply was attempted. This functionality alone is achievable with run triggers (and in this fire-and-forget mode, the behavior is very similar), but as this example is already using tfe_workspace_run to handle the destroy, it makes sense to use it for the apply as well.
By including the destroy{} block in combination with depends_on, you can ensure that the workspace and supporting resources remain untouched until the downstream workspace has successfully destroyed all the resources it manages.
If you do not include a destroy{} block, then attempting to delete the downstream workspace will result in an error like this:
Error: error deleting workspace ws-BxxKPnyBVpxwVQB1: This
workspace has 4 resources under management and must be force
deleted by setting force_delete = true
If you do not include the depends_on, then dependencies such as variables and credentials that the downstream workspace needs will end up getting deleted too early.
The upcoming ephemeral workspaces feature will ensure the entire stack of workspaces is safely destroyed in the correct order once the ephemeral workspace hits its time-to-live (TTL).
This is a pattern I use extensively in my workspace creator. As a reminder, this is a “workspace runner” (i.e. a workspace which triggers runs on other workspaces). In my case, when creating a new workspace, I don’t necessarily want it to try to apply immediately, so my configuration looks like this:
resource "tfe_workspace_run" "downstream" {
workspace_id = tfe_workspace.downstream.id
# depends_on = creds and other workspace dependencies go here
destroy {
wait_for_run = true
manual_confirm = false
}
}
The only difference between this and the previous example is the absence of the apply{} block. This means that when doing an apply on the workspace runner, Terraform will create a placeholder resource referencing a non-existent apply on the downstream workspace. This may seem pointless as nothing has actually been done yet, but the presence of this resource in Terraform state signals to Terraform that, when it comes time to destroy the workspace runner, it should first kick off a destroy on the downstream workspace.
Similarly, you can also have an apply-only workflow by including an apply{} block but no destroy{}. For most use cases, an apply{} block with no destroy{} block is practically identical to just using run triggers. If it is, then just use run triggers. But for some niche use cases, you may want to conditionally apply, which run triggers do not allow you to do.
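One way to sketch that conditional behavior, assuming a hypothetical feature-flag variable:

variable "run_downstream_apply" {
  type    = bool
  default = false
}

resource "tfe_workspace_run" "conditional" {
  # Only create this resource (and therefore only trigger the run)
  # when the flag is enabled
  count        = var.run_downstream_apply ? 1 : 0
  workspace_id = tfe_workspace.downstream.id

  apply {
    manual_confirm = false
  }
}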
This is the main reason for the concept of a workspace runner separate from the idea of an upstream workspace. The previous examples have a single upstream workspace for every downstream workspace. In cases like that, introducing an additional workspace runner just adds unnecessary complexity; the upstream can handle the runner functionality.
Workspace runners become useful in cases where there are more than two workspaces. While upstream workspaces can handle the runner role functionally, if apply or destroy runs are configured to wait for completion, each waiting run holds a concurrency slot; the more workspaces in your stack, the more concurrent runs. If you have too many, the queue will fill up, and you’ll end up in a deadlock, where downstream workspaces are queued but can never begin, and upstream workspaces are waiting on those downstream workspace runs.
By introducing a separate workspace runner, you can ensure you consume only two concurrency slots: one for the runner, and one for whichever other workspace it is currently running.
Here are some examples: first a simple one, then a more complex one.
resource "tfe_workspace_run" "A" {
workspace_id = data.tfe_workspace.ws["A"].id
apply {
wait_for_run = true
manual_confirm = false
}
destroy {
wait_for_run = true
manual_confirm = false
}
}
resource "tfe_workspace_run" "B" {
workspace_id = data.tfe_workspace.ws["B"].id
depends_on = [tfe_workspace_run.A]
apply {
wait_for_run = true
manual_confirm = false
}
destroy {
wait_for_run = true
manual_confirm = false
}
}
resource "tfe_workspace_run" "C" {
workspace_id = data.tfe_workspace.ws["C"].id
depends_on = [tfe_workspace_run.B]
apply {
wait_for_run = true
manual_confirm = false
}
destroy {
wait_for_run = true
manual_confirm = false
}
}
Of course, there’s no reason you’re limited to each workspace having one upstream or one downstream. You may have a complex web of dependencies between your workspaces. Here’s an example (with the code in the repo linked at the end of the post):
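To give a flavor of what that looks like, a single node in such a web simply lists multiple upstream runs in its depends_on. A sketch extending the A/B/C example above, with a hypothetical workspace D that depends on both B and C:

resource "tfe_workspace_run" "D" {
  workspace_id = data.tfe_workspace.ws["D"].id
  depends_on   = [tfe_workspace_run.B, tfe_workspace_run.C]

  apply {
    wait_for_run   = true
    manual_confirm = false
  }

  destroy {
    wait_for_run   = true
    manual_confirm = false
  }
}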
The resource required to power this workflow is now an official, supported part of the TFE provider. As more people make use of tfe_workspace_run and give feedback, HashiCorp can continue to improve the resource.
If you aren’t already using Terraform Cloud, you’ll want to experiment with that first. Start for free by signing up for an account.
Please note that to avoid hitting concurrency limits on the Free tier of Terraform Cloud, you can:

- use tfe_workspace_run in fire-and-forget mode, or
- use tfe_workspace_run in wait mode.

If you are an existing Terraform Cloud user, then experiment with these patterns. Regardless of whether you’re using run triggers or tfe_workspace_run, automating the dependencies between workspaces will save you time and effort.
As mentioned at the beginning of the post, you can find these code examples in this GitHub repository.
So this whole story starts with me looking to manage my DNS records better.
I wrote a whole blog post about it if you’re interested: Moving my DNS records to Route 53 with Terraform
I was modifying them by hand in our domain registrar’s own DNS settings page… and it was kinda a pain.
At work we use AWS Route 53 for some DNS records, so I was familiar with how that worked, and it seemed like a good idea to move to that too. Not because managing DNS records in Route 53 by hand is much easier, but it opened up the possibility for automating it.
Which is where Terraform comes in!
Before we can create some DNS records, we need a hosted zone
It’s pretty simple to create one. Just needs a name:
resource "aws_route53_zone" "lmhd-me" {
name = "lmhd.me"
}
I will want to delegate the test subdomain to a different zone too:
resource "aws_route53_record" "lmhd-me_NS_test-lmhd-me" {
zone_id = aws_route53_zone.lmhd-me.zone_id
name = "test.lmhd.me"
type = "NS"
ttl = "300"
records = aws_route53_zone.test-lmhd-me.name_servers
}
I don’t need to do that, but I like that separation, and it gives me the flexibility to move that to an entirely different AWS account (or somewhere else) in future.
The first problem I encountered almost immediately was that you can’t do CNAMEs on apex domains, and I wanted to do something for lmhd.me.
Route 53 doesn’t have a concept of an “ANAME”. So I would have to roll my own.
Thankfully, Terraform has a DNS Provider so I could query for the IP Address of my Netlify site, and then use that when defining my A Record for lmhd.me:
# Lookup CNAME for Netlify.
data "dns_cname_record_set" "lmhd-dot-me-netlify-com" {
  host = "lmhd-dot-me.netlify.com"
}

# Get IPs for that CNAME
data "dns_a_record_set" "netlify-com" {
  host = data.dns_cname_record_set.lmhd-dot-me-netlify-com.cname
}

# Set up the A record for lmhd.me
resource "aws_route53_record" "lmhd-me-A-record" {
  zone_id = aws_route53_zone.lmhd-me.zone_id
  name    = "lmhd.me"
  type    = "A"
  ttl     = "300"
  records = data.dns_a_record_set.netlify-com.addrs
}
The rest of the records are fairly simple. Most are just CNAMEs, and have similar format to the above.
I set this up to run in Terraform Cloud, and away I went, adding new DNS records as and when I needed them. For example:
resource "aws_route53_record" "widgets-lmhd-me-CNAME-record" {
zone_id = aws_route53_zone.lmhd-me.zone_id
name = "widgets.lmhd.me"
type = "CNAME"
ttl = "300"
records = ["lmhd-widgets.netlify.app"]
}
I’m lazy (okay, efficient), so I don’t want to have to write out all of that every time I want to add something.
Can we make it easier on ourselves?
What if we make TF Variables out of this?
Took a bit of fiddling to get something which works, but here’s what I came up with as my initial version:
#
# DNS Records
#
variable "lmhd_records" {
  type = map(object({
    type    = string
    ttl     = string
    records = list(string)
  }))

  default = {
    "widgets.lmhd.me" = {
      type    = "CNAME"
      ttl     = "300"
      records = ["lmhd-widgets.netlify.app"]
    }
  }
}

#
# DNS Records
#
resource "aws_route53_record" "lmhd_record" {
  for_each = var.lmhd_records

  zone_id = aws_route53_zone.lmhd-me.zone_id
  name    = each.key
  type    = each.value.type
  ttl     = each.value.ttl
  records = each.value.records
}
This makes use of Terraform’s for_each Meta-Argument, so rather than defining a Terraform resource for every DNS record, I define a single resource and update the variable instead.
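Adding a record later then means adding one more entry to that map rather than a whole new resource block; for example, a hypothetical second entry:

"blog.lmhd.me" = {
  type    = "CNAME"
  ttl     = "300"
  records = ["lmhd-blog.netlify.app"]
}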
And our plan looks like this:
  # aws_route53_record.lmhd_record["widgets.lmhd.me"] will be created
  + resource "aws_route53_record" "lmhd_record" {
      + allow_overwrite = (known after apply)
      + fqdn            = (known after apply)
      + id              = (known after apply)
      + name            = "widgets.lmhd.me"
      + records         = [
          + "lmhd-widgets.netlify.app",
        ]
      + ttl             = 300
      + type            = "CNAME"
      + zone_id         = "REDACTED"
    }
That’s… okay. Maybe slightly less typing… but now it’s messy. Why would I want to do this?
And if I stopped there, I would agree, it’s not really worth it.
But it is a stepping stone in the right direction: getting automatically generated TF code!
Can I use external files to auto-generate my Terraform code?
And… yes. Terraform has a bunch of functions, including fileset which lets me query the content of a directory.
First we need to pull in some files…
locals {
  lmhd_records_files = fileset(path.module, "lmhd.me/*.json")
}

output "lmhd_records" {
  value = local.lmhd_records_files
}
This gives us a list of all JSON files in the lmhd.me/ directory:
Changes to Outputs:
  + lmhd_records = [
      + "lmhd.me/widgets.lmhd.me.json",
    ]
So that’s a start.
That one JSON file of mine looks like this:
{
  "type": "CNAME",
  "ttl": "300",
  "records": [
    "lmhd-widgets.netlify.app"
  ]
}
And we can pull that in with Terraform making use of a few other functions:
resource "aws_route53_record" "lmhd_record" {
for_each = local.lmhd_records_files
zone_id = aws_route53_zone.lmhd-me.zone_id
# Use the filename as the name of the DNS record
# e.g. lmhd.me/widgets.lmhd.me.json --> widgets.lmhd.me
name = trimsuffix(basename(each.key), ".json")
# Get the rest of the values from keys in the JSON file
type = jsondecode(file(each.key))["type"]
ttl = jsondecode(file(each.key))["ttl"]
records = jsondecode(file(each.key))["records"]
}
Our plan looks like this:
  # aws_route53_record.lmhd_record["lmhd.me/widgets.lmhd.me.json"] will be created
  + resource "aws_route53_record" "lmhd_record" {
      + allow_overwrite = (known after apply)
      + fqdn            = (known after apply)
      + id              = (known after apply)
      + name            = "widgets.lmhd.me"
      + records         = [
          + "lmhd-widgets.netlify.app",
        ]
      + ttl             = 300
      + type            = "CNAME"
      + zone_id         = "REDACTED"
    }
Looking good 🙂
Okay, fine… let’s use YAML instead.
Our widgets.lmhd.me.yaml file:
type: CNAME
ttl: 300
records:
- lmhd-widgets.netlify.app
locals {
  lmhd_records_files = fileset(path.module, "lmhd.me/*.yaml")
}

resource "aws_route53_record" "lmhd_record" {
  for_each = local.lmhd_records_files

  zone_id = aws_route53_zone.lmhd-me.zone_id

  # Use the filename as the name of the DNS record
  # e.g. lmhd.me/widgets.lmhd.me.yaml --> widgets.lmhd.me
  name = trimsuffix(basename(each.key), ".yaml")

  # Get the rest of the values from keys in the YAML file
  type    = yamldecode(file(each.key))["type"]
  ttl     = yamldecode(file(each.key))["ttl"]
  records = yamldecode(file(each.key))["records"]
}
Happy?
That works fine as an initial Proof of Concept… but what if I’ve not specified some of those keys in my file?
In my case, most of these values are going to be the same.
I’m primarily defining CNAMEs, and I rarely change the TTL.
So I kinda don’t really want to have to specify them in my files every time.
Can we do something about that?
And… yes we can.
Let’s have a YAML file like this:
records:
- lmhd-widgets.netlify.app
And some Terraform code like this:
variable "default_ttl" {
type = string
default = "300"
}
resource "aws_route53_record" "lmhd_record" {
for_each = local.lmhd_records_files
zone_id = aws_route53_zone.lmhd-me.zone_id
# Use the "name" key if specified
# otherwise fallback to filename, minus .yaml
name = lookup(
yamldecode(file(each.key)),
"name",
trimsuffix(basename(each.key), ".yaml")
)
# Default to CNAME type unless told otherwise
type = lookup(
yamldecode(file(each.key)),
"type",
"CNAME"
)
# Use Default TTL unless told otherwise
ttl = lookup(
yamldecode(file(each.key)),
"ttl",
var.default_ttl
)
# We must have some records, otherwise this will fail
records = yamldecode(file(each.key))["records"]
}
Now, if I do not specify some of those values, it will use the defaults.
I can also override the name of the DNS record if I so choose.
As for that last bit: if we do not specify any records in our YAML file, Terraform spits out the following error message:
╷
│ Error: Invalid index
│
│ on lmhd.me.tf line 348, in resource "aws_route53_record" "lmhd_record":
│ 348: records = yamldecode(file(each.key))["records"]
│ ├────────────────
│ │ each.key is "lmhd.me/widgets.lmhd.me.yaml"
│
│ The given key does not identify an element in this collection value.
Which is okay. You can figure out what’s gone wrong there.
We can’t add custom error messages here, because validation blocks don’t exist for resources, only variables.
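(As an aside: Terraform 1.2 and later, newer than what this was written against, added custom condition checks on resources via lifecycle preconditions. A sketch of what that might look like here:)

resource "aws_route53_record" "lmhd_record" {
  # ... arguments as in the resource above ...

  lifecycle {
    precondition {
      condition     = can(yamldecode(file(each.key))["records"])
      error_message = "Each record file must contain a 'records' key."
    }
  }
}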
I mean, why go through all this effort, when you could just write the Terraform code?
What’s wrong with this?
resource "aws_route53_record" "widgets-lmhd-me-CNAME-record" {
zone_id = aws_route53_zone.lmhd-me.zone_id
name = "widgets.lmhd.me"
type = "CNAME"
ttl = "300"
records = ["lmhd-widgets.netlify.app."]
}
And besides the obvious answer of “I’m doing this because it’s fun”…
It actually does have some valid use cases.
A couple years ago I did a talk at HashiConf about how we generate Terraform code for our Vault configuration.
You can watch it later if you’re interested.
The short version though… our pipeline generates Terraform code. So for example, if a user wants to add a new policy, they don’t worry about writing the Terraform code for that, they just raise a Pull Request which contains the policy itself, and our pipeline figures out the rest.
That was originally done with some bash scripts… which, it turns out, once we had several hundred policies, was pretty slow.
So we rewrote it in Go and now it’s faster.
But what if we didn’t need any external tools in the first place?
What if we could just have Terraform do all the hard work?
That’s how I’ve been approaching Terraforming my own personal Vault.
So let’s see if I can improve that with some sort of dynamic Terraform like we’ve seen above.
Let’s start with policies, as those are the easiest to conceptualise.
It’s just one file, and you want the content of each of those files to become policies in Vault.
#
# Dynamic Policies from Files
#
locals {
  policy_files = fileset(path.module, "policies/*.hcl")
}

resource "vault_policy" "policies" {
  for_each = local.policy_files

  # Use the name of the policy file as the name of the policy
  name = trimsuffix(basename(each.key), ".hcl")

  # And use the content of the file as the policy itself
  policy = file(each.key)
}
So for an example policy…
  # vault_policy.policies["policies/test.hcl"] will be created
  + resource "vault_policy" "policies" {
      + id     = (known after apply)
      + name   = "test"
      + policy = <<-EOT
            # Test policy
            # Just comments
            # doesn't actually do anything
        EOT
    }
Hey! That looks pretty good 😀
One thing to note though: I’ve had to manually remove the existing policies from Terraform state, so that Terraform doesn’t delete anything, e.g.
$ terraform state rm vault_policy.vault_terraform
Acquiring state lock. This may take a few moments...
Removed vault_policy.vault_terraform
Successfully removed 1 resource instance(s).
$ terraform state rm vault_policy.default
Acquiring state lock. This may take a few moments...
Removed vault_policy.default
Successfully removed 1 resource instance(s).
I could re-import those policies into the Terraform state if I wanted to, so the resulting Terraform plan is a no-op… but as writing Vault policies is idempotent, I’m not really bothered for the moment.
Something I would definitely want to do in a Production environment though.
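For for_each resources like this, the import address includes the map key, so re-importing would look something like this (a sketch; the policy name is hypothetical):

$ terraform import 'vault_policy.policies["policies/default.hcl"]' default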
As we’ve added more things to our Vault Terraform pipeline at work, the amount of time it takes for the pipeline to run end-to-end is… well, it’s longer than I’d like.
So what if we had like… 10k policies?
How would Terraform Cloud handle that?
Let’s generate a bunch of test policies:
$ for i in $(seq 1 10000); do echo "# test policy ${i}" > test_policy_${i}.hcl; done
Yeah, let’s just casually add about 5MiB to this git repo. No big deal. 😎
Writing objects: 100% (10003/10003), 528.05 KiB | 5.33 MiB/s, done.
Unsurprisingly, simply cloning this chonky boi took Terraform Cloud quite a while.
It wasn’t until about 4 minutes in that it actually started planning anything.
So… maybe I should at least give it a chance with a smaller set first, and go from there.
With a mere 1000 test policies, it took under a minute to Plan, and a similar amount of time to Apply.
Nice.
And adding another 1000?
3 minute plan, 3 minute apply. So 6 minutes end-to-end, with over 2000 resources configured through Terraform.
Pretty good.
And to delete them all once I’m done?
About a minute and a half, end-to-end.
Dang that’s great!
For me, the next thing I’m going to work on is AppRoles and PKI roles, and then see where we go from there. But that’s a job for another day.
If I were setting up a Vault Terraform pipeline in a new company for the first time, this is definitely the approach I’d want to take.
As much off-the-shelf as possible, and minimal external dependencies.
And a couple of years ago, I spoke at HashiConf [1] about how we [2] manage our Vault configuration with Terraform. So it feels like it’s about time I get around to doing something similar for my own Vault.
For now, I’m not gonna do anything fancy; I just want bare minimum Vault Terraformability.
I’ll be using Terraform Cloud for this, because I don’t want to have to do things by hand if I can avoid it, and TFC is cool. Terraform Enterprise also exists, and is almost the same thing as Terraform Cloud, but different in ways I’m not going to get into right now. If you do not have an account, you can sign up for one at app.terraform.io
I’m also not going into how to set up Vault; this is mostly for my own reference, but if anybody else is following this, I’m assuming you’ve got a Vault set up already. If you haven’t got a Vault yet, you may wish to consider HCP Vault, which is now generally available, so you don’t have to put much effort into spinning it up.
My Vault is available over the public Internet directly, because YOLO, so if yours isn’t then you’ll also need to get routing from TFC/TFE to Vault in place.
Let’s begin!
Let’s start with the bootstrap configuration on the Vault side.
First thing we need is a Policy, which will grant Terraform the ability to manage stuff. For now, I’m leaving it at the bare minimum, just so I can prove that it’s all working, and I’ll expand on it later (with Terraform itself).
My initial terraform_vault policy looks like this:
# Terraform creates a Child Token to interact with Vault
path "auth/token/create" {
  capabilities = ["update"]
}

# A Test secret, to prove TF is working
# In my case, this is a KVv2 Secret Engine, so we need /data/ in there
path "kv/data/terraform" {
  capabilities = ["create", "update", "read"]
}
Next, we need an Auth method. I’ll be using AppRole for this, because it’s super flexible, and nobody’s written a dedicated Terraform Cloud/Terraform Enterprise Auth plugin for Vault yet [3].
Logged in to my Vault UI, I’m going to use the built-in Browser CLI. That’s this little icon in the top right:
I’d love to do this in the UI directly… but there’s no AppRole UI yet. Maybe one day.
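(One assumption in what follows: the AppRole auth method itself is already enabled. If it isn’t, you can do that from the Browser CLI too, something like:)

> write sys/auth/approle type=approle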
So, again, bare minimum:
> write auth/approle/role/vault_terraform token_policies=vault_terraform token_ttl=300
Success! Data written to: auth/approle/role/vault_terraform

> read auth/approle/role/vault_terraform
Key                        Value
bind_secret_id             true
local_secret_ids           false
secret_id_bound_cidrs      null
secret_id_num_uses         0
secret_id_ttl              0
token_bound_cidrs          []
token_explicit_max_ttl     0
token_max_ttl              0
token_no_default_policy    false
token_num_uses             0
token_period               0
token_policies             ["vault_terraform"]
token_ttl                  300
token_type                 default
Short TTL, no usage limits or CIDR restrictions, etc. But we’re bootstrapping, and this isn’t Production, so we don’t care right now. Terraform will update its own AppRole later to make it more secure.
Now we need some Terraform code, to actually… you know… do something.
First thing, we tell Terraform we’re using TFC:
terraform {
  backend "remote" {
    organization = "lmhd"

    workspaces {
      name = "vault"
    }
  }
}
This links the repo to a Terraform Cloud workspace (which we will create in a bit), and uses that to store the Terraform state.
Next we need to auth with Vault:
# These variables intentionally left blank
variable "login_approle_role_id" {}
variable "login_approle_secret_id" {}

provider "vault" {
  # Not setting Vault Address
  # We can pull that from the VAULT_ADDR env var

  auth_login {
    path = "auth/approle/login"

    parameters = {
      role_id   = var.login_approle_role_id
      secret_id = var.login_approle_secret_id
    }
  }
}
Here we’re telling Terraform to use AppRole authentication to log in to Vault, and we’re giving it the AppRole’s Role ID and Secret ID as a Terraform Variable.
I’m not specifying where my Vault is; I’ll be doing that with an environment variable.
And then, finally, we’ll have Terraform actually do something. In this case, create a simple KV secret:
resource "vault_generic_secret" "example" {
path = "kv/terraform"
data_json = <<EOT
{
"foo": "bar",
"pizza": "cheese"
}
EOT
}
Git Commit, Git Push, Done.
We’ll add a bunch more stuff to this later, but that’s a problem for Future Lucy.
Now we want to set things up in TFC so it applies our Terraform code. There is a Terraform Provider for Terraform Cloud, and I’ll probably look into that in future. For now, I’ll just do things by hand.
First things first, we need to create a new Workspace. I’m using the Version control workflow, so I can link it to a Git repo, and not have to worry about things:
In my case, my Terraform code is stored in GitHub, so I’ll select that:
And I’ll pick my vault_terraform repo:
As far as Settings go, I’m going to call my workspace “vault”:
And I’m going to expand the Advanced options and enable Automatic speculative plans. This should mean that if I make Pull Requests into this repo, then Terraform Cloud will run terraform plan on them. Pretty useful.
We’re almost done!
Now that we have Terraform Cloud configured to use our Terraform code, we need to tell it how to find Vault, and how to auth.
We’ll start with the latter.
Remember how, in the Terraform code, we defined some AppRole variables, but didn’t set any values?
variable "login_approle_role_id" {}
variable "login_approle_secret_id" {}
We’ll set those now.
Back in our Vault UI’s Browser CLI, we can read the AppRole’s Role ID with:
> read auth/approle/role/vault_terraform/role-id
And we can generate a Secret ID with:
> write -force auth/approle/role/vault_terraform/secret-id
In our Terraform Cloud Workspace, we can add these as Terraform Variables. These are secrets, so I recommend marking them as Sensitive, so they do not show up in the UI and cannot be read.
We can also add the address for our Vault as an Environment Variable, VAULT_ADDR.
And that should be everything.
In the Runs tab, we can trigger a new plan:
I’m redirected to the Run details page for this Plan, and pretty quickly I can see that Terraform wants to make some changes:
Terraform v0.14.9
Configuring remote state backend...
Initializing Terraform configuration...

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # vault_generic_secret.example will be created
  + resource "vault_generic_secret" "example" {
      + data         = (sensitive value)
      + data_json    = (sensitive value)
      + disable_read = false
      + id           = (known after apply)
      + path         = "kv/terraform"
    }

Plan: 1 to add, 0 to change, 0 to destroy.
I’m asked to confirm before it applies anything, and then we see that the Apply has been successful.
Terraform v0.14.9
vault_generic_secret.example: Creating...
vault_generic_secret.example: Creation complete after 1s [id=kv/terraform]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
And if we check in Vault, we can see that a secret has been created:
And to make sure it’s persisting state and reading from Vault, I’m going to modify that secret by hand, then run Terraform again.
This time we see the plan:
Terraform v0.14.9
Configuring remote state backend...
Initializing Terraform configuration...
vault_generic_secret.example: Refreshing state... [id=kv/terraform]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # vault_generic_secret.example will be updated in-place
  ~ resource "vault_generic_secret" "example" {
      ~ data_json = (sensitive value)
        id        = "kv/terraform"
        # (3 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
So I’m satisfied that everything is working as it should.
So it’s pretty basic so far, but it’s a good foundation to build from.
My next steps with this will be to Terraform the chicken/egg things, i.e. the vault_terraform Policy and AppRole, and then work my way through the rest of the configuration I already have.
I’m also going to see if I can dynamically determine the IP addresses for Terraform Cloud [4], and add those as a CIDR restriction on the AppRole.
Should be fun!
[1] You can watch my Vault Terraform HashiConf talk on YouTube, or read it on the HashiCorp website ↩
[2] At time of writing, Sky Betting & Gaming is my employer. We have a technology blog if you want to check it out ↩
[3] oh no, don’t give me ideas… ↩
[4] HashiCorp have an API to list Terraform Cloud IP ranges ↩
I’ve added Netlify CMS to my site, because it’s pretty cool, and seems like a good idea.
The instructions I followed are here: https://www.netlifycms.org/docs/add-to-your-site/
I also used an example config file from here: https://github.com/netlify-templates/jekyll-netlify-cms/blob/master/admin/config.yml
In the case of my blog, that was two commits:
This adds the admin page, Netlify CMS config, and scripts on the main page to redirect me to the CMS once I log in.
I also needed this, because copypaste errors are the best
Does it work?
Yes.
Yes it does. And it’s really nice.
I log into it using Netlify Identity service, using my Google account. I can create a new post, and when I save it it immediately creates a new PR in my GitHub Repo:
https://github.com/lucymhdavies/lucymhdavies.github.io/pull/5
This in turn triggers an automatic deployment, so I can preview my changes.
The only slight annoyance I have with it so far is that my current posts and images are structured like so:
🐳 :lucymhdavies.github.io lucy $ tree _posts/
_posts/
├── 2017
│ ├── 03
│ │ ├── 2017-03-30-Init.md
│ │ └── 2017-03-30-jekyll-from-ios.md
│ └── 04
│ └── 2017-04-02-dns-under-one-roof.md
├── 2018
│ ├── 03
│ │ └── 2018-03-15-nail-polish.md
│ └── 07
│ └── 2018-07-22-emoji-graphs.md
└── cms
7 directories, 5 files
🐳 :lucymhdavies.github.io lucy $ tree images/
images/
├── 404.jpg
├── LMHD_xs.png
├── avatar.jpg
├── bg.jpg
├── both.gif
├── cms
├── post.PNG
├── posts
│ ├── 2018-03-15
│ │ └── captain-obvious.jpg
│ └── 2018-07-22
│ ├── attention.gif
│ ├── emojitracker.png
│ └── final-graph.png
├── ss-color.png
└── ss.png
4 directories, 12 files
And Netlify CMS doesn’t appear to have an obvious way of handling nested subdirectories for images and posts, meaning it doesn’t automatically see my previous posts.
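(For context: a Netlify CMS collection points at a single folder. The relevant bit of admin/config.yml looks something like this sketch, assuming a Jekyll-style setup:)

collections:
  - name: "posts"
    label: "Posts"
    folder: "_posts" # one flat folder; no nested year/month directories
    create: true
    slug: "{{year}}-{{month}}-{{day}}-{{slug}}"
    fields:
      - { label: "Title", name: "title", widget: "string" }
      - { label: "Body", name: "body", widget: "markdown" }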
No matter. That’s easily fixed. I moved my posts to a common directory:
With that change, Netlify CMS picked up my existing posts
I’ve not done this for images yet, as I’d need to go through all my existing posts and update them inline too. I might do that at some point, but I’m not too worried for now.
But all-in-all… Seems pretty cool so far :)
A few days ago (the day after World Emoji Day as it happens) I discovered a tweet:
Y’all know what to do https://t.co/YCRHtJfWAk
— Lucy Davinhart (@LucyDavinhart) 18 July 2018
Apparently there’s a bot keeping track of which emojis get the most use. It’s made by Jeremy Schmidt and is called, fittingly, LeastUsedEmojiBot. You can find the code on GitHub
As a fan of emojis in general, this got me interested. Obviously I wanted to try to help the humble Aerial Tramway emoji reach its true potential of second least used emoji on Twitter.
To keep track, we are on 128916 at the moment. 🚡
— Lucy Davinhart (@LucyDavinhart) 18 July 2018
And I think it only counts number of tweets it’s used in, rather than number of uses.
— Lucy Davinhart (@LucyDavinhart) 18 July 2018
It does appear to update pretty damn quick.
🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡
Yep. One per tweet. 🚡
— Lucy Davinhart (@LucyDavinhart) 18 July 2018
Also, nice.
— Lucy Davinhart (@LucyDavinhart) 18 July 2018
🚡 https://t.co/gE9CpRsIdb
Me: *slaps twitter*
— Lucy Davinhart (@LucyDavinhart) 18 July 2018
This baby can hold so many 🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡🚡
Digging into this further, I found that the bot got its data from a site called Emojitracker, made by Matthew Rothenberg
This site gets realtime updates from the Twitter Streaming API of all emojis used on Twitter.
That’s a LOT of data, and some nice APIs too:
There’s a REST API, to get a snapshot of all emojis, and a Streaming API for updates.
Now I was even more interested.
It was at this point that I noticed that usage of the Aerial Tramway emoji was increasing faster than its rival, the Input Symbol for Latin Capital Letters. This was almost certainly due to LeastUsedEmojiBot highlighting it. At some point it would overtake, but I wasn’t sure how soon.
It was now Friday afternoon. At work, we have a thing called “Learning and Development Time”, in which you can (within reason) basically do whatever you like to further your personal development. It doesn’t even have to be work related. In the past, I’ve used this time to work on various personal dev projects, which I may blog about at some point.
One of those previous projects was a Prometheus Exporter for Twitch.tv. My wife is a Twitch streamer (obligatory plug: SeraphimKimiko), and I’m a nerd, so I wanted to keep track of how many people were watching her live. So I made this, written in Go, consuming the Twitch APIs, and exporting them as Prometheus metrics. I included a Docker Compose file to spin up the Prometheus exporter, a Prometheus server to scrape it, and a pre-configured Grafana instance to draw pretty pretty graphs. And it’s written in Go. Because of course it is.
Thanks to that project, I had most of the code already to graph data from the Emojitracker APIs. I got to work on what would eventually become my Prometheus Exporter for Twitter Emojis.
Let’s see what the Emojitracker API gives us. The API endpoint I’m interested in is https://api.emojitracker.com/v1/rankings, which returns JSON like:
$ curl -s https://api.emojitracker.com/v1/rankings | jq .
[
  ...
  {
    "char": "🚡",
    "id": "1F6A1",
    "name": "AERIAL TRAMWAY",
    "score": 130982
  },
  {
    "char": "🔠",
    "id": "1F520",
    "name": "INPUT SYMBOL FOR LATIN CAPITAL LETTERS",
    "score": 130893
  }
]
First, we need a Prometheus metric. I used a Gauge (which, in Prometheus, is a metric that can go up or down). Arguably I should have used a Counter (which can only go up), but this was a proof of concept, and I wasn’t sure what happens if tweets get deleted. I’m interested in the emoji itself (because apparently both Prometheus and Grafana support those just fine), as well as some plaintext identifiers:
emojiScore = prometheus.NewGaugeVec(
    prometheus.GaugeOpts{
        Namespace: "lmhd",
        Subsystem: "emoji",
        Name:      "twitter_ranking",
        Help:      "Number of uses of this emoji on twitter",
    },
    []string{
        // Which emoji?
        "emoji",
        "name",
        "id",
    },
)
Now we need to populate that with some data.
I used json-to-go to quickly generate a type which matched the output of the API:
type EmojiRankingsResponse []struct {
    Char  string `json:"char"`
    ID    string `json:"id"`
    Name  string `json:"name"`
    Score int    `json:"score"`
}
For my Twitch exporter, I had used curl-to-go to generate some Go code to call the APIs, and return structs. The code my Emoji exporter used was based off that.
There are two functions here. The first calls the API, and returns (among other things) the response body:
func EmojiRankingsRequest() ([]byte, *http.Response, error) {
    // Modified from code generated by curl-to-Go: https://mholt.github.io/curl-to-go
    url := "https://api.emojitracker.com/v1/rankings"

    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        log.WithFields(log.Fields{"url": url}).Errorf("%s", err)
        return []byte{}, nil, err
    }

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.WithFields(log.Fields{"url": url}).Errorf("%s", err)
        return []byte{}, resp, err
    }
    defer resp.Body.Close()

    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        log.WithFields(log.Fields{"url": url}).Errorf("%s", err)
        return []byte{}, resp, err
    }

    return body, resp, nil
}
The second takes that response and converts it into something of type EmojiRankingsResponse.
func Rankings() (EmojiRankingsResponse, error) {
    // init an empty response
    response := EmojiRankingsResponse{}

    body, resp, err := EmojiRankingsRequest()
    if err != nil {
        log.Errorf("%s", err)
        return response, err
    }
    if resp.StatusCode != 200 {
        // %d, as StatusCode is an int
        log.Errorf("Error code %d", resp.StatusCode)
        return response, err
    }

    err = json.Unmarshal(body, &response)
    if err != nil {
        log.Errorf("%s", err)
        return response, err
    }

    return response, nil
}
You can find this in emoji.go
So with that in place, I can populate my Prometheus metrics. In my main.go, I iterate through all emojis in that response, and update their corresponding Prometheus metric:
// Init with rest API
rankings, err := Rankings()
if err != nil {
    log.Fatalf("%s", err)
}

for _, emoji := range rankings {
    emojiScore.With(prometheus.Labels{
        "emoji": emoji.Char,
        "name":  emoji.Name,
        "id":    emoji.ID,
    }).Set(float64(emoji.Score))
}
This worked great! I now had some metrics!
$ curl -s http://localhost:8080/metrics | grep -i strawberry
lmhd_emoji_twitter_ranking{emoji="🍓",id="1F353",name="STRAWBERRY"} 9.273592e+06
But these were static, which is not much use to me. I needed updates.
As a proof of concept, I initially just called the REST API every minute for updates, and updated the prometheus metrics accordingly. But this was me being lazy. The REST API Documentation says you should not do this:
When to use the REST API
In general, use the REST API to build an initial snapshot state for a page (or get a one-time use data grab), but then use the Streaming API (https://github.com/emojitracker/emojitrack-streamer-spec) to keep it up to date.
Do not repeatedly poll the REST API. It is intentionally aggressively cached in such a way to discourage this, in that the scores will only update at a lower rate (a few times per minute), meaning you have to use the Streaming API to get fast realtime data updates.
🚨 IN OTHER WORDS, IF YOU ARE POLLING FREQUENTLY FOR UPDATES, YOU ARE DOING SOMETHING WRONG AND YOU ARE A BAD PERSON. 🚨
(Note that this is a design decision, not a server performance issue.)
I’d never used a streaming API before, so didn’t know what to expect.
According to the documentation, I could expect:
a JSON blob every 17ms (1/60th of a second) containing the unicode IDs that have incremented and the amount they have incremented during that period.
Example:
data:{'1F4C2':2,'2665':3,'2664':1,'1F65C':1}
I curl’d the API, to see what this looks like, and wow that updates quick!
Looks a bit like this:
$ curl -s https://stream.emojitracker.com/subscribe/eps
data:{"1F405":1,"1F60C":1}
data:{"1F450":1,"1F493":1,"1F498":1,"1F602":1,"1F60D":1,"1F629":1,"25B6":1,"26BD":1}
data:{"1F64F":1}
data:{"1F60F":1,"267B":1}
data:{"1F308":1,"1F4F2":1,"1F602":2,"1F61C":1,"1F64B":1,"2B50":1}
data:{"1F607":1}
data:{"1F335":1,"1F3A5":1,"1F447":1,"1F4F2":1,"1F51E":1,"263A":1,"2705":1}
data:{"1F602":2,"2764":1}
data:{"1F621":1}
data:{"1F48F":1,"1F602":1}
So I needed to consume that URL, look for lines beginning with data:, and parse the JSON into something useful.
First thing was to just keep reading the API:
resp, _ := http.Get("https://stream.emojitracker.com/subscribe/eps")
reader := bufio.NewReader(resp.Body)

for {
    line, _ := reader.ReadBytes('\n')
    lineString := string(line)
    ...
}
We only care about lines which begin with data:, so let’s get those (and drop the data: prefix):
// Lines look like
// data:{"1F449":1,"1F44D":1,"1F60F":1,"26F3":1}
if strings.HasPrefix(lineString, "data:") {
    data := []byte(strings.TrimPrefix(lineString, "data:"))
    ...
}
The JSON itself is a series of string keys, with integer values. In Go that could be represented as: map[string]int.
I wasn’t sure if Go would let me parse the JSON directly into something like that, but I gave it a try:
jsonMap := make(map[string]int)

err = json.Unmarshal(data, &jsonMap)
if err != nil {
    panic(err)
}
Sure enough, it worked! It might error at some point, but like I say, proof of concept.
All that was left was to update my metrics. I used the rankings object I created earlier to look up the name and emoji for the ID, and used that to increment my Prometheus metric:
for key, val := range jsonMap {
    for _, emoji := range rankings {
        if emoji.ID == key {
            emojiScore.With(prometheus.Labels{
                "emoji": emoji.Char,
                "name":  emoji.Name,
                "id":    emoji.ID,
            }).Add(float64(val))

            log.Debugf("Char: %s (%s) : %d", key, emoji.Name, val)
        }
    }
}
And that’s basically it. It could absolutely do with some tidy-up (for example, being able to look up the emoji details from the ID, without having to iterate over rankings), but it works fine for my proof of concept.
Now, let’s get this into pretty pretty graphs.
I went through a few iterations of this, before I settled on one I liked:
Potentially relevant to @nocturnalBadger's interests.
— Lucy Davinhart (@LucyDavinhart) 21 July 2018
🚡 pic.twitter.com/4ecnaHSd9p
I was predominantly interested in the bottom two emojis, so my dashboard kept track of those two.
I had the overall usage in a Graph panel.
This used Prometheus’ “Bottom K” operator, which I used to filter out only the bottom 10 metrics:
bottomk(10,lmhd_emoji_twitter_ranking)
I also had individual Singlestat panels for the two emojis, configured for example with:
lmhd_emoji_twitter_ranking{emoji="🚡"}
I left this running overnight to gather some data, then woke up this morning to discover that, oh no! Disaster struck!
Turns out, at some point in the night my prometheus exporter had stopped consuming the streaming API!
time="2018-07-22T01:43:49Z" level=debug msg="Char: 1F694 (ONCOMING POLICE CAR) : 1"
time="2018-07-22T01:43:49Z" level=debug msg="Char: 203C (DOUBLE EXCLAMATION MARK) : 1"
time="2018-07-22T01:43:49Z" level=debug msg="Char: 267B (BLACK UNIVERSAL RECYCLING SYMBOL) : 1"
time="2018-07-22T01:43:49Z" level=debug msg="Char: 2764 (HEAVY BLACK HEART) : 2"
time="2018-07-22T01:43:49Z" level=debug msg="Char: 1F618 (FACE THROWING A KISS) : 1"
time="2018-07-22T01:43:49Z" level=debug msg="Char: 1F629 (WEARY FACE) : 1"
time="2018-07-22T01:43:49Z" level=debug msg="Char: 1F494 (BROKEN HEART) : 1"
time="2018-07-22T01:43:49Z" level=debug msg="Char: 1F602 (FACE WITH TEARS OF JOY) : 2"
time="2018-07-22T01:43:49Z" level=debug msg="Char: 1F614 (PENSIVE FACE) : 1"
My numbers were stale!
Fortunately, 🚡 had not yet overtaken 🔠, so there was still time for me to see it happen.
One quick docker restart emoji_exporter_emoji_exporter_1 and we were collecting data again.
Gave my Prometheus exporter a restart, and new data is coming in.
— Lucy Davinhart (@LucyDavinhart) 22 July 2018
🚡 has certainly made quite a bit of progress since this morning, but the other one has too.
They're close. Shouldn't be long now. pic.twitter.com/zTuUhuRe3n
I kept watch, and just 20 minutes later, we did it!
So, now that 🚡 is the second least used emoji on Twitter, I'm not sure what will happen now.
— Lucy Davinhart (@LucyDavinhart) 22 July 2018
Some people will continue to tweet, as @leastUsedEmoji will not have updated yet. I believe it's scheduled to update within 25m.
At that point... I'm expecting a massive spike. pic.twitter.com/nTreFd95BH
This is the tweet which pushed 🚡 into second least used!
— Lucy Davinhart (@LucyDavinhart) 22 July 2018
YAY! https://t.co/iVhWOjnGoY
Celebrations all round!
I made a couple of tweaks to the dashboard following that. The new version includes a dropdown, so you can select which emojis you want to compare (from all of them); a table, showing specifically the bottom 5 emojis; and a rate graph, showing how many individual tweets there were over a time interval of 10 minutes.
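Since the metric is a Gauge rather than a Counter, the gauge-flavoured delta() function is the natural fit for that rate panel; the query would be something along these lines (a sketch, not necessarily the exact query I used):

delta(lmhd_emoji_twitter_ranking{emoji="🚡"}[10m])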
I've updated the grafana dashboard slightly, if anyone is still interested.
— Lucy Davinhart (@LucyDavinhart) 22 July 2018
🚡
Now includes a table
Code:https://t.co/2PwmnRIC6y
Interactive Snapshot:https://t.co/OT3sNW0jnY
Screenshot: pic.twitter.com/6R01KfoAQC
I've also added a rate graph, which converts the cumulative number of tweets into tweets over time (i'm using a 10 minute interval).
— Lucy Davinhart (@LucyDavinhart) 22 July 2018
Usage spikes are even more obvious in this graph. pic.twitter.com/ocbWwFtbAD
You mean I need to do stuff because it has a point?
Nah.
That’s not a thing.
Seriously though, this was a fun thing to work on, especially as I was able to re-use so much code, letting me play a bit more without figuring out how to just get something working.
I already do a lot of realtime monitoring of a bunch of stuff at work, to make sure I don’t get woken up in the middle of the night (or to ensure I definitely do, if something needs fixing). But these two things (my Twitch exporter, and my Emoji exporter) include monitoring of external APIs, and human nature.
This one in particular was fascinating. Because it was a relatively small dataset (the LeastUsedEmojiBot only has 14k followers on Twitter), I could clearly see cause and effect. For example, the spike in usage following the bot’s announcement that 🔠 was now the least used.
It was also interesting being able to make predictions, using Prometheus’ predict_linear() function:
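For example, extrapolating the score six hours into the future based on the last day of data looks something like this (a sketch, not the exact query):

predict_linear(lmhd_emoji_twitter_ranking{emoji="🚡"}[1d], 6 * 3600)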
According to my grafana dashboard, we should be overtaking Truncated Latin Alphabet some point early tomorrow. 🚡 https://t.co/w63l6lRJDv
— Lucy Davinhart (@LucyDavinhart) 21 July 2018
Using Prometheus's built in predict_linear() function, the two emoji should be approximately equal in about 9 hours (5am for me in the UK). 🚡
— Lucy Davinhart (@LucyDavinhart) 21 July 2018
Specifically by 5amhttps://t.co/RvULAuVx7y
— Lucy Davinhart (@LucyDavinhart) 21 July 2018
I was wrong, of course. Human nature is not so easily predictable by simple linear regression.
But yes. This was fun. I need to do silly things like this more often.
I’ll leave you with this video, by a very inspiring woman:
And one more graph (click on it to go to the interactive version!):
The vast majority of my collection I’ve purchased from GoDaddy. I’m fine with that. I often hear bad stuff about them, but they’ve been good to me so I’m in no rush to move.
But the DNS part… that’s spread out all over the place. Some of it is managed by GoDaddy, some of it is delegated to other hosting providers [1]. lmhd.me specifically is on PointDNS [2].
So I’m going to move it all to one place as much as possible: AWS Route53
There are lots of DNS solutions out there: some free, some paid, some self-hosted. I could have gone with any of them.
But I wanted something I could automate, preferably with Terraform (because I’m familiar with that from work).
Terraform has many DNS providers to chose from, and I looked through quite a few of them.
But I eventually decided on Route53, partly because it’s cheap [3], and partly because I’m thinking of moving some of my Heroku stuff over to AWS at some point.
ALIAS records for my apex domain

The reason I went with PointDNS originally was because my apps were on Heroku, and it was available as a free add-on.
The original lmhd.me was a static page on HostGator, then I moved it to Heroku. Now it’s on GitHub Pages.
When it was on Heroku, I needed what PointDNS (and others) refer to as an ALIAS record. From the PointDNS console:
Please specify a DNS name to alias. Point will automatically duplicate A and AAAA records from this address at 15 minute intervals. This may be used as an alternative to a CNAME for the root of a domain.
In the case of lmhd.me this works like:

- lmhd.me is an ALIAS record for lucymhdavies.github.io
- lucymhdavies.github.io is a CNAME for github.map.fastly.net
- github.map.fastly.net has A records for the relevant IP addresses
- PointDNS automatically updates the A records for lmhd.me based on the github.map.fastly.net A records

Route 53 has no such concept. The only Alias records it has are for other AWS services.
I was originally thinking of just setting up A records, as GitHub suggests, but I can do better than that.
Terraform can itself query DNS, e.g. CNAME records and A records.
So what I ended up doing was the following:
#
# lmhd.me apex domain --> github pages
#

# Lookup CNAME for lucymhdavies.github.io.
# Result is stored in the variable:
#   data.dns_cname_record_set.lucymhdavies-github-io.cname
data "dns_cname_record_set" "lucymhdavies-github-io" {
  host = "lucymhdavies.github.io"
}

# Get IPs for that CNAME
# Result is stored in the array:
#   data.dns_a_record_set.github-io.addrs
data "dns_a_record_set" "github-io" {
  host = "${data.dns_cname_record_set.lucymhdavies-github-io.cname}"
}

# Set up the A records for lmhd.me
resource "aws_route53_record" "lmhd-me-A-record" {
  zone_id = "${aws_route53_zone.lmhd-me.zone_id}"
  name    = "lmhd.me"
  type    = "A"
  ttl     = "60"
  records = ["${data.dns_a_record_set.github-io.addrs}"]
}
This provides me with a good middle ground between simply setting the A records manually, and setting up something (a Docker container?) to replicate the functionality PointDNS provides.
I’m not done yet. I have more domains to move, but they’re all CNAME and A records, so nothing too interesting with that.
I just need to get around to doing it.
The only manual part of the process is updating GoDaddy’s nameservers for the domain to point to Route53.
There’s a Terraform plugin I could use for that, and there are other tools which use the GoDaddy API, for example a Node.js script
But given that it’s a one-off thing per domain, and I don’t have too many domains, I’m not too bothered right now.
Another option for me to consider is that domains from Amazon cost about the same as I’m paying from GoDaddy. When that domain is up for renewal in January, I might consider that.
At some point in future I’m going to have Terraform Plan run automatically, and send me a Slack notification if there are any changes to be applied [4]. But one step at a time, eh?
[1] Hostgator, back in my “PHP is cool! Write everything in PHP” days ↩
[2] Because that was available as a free add-on for Heroku ↩
[3] Route 53 starts at $0.50 per hosted zone per month, which is reasonable enough (Route 53 Pricing). Apparently I have so far had 1.24 thousand requests in April (as of 1530 on 2nd April). I can’t seem to get historical data though, so I’ve extrapolated that to ~40k requests per month. Even then, it’s only $0.52 for lmhd.me per month, according to the AWS calculator. PointDNS free is limited to one domain, and limited to 10 records per domain. Paid, it costs $25/mo for up to 10 zones, or $2.50 per zone per month (PointDNS pricing). I didn’t really compare pricing for any of the others. AWS is cheap enough, and PointDNS isn’t. ↩
[4] e.g. the A records for GitHub Pages change ↩