Constructing a DNS-01 Challenge with ACME and Python

Domains    DNS    TLS    Python    2023-07-26

The company I work for publishes content at hundreds of domains. Records for these domains weren't always managed by just one team, so as the company has grown, things like DNS and contacts were scattered across multiple registrars.

As one part of our effort to consolidate, I'm currently working on a tool that will help manage TLS certificates (a type of file that you must have in order to enable HTTPS on your website).

Until recently, a lot of our certs were SAN certificates (Subject Alternative Name, or SAN, is a certificate extension that allows multiple hostnames to be protected by a single certificate). They saved money in many cases, but became risky and expensive to update when any of the included domains changed ownership. More specifically, rekeying a SAN certificate to add or remove domains can cause TLS to stop working for all the domains on that cert.

Our solution is to use this new tool to generate certificates with Let's Encrypt. Because Let's Encrypt certs are free, it becomes reasonable to get out from under SANs and issue a separate cert for each domain. The downside is that Let's Encrypt certs, by default, expire within three months, so we need a system that will:

  • check certificate expiration dates on a regular basis
  • renew the certs when appropriate
  • upload the new certs and keys to our CDN
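
The first two steps can be sketched with nothing but the standard library. This is a minimal illustration, not the tool's actual code: the function names get_cert_expiry and needs_renewal and the 30-day renewal window are assumptions made for the example.

```python
import datetime
import socket
import ssl


def get_cert_expiry(hostname, port=443, timeout=10):
    """Fetch the notAfter date of the certificate served at hostname."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    # notAfter looks like 'Jun  1 12:00:00 2024 GMT'
    seconds = ssl.cert_time_to_seconds(cert['notAfter'])
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


def needs_renewal(not_after, now=None, threshold_days=30):
    """Return True when the cert expires within threshold_days (assumed window)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return not_after - now <= datetime.timedelta(days=threshold_days)
```

A scheduled job could call needs_renewal(get_cert_expiry(domain)) for each domain and kick off a new certificate request whenever it returns True.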

In order to issue a certificate, a CA (or Certificate Authority) such as Let's Encrypt must first verify that you control the domain that you're trying to get a certificate for.

For individual certificates, it probably makes sense in most cases to use something called HTTP-01 validation - that is, to serve a small token file from a specific path on your web site. The CA can then reach out with a simple HTTP request and verify that the token is there, thereby confirming that you in fact control the site you're requesting a certificate for. HTTP-01 methods are very well-documented.

But we're working with certificates and domains in bulk, and for our purposes it's far faster and more efficient to use the DNS-01 type of validation. That is, we place a TXT record in the domain's DNS, with a specific name and value provided by Let's Encrypt. Let's Encrypt then checks the domain's DNS for the presence of the TXT record in order to confirm ownership.

As I was building this tool, I found that there are lots of examples of HTTP-01 challenge code out there, but DNS-01 is not as well-documented. Hence this post.
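
For the curious, the TXT record's name and value follow a fixed recipe from the ACME spec (RFC 8555, section 8.4). The acme library computes this for you, so the sketch below is only meant to show what ends up in DNS. The function names are made up for illustration, and the account thumbprint (a hash of your account's public key) is taken as an input rather than computed.

```python
import base64
import hashlib


def dns01_record_name(domain):
    """Name of the TXT record the CA will query for this domain."""
    return f"_acme-challenge.{domain}"


def dns01_record_value(token, account_thumbprint):
    """TXT value: base64url(SHA-256(token + '.' + thumbprint)), without padding."""
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

Because the value is a SHA-256 digest in unpadded base64url, it is always 43 characters long, which is a handy sanity check when you're eyeballing the record in your DNS console.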

I want to point out that we're using a couple of specific tools and providers that make all this possible, so ymmv:

All of our DNS records live in hosted zones in AWS Route 53. The AWS boto3 library makes it simple to add and change DNS records programmatically, but AWS isn't the only DNS host that has an API. We also have a custom-built database that contains domain names with their Route 53 zone IDs, plus some other important metadata.

Now let's take a look at the code we're using to connect to Let's Encrypt and AWS to go through this DNS-01 validation process. The imports you'll need first (everything here is in the standard library except cryptography and pyOpenSSL, which are installed with pip):


from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.asymmetric import rsa
import datetime
import json
from OpenSSL import crypto
from OpenSSL.SSL import FILETYPE_PEM
import os
import sys
import time

cryptography will be used for registering a new account with Let's Encrypt, and OpenSSL will be used for making certificate signing requests, or CSRs.

Third-party libraries:


from acme import challenges, client, crypto_util
from acme import errors, messages
from acme.client import ClientNetwork, ClientV2
import boto3
import josepy

If you've generated Let's Encrypt certificates before, you may be familiar with the EFF's command line tool, certbot. This acme package contains methods that let you work with the ACME protocol programmatically. The code is here, inside the certbot source code.

That repo has an "examples" path that contains only one piece of example code, for the HTTP-01 challenge. A lot of the code in that example is useful for constructing a DNS-01 challenge, but there are some details missing.

Some default values we're setting:


DIRECTORY_URL = 'https://acme-staging-v02.api.letsencrypt.org/directory'
USER_AGENT = 'python-acme-example'

DIRECTORY_URL is the ACME URL for Let's Encrypt's staging environment. (Read more about their testing environment here.)
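
When you're ready to issue real certificates, the production directory lives at a different URL. Here's one hedged way to switch between the two. The ACME_ENV environment variable and the directory_url helper are hypothetical names, not something the acme library defines.

```python
import os

STAGING_URL = 'https://acme-staging-v02.api.letsencrypt.org/directory'
PRODUCTION_URL = 'https://acme-v02.api.letsencrypt.org/directory'


def directory_url(environ=os.environ):
    """Use production only when ACME_ENV=production is set explicitly."""
    # ACME_ENV is an assumed variable name; anything else falls back to staging
    if environ.get('ACME_ENV', 'staging').lower() == 'production':
        return PRODUCTION_URL
    return STAGING_URL
```

Defaulting to staging means a misconfigured job can't accidentally burn through production rate limits.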

USER_AGENT is an arbitrary value - it can be any string. It's sent as the User-Agent header on requests to Let's Encrypt, including the one that creates the account.

The first step is creating that account:


def register_account():
    """ standalone method to register a new LE account """

    # Create a new account key
    print("Generating a user key")
    user_key = josepy.JWKRSA(key=rsa.generate_private_key(public_exponent=65537, key_size=2048, backend=default_backend()))

    # Register the account and accept TOS
    net = ClientNetwork(user_key, user_agent=USER_AGENT)
    directory = ClientV2.get_directory(DIRECTORY_URL, net)
    acme_client = ClientV2(directory, net=net)

    # Terms of Service URL is in acme_client.directory.meta.terms_of_service
    # Create the account with contact information.
    email = 'your-email-address@example.com'
    account_resource = acme_client.new_account(messages.NewRegistration.from_data(email=email, terms_of_service_agreed=True))

    # Return the key as well: it's needed to sign every later request
    return user_key, account_resource

You only need to create the account once, so once you have it, be sure to store the account key (for example, the JSON produced by user_key.to_json()) along with the account_resource object in some kind of secrets vault so that you can access them again.

Next comes the actual certificate request. In my version, I'm passing in a dict containing the name of the domain, the AWS Route 53 zone ID, and the account resource.


def request_cert(**kwargs):
    """ returns either [] of AWS secrets arns or an error string """
    domain = kwargs['domain'].lower()
    hostedzone = kwargs['zone']
    account_key = kwargs['account_resource']

Load the user key from the stored account data (this call expects the account key in its JWK JSON form), instantiate the ACME client, and look up the existing account:


    user_key = josepy.JWKRSA.fields_from_json(account_key)
    network = ClientNetwork(user_key)
    directory = messages.Directory.from_json(network.get(DIRECTORY_URL).json())
    acme_client = ClientV2(directory, network)
    reg = messages.NewRegistration(key=user_key.public_key(), only_return_existing=True)
    response = acme_client._post(directory['newAccount'], reg)
    regr = acme_client._regr_from_response(response)
    account = acme_client.query_registration(regr)

Create a private key and a certificate signing request (CSR) for the domain. new_csr() is a small helper that isn't shown in this post; it returns both as PEM:


    pkey_pem, csr_pem = new_csr(domain)

Use the CSR to generate a certificate. The first step in that process is requesting the order - this returns an object that contains several types of challenges, including HTTP-01 and DNS-01:


    order_object = acme_client.new_order(csr_pem)

You can see my code for get_dns_challenge() at the bottom of this post - it's a simple method that extracts the DNS challenge from a collection of challenges in the order object:


    dns_challenge_object = get_dns_challenge(order_object)

Then the validation process kicks off. response_and_validation() returns two things: the response we'll send back to Let's Encrypt later, and the validation value, which is the converted token that must be written to your domain's DNS records:


    response, validation = dns_challenge_object.response_and_validation(acme_client.net.key)

My code for updating the DNS record is also below - this is specific to AWS and the boto3 library:


    ready_to_validate = update_dns(validation, domain, hostedzone)

I'll show it in detail at the end of this post, but in a nutshell, it creates a TXT record named _acme-challenge.{domain} with the validation token as its value, then sleeps for a couple of minutes to give the new record plenty of time to propagate.

Once we're sure the DNS record is in place, we can ping Let's Encrypt again and let them know it's time to attempt authorization:


    fullchain_pem = ''
    if ready_to_validate:
        challenge_resource = acme_client.answer_challenge(dns_challenge_object, response)

        deadline = datetime.datetime.now() + datetime.timedelta(seconds=180)
        try:
            finalized_order = acme_client.poll_and_finalize(order_object, deadline)
            fullchain_pem = finalized_order.fullchain_pem
        except errors.ValidationError as e:
            print(f'Validation error on {domain}: {e.failed_authzrs}')

A couple of things to note:

The deadline value passed to poll_and_finalize() is optional - that basically just sets a timeout so that we're not waiting too long for Let's Encrypt to respond.

Also, the finalized_order that's returned by poll_and_finalize() contains both a fullchain.pem and a cert.pem. For our purposes, we're storing the fullchain.pem - a combination of cert.pem (the "end-entity certificate") and chain.pem (the intermediate certificate chain) in a single file. Your TLS configuration may require something different, just know that those options are available. As you're testing, I recommend exploring the contents of the finalized_order object to see what it contains.
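
If your TLS setup does want the end-entity certificate and the chain as separate pieces, a rough sketch for splitting a fullchain PEM string looks like this. split_fullchain is a hypothetical helper, and it assumes the end-entity certificate comes first in the file, which is the order ACME servers are required to use.

```python
import re

PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_fullchain(fullchain_pem):
    """Split a fullchain PEM string into (end-entity cert, intermediate chain)."""
    certs = PEM_CERT_RE.findall(fullchain_pem)
    if not certs:
        raise ValueError("no PEM certificates found")
    # First block is the end-entity cert; everything after it is the chain
    cert_pem = certs[0] + "\n"
    chain_pem = "".join(c + "\n" for c in certs[1:])
    return cert_pem, chain_pem
```

Something like cert_pem, chain_pem = split_fullchain(finalized_order.fullchain_pem) would then give you the two pieces to store or upload separately.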

Finally, we're doing another AWS-specific task - storing some objects (the csr_pem and private key pem from the original request, plus the fullchain_pem) in Secrets Manager:


    arns = []
    try:
        pem_list = [
            {'key': 'csr', 'value': csr_pem},
            {'key': 'private_key', 'value': pkey_pem},
            {'key': 'fullchain', 'value': fullchain_pem}
        ]
        arns = store_pems(domain=domain, pems=pem_list)
    except Exception as e:
        print(f"Error storing certificate: {e}")

I'm not posting the code for our store_pems() method here. We're just using create_secret and update_secret from the secretsmanager class in boto3, all well-documented here.
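
For readers who want a starting point anyway, here is a minimal sketch of what such a helper could look like. It is not the store_pems() used in the tool described here: the tls/<domain>/<key> naming scheme is invented, and the Secrets Manager client is passed in as an extra argument (the client you would get from boto3.client('secretsmanager')) to keep the example easy to test.

```python
def store_pems(domain, pems, client):
    """Create or update one secret per PEM and return the list of secret ARNs.

    `client` is expected to behave like boto3.client('secretsmanager').
    """
    arns = []
    for pem in pems:
        # Hypothetical naming scheme: tls/<domain>/<key>
        name = f"tls/{domain}/{pem['key']}"
        value = pem['value']
        if isinstance(value, bytes):
            value = value.decode('ascii')  # SecretString must be text, not bytes
        try:
            response = client.create_secret(Name=name, SecretString=value)
        except client.exceptions.ResourceExistsException:
            # The secret already exists (e.g. on renewal), so overwrite its value
            response = client.update_secret(SecretId=name, SecretString=value)
        arns.append(response['ARN'])
    return arns
```

Trying create_secret first and falling back to update_secret means the same call works for both first issuance and renewals.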

And here are the other utility methods mentioned above - this one parses the order object to extract the DNS-01 challenge:


def get_dns_challenge(order_object):
    """Extract the DNS challenge from a collection of challenges"""
    # Each authorization holds the challenges offered by the server and their status.
    for authz in order_object.authorizations:
        for challenge_body in authz.body.challenges:
            if isinstance(challenge_body.chall, challenges.DNS01):
                return challenge_body
    print('DNS-01 challenge was not offered by the Certificate Authority server.')
    return False

This next one is specific to AWS - using the route53 class in boto3 to add a TXT record to the domain's DNS. The route53 methods are pretty well-documented, but I still found the call to change_resource_record_sets() a little tricky to construct, so I'm including what I did here:


def update_dns(token, domain, hostedzone):
    """Add the challenge TXT record to DNS"""

    # Credentials come from boto3's default chain (environment variables,
    # shared config files, or an IAM role) rather than being hard-coded here.
    awsclient = boto3.client('route53')

    recordset = {
        'Name': f'_acme-challenge.{domain}.',
        'Type': 'TXT',
        # TXT values must be wrapped in double quotes
        'ResourceRecords': [{'Value': f'"{token}"'}],
        'TTL': 60
    }
    changeset = {'Changes': [{'Action': 'UPSERT', 'ResourceRecordSet': recordset}]}
    try:
        response = awsclient.change_resource_record_sets(
            HostedZoneId=hostedzone,
            ChangeBatch=changeset
        )
        change_id = response['ChangeInfo']['Id']
    except Exception as e:
        # Return False (not the error string, which would be truthy) so the
        # caller's `if ready_to_validate:` check skips validation on failure.
        print(f"Error updating DNS: {e}")
        return False

    # Poll until Route 53 reports that the change is in sync
    while True:
        time.sleep(5)
        response = awsclient.get_change(Id=change_id)
        status = response['ChangeInfo']['Status']
        print("DNS change status:", status)
        if status == 'INSYNC':
            # Extra wait to give the new record time to propagate
            time.sleep(120)
            break
    return True

Something we could probably add here is a check using the dnspython library to verify that the TXT record is actually resolving before continuing.
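
Here's a hedged sketch of what that check might look like. The helper names and the retry settings are assumptions; the only dnspython pieces used are dns.resolver.resolve() and the strings attribute of TXT records. Note that this asks your local resolver, which may not see the same thing Let's Encrypt sees.

```python
import time


def dnspython_lookup(name):
    """Return the TXT strings for `name` using dnspython (pip install dnspython)."""
    import dns.resolver  # imported lazily so the polling helper has no hard dependency

    try:
        answers = dns.resolver.resolve(name, 'TXT')
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []
    # Each TXT rdata holds one or more byte strings; join them into one value
    return [b''.join(rdata.strings).decode('ascii') for rdata in answers]


def wait_for_txt(name, expected, lookup=dnspython_lookup,
                 attempts=12, delay=10, sleep=time.sleep):
    """Poll until `expected` appears among the TXT values for `name`."""
    for attempt in range(attempts):
        if expected in lookup(name):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next try, but not after the last one
    return False
```

In update_dns() above, something like wait_for_txt(f'_acme-challenge.{domain}', token) could stand in for the fixed two-minute sleep.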

And that's it! If there's any interest, I may also go through what we're doing with the Fastly API as a part of this TLS tool, but in the meantime I hope this helps anyone struggling to work with DNS challenges for certificate requests.