Terraform Next.js: The road to Atomic Deployments

When the first version of our Next.js for AWS module was released, it was more of an experiment than production-grade software. Along the way we learned a lot about Terraform, the serverless community and developer experience, which greatly improved the project; in the meantime it has passed 850 stars on GitHub.

At this point I want to say a big thank you to all of our users, issue reporters and supporters out there who helped us achieve this!

While working through your bug reports and questions over the last few months, we often felt that a big picture was missing from our answers. So in this blog post I want to give you a deeper insight into what we are working on today and how the module will work in the future.

The best of both worlds: Separation of responsibilities

When comparing milliVolt infrastructure to other providers like Netlify, Vercel or AWS Amplify, you may have noticed that the way deployments are made is very different. Instead of running a simple command, you have to build your code first and then run Terraform commands, which finally bring your Next.js app to AWS. This can feel a bit cumbersome, since you always need access to the whole infrastructure in your AWS account when deploying an application.

For individual developers who are experienced in a wide range of areas this works fine, but imagine you work in a larger team that also has an IT department. The fine folks there always try to limit access to resources to protect us from making unintentional changes that might impact the whole system. Running the terraform apply command can have such an impact, because depending on your setup the configuration contains not only the Next.js app you want to deploy but also other things like EC2 instances, VPCs and so on.

Here at milliVolt we always aim for enterprise-ready solutions, so we focused on the question: How can we provide a solution that makes both IT and developers happy?

The answer to this question is a stricter separation of responsibilities:

  • The Terraform module itself should become a one-time setup that can be installed and updated independently from the Next.js app.
  • The command line interface (CLI) tf-next should become the only tool needed to build, deploy and manage deployments.

Atomic Deployments

While this alone is a great improvement for our enterprise users, we thought bigger: What else can we do to improve the developer workflow? I think we have found a pretty handy solution for it, which we call Atomic Deployments.

Since our module is built entirely on serverless technologies, a single deployment costs you nothing (except a small amount for storage) when it is not used. This means that having a single deployment effectively costs you as much as having a few hundred (or thousand) of them. With the previous version of the module it was only possible to deploy a single Next.js application per Terraform template. So if you need different stages like development, testing and production, you have to duplicate all of the services and resources, since Terraform is not well suited to handling a dynamic number of resources.

New architecture of the Terraform Next.js module for AWS

Here the effort we put into separating Terraform and the CLI pays off: Because an individual deployment is no longer managed by Terraform but by the module itself, we could simply extend its functionality to support multiple deployments at the same time.

We achieved this through a combination of Terraform, the AWS Cloud Development Kit (CDK) and CloudFormation. When you deploy the module to your AWS account with Terraform, it sets up a new Lambda-powered service that is used by the tf-next CLI to create or delete deployments. Internally, the service creates a new CDK stack for your deployment and then synthesizes it into a CloudFormation template. All Lambda functions that are created for your Next.js deployment to enable server-side rendering (SSR) are bundled into a single CloudFormation stack.
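
To make this a bit more concrete, here is a minimal sketch (in TypeScript) of how a per-deployment stack could be assembled with the CDK and synthesized into a CloudFormation template. The stack name, the construct IDs and the inline handler are illustrative assumptions and not taken from the module's actual implementation:

```typescript
// Minimal sketch (assumption): one CDK stack per deployment, synthesized
// into the CloudFormation template that the deployment service rolls out.
import { App, Stack, Duration } from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';

const app = new App();

// Each deployment gets its own, isolated stack (hypothetical naming scheme).
const stack = new Stack(app, 'next-deployment-dpl123');

// Every SSR entry point of the Next.js build would become a Lambda function.
// A real deployment would bundle the build output instead of using inline code.
new lambda.Function(stack, 'SsrPageIndex', {
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: 'index.handler',
  code: lambda.Code.fromInline(
    'exports.handler = async () => ({ statusCode: 200, body: "Hello from SSR" });'
  ),
  timeout: Duration.seconds(10),
});

// Synthesizing the app yields the CloudFormation template for this deployment.
const template = app.synth().getStackByName(stack.stackName).template;
console.log(JSON.stringify(template, null, 2));
```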

We call this atomic deployments, since each deployment gets its own CloudFormation stack and deleting one deployment does not affect the others. With this solution we are now able to host multiple versions of our Next.js app (or even different Next.js apps) in our AWS account. But how can a client access them from the browser?

Since the first version of our module we have used Lambda@Edge to handle cache misses from CloudFront. This way we can detect early in the request flow which routes are static and should be served from S3, and which need to be rendered by a Lambda function. We extended this service (internally called the proxy) with the concept of aliases. An alias is simply a mapping from a domain (e.g. example.com) to a certain deployment. You can think of it like a routing table that resolves IP addresses for a given domain name.
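
To illustrate the idea, here is a minimal sketch of a CloudFront origin-request handler that resolves the requested domain to a deployment before the request is forwarded. The in-memory alias table, the deployment IDs and the origin rewrite are simplifications for this post; the actual proxy is more involved than this sketch:

```typescript
import type { CloudFrontRequestEvent, CloudFrontRequest } from 'aws-lambda';

// Hypothetical alias table: maps a domain to the deployment that should serve it.
const aliases: Record<string, { deploymentId: string; staticOrigin: string }> = {
  'example.com': {
    deploymentId: 'dpl_123',
    staticOrigin: 'dpl-123-static.s3.us-east-1.amazonaws.com',
  },
};

export async function handler(event: CloudFrontRequestEvent): Promise<CloudFrontRequest> {
  const request = event.Records[0].cf.request;
  const host = request.headers['host'][0].value;

  // Resolve which deployment is linked to the requested domain,
  // similar to a routing table resolving a name to an address.
  const alias = aliases[host];
  const s3Origin = request.origin?.s3;

  if (alias && s3Origin) {
    // Point the request at the static bucket of the resolved deployment.
    s3Origin.domainName = alias.staticOrigin;
    request.headers['host'] = [{ key: 'Host', value: alias.staticOrigin }];
  }

  return request;
}
```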

So when a request hits your CloudFront distribution, it now looks up which deployment is associated with the domain before handling the request itself. This has the advantage that deployments and aliases are highly independent: You can first deploy a new version of your application, test it, and then switch the alias to the new deployment. All of this happens without a redeployment of CloudFront, because we use a smart caching system that redirects the traffic to the new deployment in less than 60 seconds.
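
The following sketch shows one way such a short-lived cache could look inside the edge function; it is a simplified illustration with an assumed 60-second TTL and a placeholder lookup function, not the module's actual caching code:

```typescript
// Sketch of a short-lived cache for alias lookups inside the edge function.
// A switched alias is picked up once the cached entry expires (here: 60 seconds).
type AliasRecord = { deploymentId: string; staticOrigin: string };

const TTL_MS = 60_000;
const cache = new Map<string, { value: AliasRecord; expiresAt: number }>();

// fetchAliasFromStore is a placeholder for the real lookup (e.g. a database call).
async function resolveAlias(
  host: string,
  fetchAliasFromStore: (host: string) => Promise<AliasRecord>
): Promise<AliasRecord> {
  const cached = cache.get(host);
  if (cached && cached.expiresAt > Date.now()) {
    return cached.value; // served from the warm execution environment
  }

  const fresh = await fetchAliasFromStore(host);
  cache.set(host, { value: fresh, expiresAt: Date.now() + TTL_MS });
  return fresh;
}
```

In this sketch the cache lives inside the Lambda@Edge execution environment, so each edge location refreshes its entries on its own, and the propagation time is bounded by the chosen TTL.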

Transition phase

We are pretty excited about these new features and want to get them into your hands as soon as possible. That is why we have started a transition phase in which we provide canary releases that include the new features (v1.0.0-canary.x), while still supporting the former build and deploy workflow that is handled by Terraform (v0.x).

So we recommend using v0.x for production work; if you want to test the latest features, pick the latest v1.0.0-canary release. While most of the features described above are already implemented, there are still things that are supported in v0.x but not yet in v1.0.0:

  • Environment variables for Lambda functions
  • Next.js image optimization
  • Attach VPC to Lambda functions
  • CloudFront invalidations

Once these are resolved, version 1.0.0 of our module will become generally available. We expect this to happen by the end of this summer.

Until the wait is over, we are happy about any feedback (or bug reports) on our canary releases, so don't hesitate to try them out! Setup and installation instructions are available in the GitHub repository.