Is Infrastructure as Code suffering from Stockholm Syndrome?
6 min read, last updated on 2020-12-22
This question popped into my mind recently. After several sessions with the AWS CDK, working through the typical tasks that we all do as Cloud Architects / DevOps Engineers on AWS, I realized that the Infrastructure as Code space has not seen any breakthrough recently. Does it mean that we are in the right place? Is it the global optimum?
I have been working on these topics since ~2011, when I started with Chef, then a lot of Ansible, some Puppet. Nowadays, a lot of CloudFormation and Terraform. I saw the rise of Immutable Infrastructure before there was Docker or Kubernetes. I also wrote other related articles here - either about specific tools or seeking correctness with the help of formal methods. Gosh, I even gave a talk which you can summarize as an ode to AWS CDK, and publicly called it a revolution (the talk is in Polish, but it has English subtitles).
BTW. Do I still believe in that? Yes, but actually no - keep reading.
Nevertheless, I have worked with “infrastructure automation” for almost 9 years, and I have seen things. However, this thought was novel to me. Are we at the peak of what is possible in Infrastructure as Code?
Is it doubt, more profound thought, or just frustration?
Probably all of that. I can see the real problems of IaC, but I cannot see what the future should look like to improve it. I think that is why I had such high hopes for Pulumi and AWS CDK: they were among the first fresh attempts to break the status quo. I remember the days before Terraform when Troposphere was a thing. Yet, it did not stick, even though it has many merits.
We take the current state of infrastructure as code for granted, and I think we do not realize how complicated and messy it is.
Every single time a typical developer jumps into our DevOps pit (I call it DevOps because I lack a better word for it), I hear a ton of complaints about dealing with piles of YAML files or pseudo programming languages. For those people, there is no way to achieve proper testability, modularity, and sensible reusability with our toolchain. We have some aces up our sleeves, we got used to it, and this is how we roll. But you have to admit that those voices are partially right. Isn’t that the reason we had to deal with such a massive migration from Terraform 0.11 to 0.12 just to get loops? I know it’s a slightly snarky comment, but it has more than just a grain of truth.
So maybe it has to be like that?
I wouldn’t say that, mostly because it sounds like surrendering without trying. That is why some people (me included) so warmly welcomed new kids on the block like Pulumi, Architect, or the AWS CDK mentioned above.
When I am in doubt, I usually try to research and widen my horizons. I did so, and I found this fantastic talk by Ben Kehoe:
I love it! Mostly because it tackles my doubts from a different angle. First, it makes an excellent point about treating YAML as a syntax. Secondly, Ben is entirely right about some objections regarding AWS CDK, and I share those feelings after working with this tool for several weeks straight.
Let’s go one by one, then. I fully agree that the analogy to assembly language will cause more harm than good, even if it looks right on the surface. He also makes an excellent point about translating output back to the abstraction layer, and about how CloudFormation should evolve into something better rather than become a crude intermediate layer. I also share his more subtle objection: those tools advertise a unified programming language for infrastructure and business code, yet the value of individual lines should not be equal. Ergo: it becomes so easy to create infrastructure that we may do it mindlessly. I love that he did not make a point about imperative vs. declarative, as we all know that you can write imperative code in a declarative way. However, he made a great observation that cloud infrastructure is a bounded enough domain to benefit from a proper DSL. Why do I say proper? Because it should reflect the resource graph, the lifecycle of those resources, and the change management around them. And Terraform or CloudFormation are not exactly there yet (although the latter is much closer IMHO).
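To make "resource graph, lifecycle, and change management" a bit more tangible, here is a minimal sketch in plain Python. It is not how Terraform, CloudFormation, or any tool from the talk works internally; the resource names and the helper functions (`creation_order`, `deletion_order`, `plan_change`) are made up for illustration. It only shows that once dependencies are explicit data, ordering and change planning fall out of the graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical resources: each name maps to the set of resources it depends on.
resources = {
    "vpc": set(),
    "subnet": {"vpc"},
    "security_group": {"vpc"},
    "instance": {"subnet", "security_group"},
}


def creation_order(graph):
    """Order resources so that every dependency comes before its dependents."""
    return list(TopologicalSorter(graph).static_order())


def deletion_order(graph):
    """Tear down in reverse: dependents are removed before their dependencies."""
    return list(reversed(creation_order(graph)))


def plan_change(old, new):
    """Naive change plan: what to delete from `old` and what to create for `new`.

    Only additions and removals are detected; in-place updates are ignored.
    """
    return {
        "delete": [name for name in deletion_order(old) if name not in new],
        "create": [name for name in creation_order(new) if name not in old],
    }
```

For example, replacing the instance with a (hypothetical) database that lives in the same subnet yields a plan that deletes `instance` and creates `database`, while the VPC and subnet stay untouched. A proper DSL would make this graph a first-class concept instead of something reconstructed from references between YAML blocks.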
There are better tools on the horizon!
Why do we need more? For two reasons.
The first reason is that infrastructure automation is a crucial element for every company with more than a handful of machines, either on-premise or in the cloud. Those definitions are becoming more and more complicated. That is why we require testability, declarative abstractions, safe change management, and the ability to modularize and reuse our code. Moreover, writing CloudFormation by hand is becoming a technique that is considered harmful in the long run.
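As a rough illustration of what testability and reuse can mean here, the sketch below builds a CloudFormation-style template from small Python functions and checks it with ordinary assertions. The helpers `versioned_bucket` and `template` are hypothetical and belong to no real library; only the `AWS::S3::Bucket` type, its `VersioningConfiguration` property, and the `AWSTemplateFormatVersion` value are real CloudFormation names.

```python
import json


def versioned_bucket(logical_id, versioned=True):
    """Reusable fragment: one S3 bucket resource keyed by its logical ID."""
    properties = {}
    if versioned:
        properties["VersioningConfiguration"] = {"Status": "Enabled"}
    return {logical_id: {"Type": "AWS::S3::Bucket", "Properties": properties}}


def template(*fragments):
    """Merge fragments into one template, refusing duplicate logical IDs."""
    merged = {}
    for fragment in fragments:
        duplicates = merged.keys() & fragment.keys()
        if duplicates:
            raise ValueError(f"duplicate logical IDs: {sorted(duplicates)}")
        merged.update(fragment)
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": merged}


if __name__ == "__main__":
    tpl = template(
        versioned_bucket("Logs"),
        versioned_bucket("Assets", versioned=False),
    )
    # A plain unit-test style check, no cloud account required.
    assert tpl["Resources"]["Logs"]["Properties"]["VersioningConfiguration"] == {
        "Status": "Enabled"
    }
    print(json.dumps(tpl, indent=2))
```

The point is not this particular helper, but that the definition becomes a value you can compose and assert on before anything is deployed, which is hard to do with hand-written templates.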
The second reason is communication. Take a look at the software architecture space. UML is no longer king. Due to the Agile revolution, more lightweight approaches to documenting and reasoning about architecture are used more frequently. We have C4, ADR, etc. With every iteration, we are getting closer and closer to Architecture as Code. We need similar things for communicating IaC to the other stakeholders (yes, developers and other tech team members are also stakeholders in this case). All of those efforts should get us as close as possible to the concept that Seth Vargo describes as Everything as Code:
Of course, there are already attempts to improve the current state. Let’s start with adding an even more sophisticated type system to AWS CDK, which punchcard tries to do. Still, it is not a proper domain language.
I want to point out a great example in the DSL space (unfortunately, it looks like a dead end after the company pivoted) - it’s called Ludwig, from Fugue Inc. It had solid foundations (assumptions listed in a post above) and could have evolved into a very nice replacement for writing AWS CloudFormation by hand.
There is also a different class of solutions - those that try to solve a subset of the problem space, like Habitat, Serverless Framework Components, AWS SAM, or Serverless Application Repository. And that is great - we need movement in this space, yet we are still far from ideal.
However, there is hope - going further, I found Ben’s article about InGraph, and I looked even further. And I must admit that I like what I see so far - but as I am starting with this tool, that is the last thing I will say about it - at least for now. 😉
Summary
Considering what I have found during my journey to answer the title question, I think we are not at the end of the road. You can see that many people actively think about and work on improvements, new tools, and ideas.
I think the Stockholm Syndrome appears in the mind of a person who is stuck in the status quo, peeked out of that pit for just a moment, and saw the first thing that makes the most noise (e.g., Pulumi or AWS CDK). However, the greatest fun in our industry is that you can always find another solution to your problem.