When YAML Breeds: Distributed Denial of Productivity
My last post discussed the danger of programming in YAML, with examples from the CI domain, where YAML is the norm for defining delivery pipelines. Worse still is YAML programming at scale. As services multiply, so do the YAML files, and so do duplication and drift. This holds us back.
Proliferation, Duplication and Drift
Consider Microsoft's GitHub organization. To ensure an apples-to-apples comparison, I focused on one CI tool (Travis) and one stack (Node).
Out of 90 repositories, there are 40 different Travis `script` stanzas. Some lint, some don't. Some don't even use `npm`. 35 are one-offs, some with elaborate `script` blocks. There are 20 different `before_install` scripts and 6 different `after_success` scripts, some reporting test coverage one way, others another way, most not at all.
Both the inconsistency and duplication are problematic. Why perform the same logical steps differently or in a different order? Why say something more than once? If this were application code, we'd consider this a smell.
There's no sense of organizational best practice. We either care about linting and test coverage or we don't. Unlike, say, dependencies expressed in a `package.json` file, such concerns are unlikely to vary by repository.
Delivery across this organization fails a basic test: it would be hard to add or change functionality. For example, the realistic requirement to "add test coverage reporting to all repos" would require modifying over 80 repos. Adding CVE scanning would touch every repo, and the changes required would differ from repo to repo. Such important concerns affect all repositories, and each policy should be updatable in one place.
The technical solution does not match the problem. Modeling the delivery of Microsoft's Node projects in 90 distinct YAML files does not match the requirements.
Wait: What Are We Trying To Accomplish?
Let's step back and consider what we're trying to do in delivery at scale.
A problem statement might be something like *We have many projects and need to deliver them safely*, where "safely" means built successfully, with no known CVEs, meeting organizational standards for code quality and formatting, and passing all tests. The definition of "safely" changes over time. We may discover additional checks we need to run, additional teams or systems we need to notify of progress, or ways to optimize or correct the invocation of compilers and other tools.
The problem statement would certainly not be *We have many repositories, each of which should have a distinct build pipeline*. Yet this is what we do by default today. When behavior is scattered across hundreds of repositories, it's impractical to evolve it. We get inconsistency and errors we can't easily fix.
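To make the contrast concrete, here's a minimal TypeScript sketch of the first framing. Every name here (`Project`, `Check`, the stub functions) is a hypothetical illustration, not a real API; the point is that "safely" is defined once, as code, and evolves in one place:

```typescript
// Hypothetical sketch: the organization's definition of "safely"
// lives in one place and can grow over time.

interface Project {
  name: string;
  stack: "node" | "java";
}

type Check = (p: Project) => Promise<boolean>;

// Stubs standing in for real tool invocations
const build: Check = async p => { console.log(`building ${p.name}`); return true; };
const scanCves: Check = async p => { console.log(`scanning ${p.name} for CVEs`); return true; };
const lint: Check = async p => { console.log(`linting ${p.name}`); return true; };
const test: Check = async p => { console.log(`testing ${p.name}`); return true; };

// Today's definition of "safely". Discovering a new concern means
// adding one entry here, not editing 90 YAML files.
const safely: Check[] = [build, scanCves, lint, test];

async function deliver(p: Project): Promise<boolean> {
  for (const check of safely) {
    if (!(await check(p))) return false; // fail fast on any unsafe result
  }
  return true;
}
```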
A model of one pipeline definition per repository doesn't reflect the real problem. We need delivery policies at team level, not repository level.
The traditional default of defining behaviors at repository level with clumsy ways of sharing common steps is the wrong way around. We should share behaviors by default, with the ability to specialize by repository or logical group where needed. Delivery actions are cross-cutting concerns.
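Here's a minimal sketch of "share by default, specialize where needed", again with hypothetical names:

```typescript
// Hypothetical sketch: organization-wide defaults, with specialization
// by logical group rather than copy-paste into each repository.

interface Repo {
  name: string;
  team: string;
}

type Step = (r: Repo) => void;

// Defined once for the whole organization
const defaultSteps: Step[] = [
  r => console.log(`lint ${r.name}`),
  r => console.log(`test ${r.name}`),
  r => console.log(`report coverage for ${r.name}`),
];

// Groups that genuinely differ extend or replace the defaults
const teamOverrides = new Map<string, Step[]>([
  ["platform", [...defaultSteps, r => console.log(`deploy ${r.name} to staging`)]],
]);

function stepsFor(r: Repo): Step[] {
  return teamOverrides.get(r.team) ?? defaultSteps;
}
```

The design choice matters: a repository gets different behavior by declaring which group it belongs to, not by copying and editing a pipeline file. Adding CVE scanning everywhere becomes a one-line change to `defaultSteps`.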
Escaping One Pipeline per Repository
CI was a big step forward for our industry, and CI files were once our friends. When we had a small number of large projects, one pipeline per repository worked fine. Now that we have many smaller projects, we need to think at the organizational level.
To achieve organizational policy we need more context. In particular:
- Delivery needs to be smarter. We need a domain model to help us work with projects and their delivery, grouping behavior as necessary. For example, we should be able to treat Node projects differently from Java projects, and TypeScript projects differently from JavaScript projects.
- We need a richer concept of stages. In place of a fill-in-the-blanks model such as Travis's (with hooks like `script`, `before_install` and `after_success`), we need to model meaningful phases such as lint/fix, build, test, and deploy, and allow meaningful custom stages to be defined.
- We need a more sophisticated way of expressing behavior. Logic belongs in a programming language, not YAML lists of scripts, as the sketch after this list illustrates.
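To illustrate the last two points, here's a hedged sketch of what typed phases and per-stack grouping might look like. The names (`Phase`, `DeliveryModel`) are invented for illustration; a real model would add phase dependencies, events, and error handling:

```typescript
// Hypothetical sketch: meaningful, typed phases instead of
// fill-in-the-blanks hooks, with behavior grouped by project type.

type Phase = "lint" | "build" | "test" | "deploy";

const phaseOrder: Phase[] = ["lint", "build", "test", "deploy"];

interface DeliveryModel {
  phases: Partial<Record<Phase, (dir: string) => void>>;
}

// Node projects are treated differently from Java projects
const nodeDelivery: DeliveryModel = {
  phases: {
    lint: dir => console.log(`eslint --fix in ${dir}`),
    build: dir => console.log(`npm ci && npm run build in ${dir}`),
    test: dir => console.log(`npm test in ${dir}`),
  },
};

const javaDelivery: DeliveryModel = {
  phases: {
    build: dir => console.log(`mvn package in ${dir}`),
    test: dir => console.log(`mvn verify in ${dir}`),
  },
};

function run(model: DeliveryModel, dir: string): void {
  for (const phase of phaseOrder) {
    model.phases[phase]?.(dir); // skip phases this project type doesn't define
  }
}
```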
We need to rethink the notion of the static pipeline defined up front. In my next post I'll show how an event-driven approach, backed by a rich domain model, is more powerful and flexible, avoids duplication and drift, and enables us to evolve our delivery in real time. We can apply modern software engineering principles to make our delivery solutions match modern requirements.
Thanks to Chris Swan for the phrase "distributed denial of productivity."