What's Lurking in Your Repositories?
The monolith is crumbling. Most organizations are moving from a small number of large applications to a large number of relatively small applications. Benefits include the ability to deploy small pieces of functionality without a risky big bang and the ability to experiment with new technologies with limited risk.
But every advance creates new problems. All those repositories are hard to keep track of. Most organizations lack understanding of what technologies they're using and how. This raises risk and cost. Without knowing what you have, it's hard to improve.
I'll show an open source tool to map many aspects of your repositories, including npm and Maven dependencies, Docker images, git branching and CI config. You can extend it in TypeScript to comprehend anything in git. Actionable visualizations enable you to see current state and take control of technology drift.
Drift And Why It Matters
Technology drift is a form of technical debt that comes from progress. As our target architecture advances, it leaves a trail of projects in all its momentary incarnations. As repositories proliferate, the accumulation holds us back. We have many technologies and many versions of each. We have the risks of all of them.
When we go to work on a particular project, we have to bring it up to date before implementing features or fixing bugs. How much of this work is sitting undone across hundreds or thousands of repositories?
The longer we leave it, the harder this work becomes. Upgrades are like deployments. Continual small upgrades are smooth; infrequent large ones are painful and risky. Drift increases risk and hinders productivity. Besides the steady accumulation of debt, urgent problems include:
- Known security vulnerabilities in libraries, Docker containers, or API usage.
- Library versions that have reached end-of-life.
- Technologies we have chosen to migrate away from, such as a library we've found not fit for purpose, or a Docker image including things we don't want. For example, the axios HTTP library caused problems with customer proxies, and we're eliminating its use at Atomist.
As CTOs, architects or senior developers, we want this information readily available at all times. We need to quantify and locate the drift in our organization.
Investigating 1,000 Repositories
You can run this analysis on your own code, but to get a sense of the insights, let's look at some open source. Microsoft's Azure-Samples organization on GitHub boasts over 1,000 repositories using a range of technologies. (I've chosen Microsoft because they publish so much open source. Everyone suffers from these problems.)
Let's look at technology usage and drift, starting with those aspects with the greatest entropy: those with the messiest mix of versions. (Two variants across 20 projects would be low entropy; 17 variants would be high.)
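To make "entropy" concrete, here is a minimal sketch that scores a mix of variants using Shannon entropy. This is an assumption about what the measure could look like, not necessarily the formula the tool uses, and the function name variantEntropy is hypothetical.

```typescript
// Shannon entropy (in bits) of how projects are spread across variants.
// Assumption: an illustrative measure, not necessarily the tool's own formula.
function variantEntropy(variants: string[]): number {
    // Count how many projects use each variant
    const counts = new Map<string, number>();
    for (const v of variants) {
        counts.set(v, (counts.get(v) || 0) + 1);
    }
    // Sum -p * log2(p) over the share p of each variant
    let entropy = 0;
    counts.forEach(count => {
        const p = count / variants.length;
        entropy -= p * Math.log2(p);
    });
    return entropy;
}

// 20 projects split evenly across 2 variants
const lowDrift: string[] = [];
for (let i = 0; i < 20; i++) {
    lowDrift.push(i % 2 === 0 ? "3.5.1" : "3.5.2");
}

// 20 projects spread across 17 variants
const highDrift: string[] = [];
for (let i = 0; i < 20; i++) {
    highDrift.push("version-" + (i % 17));
}

console.log(variantEntropy(lowDrift), variantEntropy(highDrift));
```

An even two-way split scores 1 bit, while 17 variants can score at most log2(17), a little over 4 bits. An aspect where every project uses the same variant scores 0.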
Aspects analyzed included npm and Maven dependencies, Docker usage, license and code of conduct files, and .NET target framework usage. Those with the highest entropy are shown in this sunburst visualization, providing a guide to digging deeper:
Drilling into the TypeScript versions shows that no project is on the current version, and there are some very old versions.
We see particularly high drift with the python Docker images:
We see everything from the latest tag of the python Docker image to some very specific releases:
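As a rough illustration of how image variants like these could be collected, the sketch below pulls the base image and tag out of the FROM lines of a Dockerfile. The names BaseImage and baseImages are hypothetical, and this is not the tool's own Docker aspect. It ignores digests and treats a multi-stage reference to an earlier build stage as an ordinary image.

```typescript
interface BaseImage {
    image: string;
    tag: string;
}

// Extract the base image and tag from every FROM line in a Dockerfile.
// Hypothetical helper for illustration only.
function baseImages(dockerfile: string): BaseImage[] {
    const images: BaseImage[] = [];
    for (const line of dockerfile.split(/\r?\n/)) {
        // Match "FROM [--platform=...] <reference>", case-insensitively
        const match = /^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)/i.exec(line);
        if (!match) {
            continue;
        }
        // Drop any "@sha256:..." digest; this sketch only tracks tags
        const ref = match[1].split("@")[0];
        const lastColon = ref.lastIndexOf(":");
        const lastSlash = ref.lastIndexOf("/");
        if (lastColon > lastSlash) {
            // A colon after the last slash separates the image from its tag
            images.push({ image: ref.slice(0, lastColon), tag: ref.slice(lastColon + 1) });
        } else {
            // Docker uses the "latest" tag when none is given
            images.push({ image: ref, tag: "latest" });
        }
    }
    return images;
}

const sampleDockerfile = [
    "FROM python:3.6-slim",
    "RUN pip install flask",
    "FROM python",
].join("\n");

console.log(baseImages(sampleDockerfile));
```

Grouping the resulting tags by image name across all repositories gives the list of variants for each image.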
These reports are revealing about the technologies in use. Surprises included PHP and Python 2.
Another aspect reveals the presence of a license file. 72 projects lack a license:
Insights Prompt Action
What we do with these insights depends on our organizational goals. This organization is a guide to developing apps to run on Azure. So it makes sense for there to be a wide range of technologies for illustration, but not for their usage to be inconsistent. We'd expect to see a range of Docker images, but not 11 variants of the
Priorities for this organization would include:
- Add licenses to all repositories
- Converge TypeScript and Spring Boot usage onto the current version of each
- Rationalize Docker images
- Archive dead repositories, identified by an aspect that captures the recency of git activity
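The last of these priorities can be sketched in a few lines. The example below assumes the date of the most recent commit has already been gathered for each repository. The RepoActivity type, the staleRepositories function and the one-year threshold are all hypothetical choices for illustration, not part of the tool.

```typescript
interface RepoActivity {
    name: string;
    lastCommit: Date;
}

const MILLIS_PER_DAY = 24 * 60 * 60 * 1000;

// Return the names of repositories with no commits in the last maxAgeDays days.
// Hypothetical helper: the threshold is a policy choice, not a tool default.
function staleRepositories(repos: RepoActivity[], now: Date, maxAgeDays: number): string[] {
    const cutoff = now.getTime() - maxAgeDays * MILLIS_PER_DAY;
    return repos
        .filter(repo => repo.lastCommit.getTime() < cutoff)
        .map(repo => repo.name);
}

// Made-up repositories and dates, purely for demonstration
const activity: RepoActivity[] = [
    { name: "old-sample", lastCommit: new Date("2017-01-01T00:00:00Z") },
    { name: "active-sample", lastCommit: new Date("2019-06-01T00:00:00Z") },
];

console.log(staleRepositories(activity, new Date("2019-07-01T00:00:00Z"), 365));
```

Passing the current time in as a parameter keeps the function deterministic, which makes it easy to test.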
As we undertake such work, we can rerun the analysis and observe our progress, helping to ensure we don't slip backwards. (With the Atomist service, we could automate much of the work.)
How It Works
Atomist aspects extract parts of code, configuration or process into a canonical fingerprint that can be compared across repositories.
Fingerprints are extracted from repositories in TypeScript code, using the Atomist API for software. Here's the code that looks for a license file. The extract method is run on every repository:
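A minimal sketch of what such an aspect might look like follows. It uses simplified, hypothetical types (Project, Fingerprint, Aspect) and an in-memory project instead of the real Atomist API, so it shows the shape of the idea and not the actual implementation. The list of license file names is also an assumption.

```typescript
// Hypothetical, simplified stand-ins for illustration; not the real Atomist API.
interface Project {
    // Maps a file path, relative to the repository root, to its content
    files: { [path: string]: string };
}

interface Fingerprint {
    type: string;
    name: string;
    data: { present: boolean; path?: string };
}

interface Aspect {
    name: string;
    extract(project: Project): Fingerprint;
}

// Assumed set of conventional license file names
const licenseFileNames = ["LICENSE", "LICENSE.txt", "LICENSE.md"];

const licenseAspect: Aspect = {
    name: "license",
    // Run against each repository to produce a comparable fingerprint
    extract(project: Project): Fingerprint {
        const found = licenseFileNames.filter(name => project.files[name] !== undefined);
        const path = found.length > 0 ? found[0] : undefined;
        return {
            type: "license",
            name: "license",
            data: { present: path !== undefined, path },
        };
    },
};

const withLicense: Project = { files: { "LICENSE": "MIT License", "README.md": "# Sample" } };
const withoutLicense: Project = { files: { "README.md": "# Sample" } };

console.log(licenseAspect.extract(withLicense).data.present);
console.log(licenseAspect.extract(withoutLicense).data.present);
```

Because every repository yields a fingerprint of the same shape, counting the repositories where present is false gives the number of projects without a license.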
Aspects are unit testable, so you can follow good development practice and have confidence in them before running them on your code.
The out-of-the-box aspects are just the start. Customer uses include:
- Checking the presence and correctness of a security manifest file
- Checking for a required Spring Boot starter in every Spring Boot web project
- Checking for exposed secrets
Mapping Your Own Organization
Investigate your own organization today. Go to the https://github.com/atomist/org-visualizer repository on GitHub and follow the README to work with GitHub or local repositories. What questions would you ask to help you understand current state and improve it?