How many different Docker base images is your organization running? Which ports do your Docker containers expose? How many versions of core technologies like TypeScript or Spring Boot are you running? How many versions of any particular library?

Chances are, the answers are going to surprise you. Inconsistency brings risk and slows progress. Different technologies and versions have different bugs, capabilities, security vulnerabilities and implications for your developers. Inconsistency makes it hard to move forward quickly.

The move toward cloud native applications has brought fragmentation that can approach chaos. We no longer have a small number of very large repos, but a very large number of small repos. In previous posts I’ve discussed the risks of fragmentation in CI files; today I’ll discuss several other kinds of fragmentation, and show how Atomist can help you better understand your organization today.

The Problem: Inconsistency and Drift

Consider the TypeScript version. One of the awesome things about TypeScript is how fast it moves. It’s good to be able to use those cool new features consistently. But occasionally we do encounter breaking changes, and the longer repos go without updates, the harder they become to fix.

Atomist’s client framework and supporting libraries are written in TypeScript. Here’s an interactive visualization of Atomist’s open source code on GitHub, breaking down the TypeScript versions:

(This is an interactive chart; start by clicking on the atomist ring.)

This sunburst is rendered using d3. The innermost ring is the TypeScript version specified in package.json. The outermost ring shows individual repos. Clicking on the intermediate rings makes their names visible and provides clickable links.

Most repos are fairly up to date, but some need attention. Those with the oldest versions may be dead, which is itself useful, actionable information.

The story is similar for versions of any library. Consider Atomist’s microgrammar parsing library, used by a subset of projects. The innermost ring is the organization, as this screenshot shows data from private repositories at atomisthq as well as open source at atomist. The next is the version of the library specified in package.json. The next is the version that was resolved, according to package-lock.json, which may not be the same. (For example, ^0.6.0 resolved to 0.6.1 in one repo.) Again, the outermost ring shows individual repos.
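To make the declared-versus-resolved distinction concrete, here is a minimal sketch of how the two values could be read for one library. It is not the org-visualizer's actual code: the `versionsOf` helper and the interfaces are hypothetical, and it assumes the older lockfile layout in which package-lock.json has a top-level `dependencies` map keyed by package name.

```typescript
// Only the fields this sketch reads; real files contain much more.
interface PackageJson {
    dependencies?: Record<string, string>;
    devDependencies?: Record<string, string>;
}

// Assumption: lockfile layout with a top-level "dependencies" map.
// Newer lockfile versions also carry a "packages" map, not handled here.
interface PackageLock {
    dependencies?: Record<string, { version: string }>;
}

interface VersionPair {
    declared?: string; // range from package.json, e.g. "^0.6.0"
    resolved?: string; // exact version from package-lock.json, e.g. "0.6.1"
}

// Hypothetical helper: look up one library in both parsed files.
function versionsOf(name: string, pkg: PackageJson, lock: PackageLock): VersionPair {
    const declared = pkg.dependencies?.[name] ?? pkg.devDependencies?.[name];
    const resolved = lock.dependencies?.[name]?.version;
    return { declared, resolved };
}

// The "^0.6.0 resolved to 0.6.1" case mentioned above.
const pair = versionsOf(
    "@atomist/microgrammar",
    { dependencies: { "@atomist/microgrammar": "^0.6.0" } },
    { dependencies: { "@atomist/microgrammar": { version: "0.6.1" } } },
);
console.log(pair);
```

Keeping both values matters because two repos can declare the same range and still run different code.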

It would be good to get up to date, and especially to move off milestone versions. Again, it seems that there may be dead repos that should be archived.

Let’s dig deeper down the stack. Consider Docker base images, again across both Atomist organizations:

This visualization is surprising and demands further investigation. We need multiple base images for different types of projects, but why multiple versions of any one base image? Why 3.3, 3.4 and latest of alpine? Why 2 versions of java and 7 of node?

Let's consider the spread of exposed Docker ports, which may also have security implications:

No great surprises or obvious action items here. But as well as surfacing potential risks, this visualization helps understand what each repository does, and how many there are of each type. For example, those exposing 2866 are Atomist SDMs.
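Both Docker charts rest on facts that can be read from a Dockerfile. As a rough illustration (a sketch, not how Atomist necessarily does it), the hypothetical `scanDockerfile` function below pulls base images and exposed ports out of a Dockerfile's text. It deliberately ignores line continuations, `ARG` substitution and digest-pinned images.

```typescript
interface DockerFacts {
    baseImages: Array<{ image: string; tag: string }>;
    exposedPorts: string[];
}

// Hypothetical helper: extract FROM and EXPOSE data from Dockerfile content.
function scanDockerfile(content: string): DockerFacts {
    const facts: DockerFacts = { baseImages: [], exposedPorts: [] };
    for (const raw of content.split(/\r?\n/)) {
        const line = raw.trim();
        // FROM [--platform=...] image[:tag] [AS name]
        const from = /^FROM\s+(?:--platform=\S+\s+)?(\S+)/i.exec(line);
        if (from) {
            const ref = from[1];
            // A colon only marks a tag if it comes after the last slash;
            // otherwise it belongs to a registry host such as localhost:5000.
            const slash = ref.lastIndexOf("/");
            const colon = ref.lastIndexOf(":");
            const hasTag = colon > slash;
            facts.baseImages.push({
                image: hasTag ? ref.slice(0, colon) : ref,
                tag: hasTag ? ref.slice(colon + 1) : "latest",
            });
            continue;
        }
        // EXPOSE may list several ports, optionally with a protocol (80/tcp).
        const expose = /^EXPOSE\s+(.+)$/i.exec(line);
        if (expose) {
            facts.exposedPorts.push(...expose[1].split(/\s+/));
        }
    }
    return facts;
}

const facts = scanDockerfile([
    "FROM node:10-alpine",
    "WORKDIR /app",
    "EXPOSE 2866",
    'CMD ["node", "index.js"]',
].join("\n"));
console.log(facts);
```

Treating an untagged image as `latest` mirrors Docker's own default, which is why `latest` can show up as a version in a chart like the one above.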

These visualizations provide useful, actionable information about Atomist’s own code. What could you learn about your organization?

Analyzing Your Own Organization

We’ll be building visualization functionality into the Atomist service. However, you can get insights into your own code right now using open source by following these steps:

  1. If you don’t already have the Atomist CLI, install it via npm i --global @atomist/cli.
  2. Clone the org-visualizer repository.
  3. In its root directory, run npm i
  4. Run npm run build
  5. Run npm link to install the spider binary that can analyze GitHub organizations.
  6. (Optional) If you need to work with private repos, ensure that your GitHub token is available to the Node processes via a GITHUB_TOKEN environment variable.
  7. Run spider <org> to analyze a GitHub organization. To see the open source part of the data shown in this post, type spider atomist.
  8. Start the server via atomist start --local and navigate to http://localhost:2866 to see the visualizations.

Spidering currently supports only GitHub. We’ll soon add support for GitHub Enterprise and BitBucket. Contact us if you have an urgent need.

Surface What You Most Care About

These specific reports are valuable, but the true power of this approach is in its extensibility. While there are existing point solutions for handling certain aspects of projects, the open source Atomist analysis framework allows you to analyze and expose aspects of your choice, enabling you to better understand what matters in your organization, even if it’s unique to your needs.

There are three layers in org-visualizer, all extensible and customizable:

  1. The analysis framework (from @atomist/sdm-pack-analysis), enabling scanners that extract relevant data from projects. Here's a real scanner that extracts Node data.
  2. A query layer that can create JSON structures from persisted analyses and expose them through Express routes. This layer provides support for grouping, splitting and rendering data, to any number of levels. Here's the TypeScript version query.
  3. A simple UI using Handlebars and d3 for client side rendering. JSON data is also exposed over HTTP.
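To give a feel for what the query layer's "grouping, to any number of levels" amounts to, here is a minimal sketch that turns flat per-repo records into the nested `name`/`children` structure that d3's hierarchy layouts commonly consume. The `RepoRecord` shape, the `toTree` function and the sample data are all made up for illustration; the real query layer in org-visualizer will differ.

```typescript
// Hypothetical flattened analysis result for one repo.
interface RepoRecord {
    org: string;
    repo: string;
    typescriptVersion?: string;
}

// Nested shape suitable for a sunburst: inner nodes have children, leaves have a size.
interface TreeNode {
    name: string;
    children?: TreeNode[];
    size?: number;
}

type Classifier = (r: RepoRecord) => string;

// Group records by each classifier in turn; one classifier per ring.
function toTree(rootName: string, records: RepoRecord[], levels: Classifier[]): TreeNode {
    function build(name: string, items: RepoRecord[], depth: number): TreeNode {
        if (depth === levels.length) {
            return { name, size: items.length };
        }
        const groups: Record<string, RepoRecord[]> = {};
        for (const item of items) {
            const key = levels[depth](item);
            (groups[key] = groups[key] || []).push(item);
        }
        return {
            name,
            children: Object.keys(groups).map(key => build(key, groups[key], depth + 1)),
        };
    }
    return build(rootName, records, 0);
}

// Sample data, invented for this example.
const tree = toTree("atomist", [
    { org: "atomist", repo: "repo-a", typescriptVersion: "3.4.5" },
    { org: "atomist", repo: "repo-b", typescriptVersion: "3.4.5" },
    { org: "atomist", repo: "repo-c", typescriptVersion: "2.9.2" },
], [
    r => r.typescriptVersion || "none", // inner ring: version
    r => r.repo,                        // outer ring: individual repos
]);
console.log(JSON.stringify(tree, null, 2));
```

Adding a ring is then just a matter of adding another classifier to the list, for example `r => r.org` as the first level.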

Anything in any of your repos is accessible to Atomist analysis. Suppose you want to check if all your repos have a code of conduct--an important consideration in open source organizations. Add the following scanner. Its implementation benefits from the simple, testable, Project abstraction that Atomist provides:
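As a minimal sketch of what such a scanner might look like: the `ProjectLike` interface below is a hypothetical stand-in for the small slice of the Project abstraction the scanner needs, and the list of candidate paths, the `CodeOfConduct` shape and the `scanCodeOfConduct` function are assumptions of mine rather than the framework's own API.

```typescript
// Hypothetical stand-in for the part of a project the scanner touches.
interface ProjectLike {
    getFile(path: string): Promise<{ getContent(): Promise<string> } | undefined>;
}

interface CodeOfConduct {
    path: string;   // where the file was found
    title?: string; // first Markdown heading, if any
}

// Assumption: commonly used locations for a code of conduct.
const CandidatePaths = [
    "CODE_OF_CONDUCT.md",
    ".github/CODE_OF_CONDUCT.md",
    "docs/CODE_OF_CONDUCT.md",
];

// Return details of the first code of conduct found, or undefined if there is none.
async function scanCodeOfConduct(p: ProjectLike): Promise<CodeOfConduct | undefined> {
    for (const path of CandidatePaths) {
        const file = await p.getFile(path);
        if (file) {
            const content = await file.getContent();
            const heading = /^#\s+(.+)$/m.exec(content);
            return { path, title: heading ? heading[1].trim() : undefined };
        }
    }
    return undefined;
}

// Because the scanner depends only on ProjectLike, it can be tested without touching disk.
function inMemoryProject(files: Record<string, string>): ProjectLike {
    return {
        getFile: async path => {
            const content = files[path];
            return content === undefined ? undefined : { getContent: async () => content };
        },
    };
}

scanCodeOfConduct(inMemoryProject({ "CODE_OF_CONDUCT.md": "# Contributor Covenant\n\nBe kind." }))
    .then(result => console.log(result));
```

The in-memory project is the point of the exercise: a scanner written against a narrow project interface is trivial to unit test.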

After analyzing with this scanner enabled, you can visualize the new path within the analysis model by visiting http://localhost:2866/query/path? Code of Conduct:

Whoops, action required here. We still have a few repos within the Atomist open source org that don’t have a code of conduct--something we’ll fix quickly, now that we’re aware of it.

What data could you extract and query to deliver insights concerning your unique needs? What generic data could you extract and query to help the community? Pull requests welcome!

Atomist: An Enabler for the Cloud Native Era

Gathering and exposing this information is possible because, unique among software delivery technologies, Atomist understands your repositories. This understanding allows you to specify consistent delivery policy across your entire organization, and also makes Atomist a great basis for reporting.

Try the org-visualizer today to learn more about your own repositories. Join the Atomist community Slack and tell us if there are particular insights you'd like. Learn more about Atomist to see how it can help with development and delivery at scale.

In my next post, I’ll look at how Atomist can also help to start rectifying problems identified this way.