Leaking secrets such as API keys in GitHub repositories is dangerous. How dangerous? It once caused Uber to leak the contact details of 57m users. Bots are crawling all over GitHub seeking secret keys, as a developer served with a $2,375 Bitcoin mining bill found.
Adrian Colyer drew my attention to a paper that shows that such secret leakage is widespread, finds that various existing solutions are ineffective, and suggests a set of regular expressions that the authors found more dependable for detecting common secret types. See Adrian’s summary in The Morning Paper.
With a typical delivery stack, there isn't a ready solution. Detecting secret leakage is an example of an important requirement that should be addressed at the organizational level. Such crosscutting concerns are poorly modeled by the repo-by-repo approach used by typical CI tools.
In this post I’ll show an approach to this problem that you can apply to all your repos today.
Ideally we’d prevent secrets being pushed to GitHub by flagging them on local commits. However, this isn’t sufficient to secure an organization, as different developers may have different tool chains.
For a comprehensive solution, we need three checks:
- Local: A check on each local commit to identify potential leaks before a push
- Organizational: A check on each push to the GitHub organization
- Scheduled: A check that could be requested against a local or remote project independent of commit activity
Think of how we could approach this with typical tools:
- For the local check we could invoke a script from a git pre-commit hook. We'd need to add such a hook to all our repositories.
- For the organizational check, we could invoke that script in every CI file, probably by adding to a YAML list. This would require a change to every repository, and would require every repository to have CI. Even in a well-run team, there may be repositories of documentation etc. that don’t have CI. Some SCM servers, such as GitHub Enterprise, support pre-receive hooks, which are a better alternative.
- For the “scheduled” use case, we could run the script against each repository. Without a way to run such scripts across our GitHub organization, we’d need to clone repos and issue commands manually.
The model of many distinct CI pipelines is a poor fit, leading to the practical problem of having to update (and maintain!) distinct pipelines in many repositories. Another practical problem is the lack of a model that’s portable between development machine and GitHub or other SCM hosting platform.
An event-based approach is a better fit for this type of requirement. What if we could add an event handler on all pushes that runs the necessary checks?
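To make the idea concrete before bringing in any particular tool, here is a minimal sketch of an event-based model. Everything in it (PushEvent, PushEventBus, onPush, dispatch) is a hypothetical name invented for illustration and is not the Atomist API. The point is only that a check is registered once and then applies to pushes from every repository.

```typescript
// Hypothetical types for illustration only; this is not the Atomist API.
interface PushEvent {
    repo: string;
    // Map of file path to the content of that file after the push
    changedFiles: Record<string, string>;
}

// A listener inspects a push and returns zero or more warnings
type PushListener = (push: PushEvent) => string[];

class PushEventBus {
    private readonly listeners: PushListener[] = [];

    // Register a check once; it then sees pushes from every repository
    onPush(listener: PushListener): void {
        this.listeners.push(listener);
    }

    // Run every registered listener against a push and collect the warnings
    dispatch(push: PushEvent): string[] {
        const warnings: string[] = [];
        this.listeners.forEach(listener =>
            listener(push).forEach(w => warnings.push(w)));
        return warnings;
    }
}

const bus = new PushEventBus();

// One organization-wide policy: flag anything shaped like an AWS access key ID
bus.onPush(push =>
    Object.keys(push.changedFiles)
        .filter(path => /AKIA[0-9A-Z]{16}/.test(push.changedFiles[path]))
        .map(path => `${push.repo}: possible AWS key in ${path}`));
```

Adding a new policy here means one more onPush call, rather than an edit to a pipeline file in every repository.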
Let’s try this approach using Atomist, which solves both of the practical problems.
An Atomist Software Delivery Machine (SDM) enables us to respond to events such as pushes across an entire organization, enabling team level policy instead of isolated, inconsistent per-repo behavior. We define the behavior in toolable, testable TypeScript, backed by a model we can also use to automate many tasks besides delivery.
The following code enables an SDM to check for exposed secrets on every push across an organization:
The implementation of sniffForSecretsOnPush uses the Atomist Project API, which abstracts from git and the file system for portability and to facilitate testing. We apply the regular expressions from the paper to check each file. Unit tests help to avoid surprises once we unleash it on real projects.
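The core of such a check can be sketched without any Atomist dependency. The following is an illustrative approximation, not the actual sniffForSecretsOnPush code: the names sniffFiles, SecretPattern and ExposedSecret are assumptions, the in-memory map stands in for the Project API, and only two well-known patterns are included.

```typescript
// Illustrative sketch only; not the actual secret-beagle implementation.
interface SecretPattern {
    name: string;
    regex: RegExp;
}

interface ExposedSecret {
    path: string;
    line: number; // 1-based line number within the file
    patternName: string;
    match: string;
}

// Two commonly used patterns; a real configuration would list more
const patterns: SecretPattern[] = [
    { name: "AWS access key ID", regex: /AKIA[0-9A-Z]{16}/g },
    { name: "RSA private key", regex: /-----BEGIN RSA PRIVATE KEY-----/g },
];

// Scan a map of path -> content. Working on in-memory content rather than
// git or the file system keeps the logic easy to unit test.
function sniffFiles(files: Record<string, string>): ExposedSecret[] {
    const found: ExposedSecret[] = [];
    Object.keys(files).forEach(path => {
        files[path].split("\n").forEach((text, index) => {
            patterns.forEach(p => {
                // With a global regex, match() returns every match on the line
                const matches: string[] = text.match(p.regex) || [];
                matches.forEach(match =>
                    found.push({ path, line: index + 1, patternName: p.name, match }));
            });
        });
    });
    return found;
}
```

Because the function takes plain strings, a unit test can feed it a known pseudo-secret and assert on the reported path and line before the check is ever pointed at real repositories.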
Once the SDM is running, we’ll see notifications like the following if any secrets are exposed:
We can invoke the same sniffing code to add a command handler to our SDM, meeting the third requirement of a scheduled scan. We will be able to invoke the command from the CLI, Atomist web interface or Slack.
Run It Yourself
You can get this running yourself in a few minutes.
- Install or update the Atomist CLI via npm i -g @atomist/cli. (You’ll need Node.)
- Clone the repo at https://github.com/atomist-blogs/secret-beagle.
Run it on your local commits
Change into the directory into which you cloned the repo, and start the SDM via atomist start --local.
In another window, type atomist feed to see messages from the Atomist process.
Check out repositories via atomist clone <url> to have Atomist track changes to them. (It installs git hooks under the covers in local mode.) Try committing a change containing something that looks like a secret, like AKIAIMW6ASF43DFX57X9 from the paper, and look in the feed window.
Run it for your organization’s commits
If you have already created an Atomist workspace, remove the local flag shown above and run atomist start. This will enable the secret-beagle SDM on all repos connected to your Atomist workspace, running the same code unchanged.
If you don’t already have an Atomist workspace, you can create one with the atomist CLI by typing:
- atomist config, which will take you through GitHub authorization for the Atomist application
- atomist workspace create
Try adding a similar pseudo-secret to one of the repositories in your GitHub organization. (Do not use a real secret!) You will see a notification in Slack.
You can host an SDM wherever you like. Many users use Kubernetes.
You can check your repositories both locally and on your SCM server by running the same codebase in both modes.
The Atomist implementation is not merely portable between GitHub and your local machine; it also works on GitHub Enterprise, Bitbucket Server and Cloud, and GitLab.
Run as a command
Type @atomist release the hound either at your command line in the directory of an Atomist-managed project, or in a Slack channel linked to a repository. All files in the repository will be scanned.
Customizing and extending your SDM
To configure the secrets or whitelist of non-secrets, edit the secrets.yml file in the root directory and restart your SDM. (Yes! Valid use of YAML!)
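To show how a list of secret patterns and a whitelist might interact, here is a small sketch. The SnifferConfig shape and the findSecrets function are assumptions made for illustration; they do not reproduce the actual schema of secrets.yml, so check the file in the repo for the real field names.

```typescript
// Hypothetical shape of a parsed configuration; the real secrets.yml
// schema may differ.
interface SnifferConfig {
    secrets: Array<{ pattern: string; description: string }>;
    whitelist: string[]; // exact strings known not to be real secrets
}

interface Finding {
    description: string;
    match: string;
}

// Report every pattern match in the text that is not on the whitelist
function findSecrets(text: string, config: SnifferConfig): Finding[] {
    const findings: Finding[] = [];
    config.secrets.forEach(s => {
        const matches: string[] = text.match(new RegExp(s.pattern, "g")) || [];
        matches.forEach(match => {
            if (config.whitelist.indexOf(match) === -1) {
                findings.push({ description: s.description, match });
            }
        });
    });
    return findings;
}

// Example configuration: one pattern, plus a documentation-style
// placeholder key that should never be reported
const exampleConfig: SnifferConfig = {
    secrets: [{ pattern: "AKIA[0-9A-Z]{16}", description: "AWS access key ID" }],
    whitelist: ["AKIAIOSFODNN7EXAMPLE"],
};
```

Keeping patterns and exceptions in data rather than code is what lets a restart pick up new rules without a code change.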
To modify the code, start with the lib/machine.ts file. With the power of TypeScript and npm modules, you can enhance it to meet your needs: for example, to use Twilio to make more aggressive notifications, or to make the local SDM throw up a toast on your OS.
Reacting to Push Events
It was easy to add this check across all repositories because an event-based model is superior to multiple pipelines for many important use cases.
What else could you do with an event-based approach to delivery? There are many other important security use cases, such as vulnerability scanning. Or code quality scanning with something like SonarQube. Or you could orchestrate your entire CI/CD flow using this approach, moving CI/CD from a tactical per-repository exercise to a strategic per-team level.
To see the implementation process (and the fact that the SDM did indeed work right away once unleashed on real repositories!) see my pairing session with @jessitron in which we implemented the core functionality. Join the Atomist community Slack if you have questions. See you there!