Finding Secrets in Git Repos with TruffleHog


Version control systems such as Git make it easy for developers to work collaboratively on software. By storing a history of who changed what and when, they let you combine edits made by different people without losing anyone's work. The stored history also makes it much easier to manage separate versions of the same project: after fixing a bug in one version, you can apply the same change to other versions with a single command.

However, all this stored history has a downside: it is all too easy for developers to commit information that should not be visible to everyone who has access to the source. This could be configuration files containing database passwords, deploy scripts that include server credentials, or even the private key files used for SSH or HTTPS. Removing the secret data from the current version doesn't help, because the previous version is stored in the history and is still accessible.

TruffleHog is one tool that makes it easier to search through the history of a Git repository to discover passwords and other secrets.

Installation

TruffleHog is a Python project and can be installed with pip:

pip install trufflehog

The development version is available on GitHub at dxa4481/truffleHog.

Usage

To run the default scan, which searches for “high-entropy” strings (random-looking text such as generated passwords and keys), use:

trufflehog {git_repo_path_or_url}

If you pass the URL of a remote repository, TruffleHog will clone a temporary local copy and scan that.

The output consists of a series of findings, each of which looks like this:

~~~~~~~~~~~~~~~~~~~~~
Reason: High Entropy
Date: 2017-03-14 21:28:54
Hash: 3b85e4e386f1b616207f51d697dc401ac166ae4e
Filepath: config.ini
Branch: origin/master
Commit: Set up database

In case we have some alive processes this will hang, which is
not that we expect to get.

@@ -50,1 +50,1 @@
 db_user = admin

-db_pass = 
+db_pass = NisbejLogFoncibtysortEmeyxTox8
 
 db_name = myproj

~~~~~~~~~~~~~~~~~~~~~
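
To give a feel for what “high entropy” means, here is a minimal Python sketch of the general idea: compute the Shannon entropy of each whitespace-separated token and flag the long ones that score above a threshold. This is an illustration, not TruffleHog's actual code; the function names, the threshold of 4.5 bits per character and the minimum length of 20 are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Return the Shannon entropy of text, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_tokens(line, threshold=4.5, min_length=20):
    """Return the tokens in line that are long and look random.

    threshold and min_length are illustrative values, not settings
    taken from TruffleHog.
    """
    return [
        token
        for token in line.split()
        if len(token) >= min_length and shannon_entropy(token) > threshold
    ]


if __name__ == "__main__":
    for line in ["db_user = admin", "key = ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef"]:
        print(line, "->", high_entropy_tokens(line))
```

A string that repeats a few characters scores low, while a string in which every character is different scores high. That is why the heuristic catches generated passwords and keys, and also why it catches anything else that merely looks random.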

One disadvantage of the “entropy” method is that it produces many false positives: random-looking text that is not secret. Examples include public keys and checksums, such as the commit hashes used to refer to Git submodules or the file hashes used to store large files in git-lfs and git-annex. An alternative search mode uses regular expressions instead of the entropy heuristic. To use this, run:

trufflehog --entropy=NO --regex {git_repo_path_or_url}

The output from this mode will contain messages like these:

~~~~~~~~~~~~~~~~~~~~~
Reason: Password in URL
Date: 2018-08-04 07:48:37
Hash: 0c32b0acc95b032ddc6e3b904c39c982e29590c6
Filepath: tests/test_http.py
Branch: origin/unstable
Commit: Test against local HTTP server

test_server = 'http://user:password@192.168.0.1:3128'
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~
Reason: AWS API Key
Date: 2017-12-04 18:18:06
Hash: ef0f961e794f8ef443f20c796d97241a334358b9
Filepath: deploy.sh
Branch: origin/unstable
Commit: Add deploy script

AKIAAAAAAAAFAWAYABAA
~~~~~~~~~~~~~~~~~~~~~
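
The regular-expression mode can be pictured as a table of named patterns that is applied to each piece of changed text. The Python sketch below shows the idea for the two reasons in the output above. The patterns are simplified assumptions written for this example, not the rules that ship with TruffleHog, and find_secrets is a hypothetical helper name.

```python
import re

# Simplified, illustrative patterns (assumptions, not TruffleHog's own rules).
PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 capitals or digits.
    "AWS API Key": re.compile(r"AKIA[0-9A-Z]{16}"),
    # scheme://user:password@host
    "Password in URL": re.compile(
        r"[a-zA-Z]{3,10}://[^/\s:@]+:[^/\s:@]+@[^\s'\"]+"
    ),
}


def find_secrets(text):
    """Return a list of (reason, matched_text) pairs found in text."""
    findings = []
    for reason, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((reason, match.group(0)))
    return findings


if __name__ == "__main__":
    sample = "test_server = 'http://user:password@192.168.0.1:3128'"
    for reason, secret in find_secrets(sample):
        print("Reason:", reason)
        print(secret)
```

Because each pattern describes one specific kind of secret, this approach reports far fewer false positives than the entropy heuristic, but it can only find the kinds of secret that someone has written a pattern for.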

What to Do If You Accidentally Commit Secret Data to Your Git Repo

If you have already published the commit containing the secret (e.g. pushed it to GitHub), then your only option is to rotate the secret. If it was a password, change it everywhere you were using it. If it was an SSH private key, remove the corresponding public key from every authorized_keys file that contains it, and so on for other kinds of credentials.

If, however, you have not pushed or otherwise published the accidental commit, you can rewrite the Git history to remove it. The simplest tool for this is git commit --amend, which works if the secret is only in your most recent commit: remove the secret data, run git commit --all --amend, and the commit is updated to match the working copy. If the secret was added further back in the history, you might need a tool like git filter-branch.
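
One rough way to check whether a commit has already been pushed is to ask Git which remote-tracking branches contain it. The Python sketch below wraps git branch -r --contains for that purpose. The function names are hypothetical, and the check only reflects what your local clone knew at its last fetch or push, so an empty result suggests, but does not prove, that the commit was never published.

```python
import subprocess


def parse_remote_branches(output):
    """Extract branch names from the output of `git branch -r`."""
    branches = []
    for line in output.splitlines():
        name = line.strip()
        # Skip blank lines and symbolic refs such as
        # "origin/HEAD -> origin/master".
        if not name or " -> " in name:
            continue
        branches.append(name)
    return branches


def remote_branches_containing(commit="HEAD", repo_path="."):
    """List the remote-tracking branches that already contain commit."""
    result = subprocess.run(
        ["git", "-C", repo_path, "branch", "-r", "--contains", commit],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_remote_branches(result.stdout)


if __name__ == "__main__":
    pushed_to = remote_branches_containing("HEAD")
    if pushed_to:
        print("Already on:", ", ".join(pushed_to), "- rotate the secret.")
    else:
        print("No remote-tracking branch contains HEAD.")
```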

Conclusion

Every tool that makes your work easier also creates new ways to make mistakes, and Git is no exception. Accidentally committing confidential data to a Git repository will make that data available to anyone who has access to the repository, even if the confidential information is removed in a subsequent commit. TruffleHog can help you to audit your Git repositories for secrets, either as part of a periodic security audit or before making a project available to a wider audience.

Chris Kerr