gitwalk: Bulk processing of git repos
Complex systems are usually made up of many components that span at least a few code repositories. And while this is a very good thing, it adds a few extra steps to your workflow. Having to keep several repositories up to date and on the right branches can become a little cumbersome when you need to quickly search for something or automate a minor change across the whole system. This is when gitwalk comes in.
gitwalk lets you manipulate multiple repositories at once. It’s the man-in-the-middle that abstracts away the repetitive work and lets you focus on what needs to be done. You select a group of repos using a simple expression and provide an operation to be completed for each one. This may be searching through files or the commit history, running tests and linters or even editing and pushing changes back upstream — whatever you can think of. And it integrates with GitHub.
I’ve been working on gitwalk in my spare time over the past few months and today, I’m releasing it as open source.
Where to get it?
It’s made in CoffeeScript and runs on Node.js.
You can use npm
to install it like any other Node package:
npm install -g gitwalk
Make sure to include the -g
option to get the CLI command in your system
$PATH
. But if you’d like to use the library directly, that’s possible too.
Check out the JavaScript
API.
How it works?
The gitwalk
command takes of arguments: one or more expressions and a
processor. It will evaluate the expressions into a list of repositories and
run the processor on each one. This is how it might look:
gitwalk "github:pazdera/@(tco|scriptster)" command ls -l ./lib
When invoked like this, it would take the tco
an scripster repos from my GitHub
account, clone them into a local
cache and run the ls
command on each one. Here’s the result of running that
on my system:
Gitwalk comes with a few expression resolvers and processors built-in, which are described below. However, it’s been designed to be extended with new ones that you can tailor to your needs.
If pushing to your repositories requires authentication (and it probably does), gitwalk can deal with both ssh or http auth and even access your private repositories on GitHub (if you give it your auth token). Learn how to configure all of that in the documentation.
Expressions
Expressions say which repositories will be processed. You can provide one ore more of them and gitwalk will merge the results. Additionally an expression can be marked negative to exclude previously matched entires. Check out the following examples:
# Matches all branches of the npm repo on GitHub.
gitwalk 'github:npm/npm:*' ...
# Matches all the git repositories in my home dir.
gitwalk '~/**/*' ...
# Use ^ to exclude previously matched repositories.
# Matches all my repos on GitHub _except_ of scriptster.
gitwalk 'github:pazdera/*' '^github:pazdera/scriptster' ...
# URLs work too.
gitwalk 'https://github.com/pazdera/tco.git:*' ...
# You can predefine custom groups of repositories.
# Check out the _Groups_ resolver below.
gitwalk 'group:all-js' 'group:all-ruby' ...
If you’d like to test what each of these expressions match without doing
anything, try them with the --dry-run
option:
Processors
A processor is an action that runs for each matched repository and branch and
does something. This includes things like searching, linting, testing, editing
and committing. As simple or as complex as you need it to be. Gitwalk will
checkout the working tree and point the script’s $PWD
to it.
There are a few predefined processors that you can use to perform some basic tasks on the repositories, but the true power lies in making your own (which is pretty simple to do). Here are a few examples of the default ones:
# Search for unfinished work in all JavaScript files
gitwalk ... grep '(TODO|FIXME)' '**/*.js'
# List all files in the repository
gitwalk ... command 'tree .'
# Another way to search the files
gitwalk ... command 'git grep "(TODO|FIXME)"'
# Replace the year in all Ruby files
gitwalk ... files '**/*.rb' 'sed -i s/2015/2016/g #{file}'
# Simple commit message profanity detector
gitwalk ... commits 'grep "(f.ck|sh.t|b.llocks)" <<<"#{message}"'
The grep processor lets you search the codebase with regular expressions, not unlike Unix grep. The command one lets you run an arbitrary shell command for each repository which can be a custom script. Commits and files allow you to iterate over all the commits and files in the repository respectively and run a custom command for each one.
The #{hashCurlyBraced}
templates will be expanded into values before the
command is executed. Each command exports different set of variables; check
out the docs to find out more.
JavaScript API
If you prefer doing your scripting using Node instead, you can call gitwalk directly from JS there and use exactly the same functionality from there as well:
var gitwalk = require('gitwalk');
gitwalk('github:pazdera/\*', gitwalk.proc.grep(/TODO/, '*.js'), function (err) {
if (err) {
console.log 'gitwalk failed (' + err + ')';
}
});
Summary
Making changes across several parts of a complex system can be tricky and time-consuming. Having a solid test coverage is essential to make sure everything works when you’re done and a set of good tools will help you get it done quicker.
Gitwalk is available right now on npm and GitHub, licensed under the MIT licence. I hope you find it useful!
Did you find this useful? Upvote the article on Hacker News or give it a thumbs up on Reddit.