gitwalk: Bulk processing of git repos

2015-11-03

Complex systems are usually made up of many components that span at least a few code repositories. And while this is a very good thing, it adds a few extra steps to your workflow. Having to keep several repositories up to date and on the right branches can become cumbersome when you need to quickly search for something or automate a minor change across the whole system. This is where gitwalk comes in.

gitwalk lets you manipulate multiple repositories at once. It’s the man-in-the-middle that abstracts away the repetitive work and lets you focus on what needs to be done. You select a group of repos using a simple expression and provide an operation to be completed for each one. This may be searching through files or the commit history, running tests and linters or even editing and pushing changes back upstream — whatever you can think of. And it integrates with GitHub.

I’ve been working on gitwalk in my spare time over the past few months and today, I’m releasing it as open source.

[Figure: Searching different repositories with gitwalk.]

Where to get it?

It’s made in CoffeeScript and runs on Node.js. You can use npm to install it like any other Node package:

npm install -g gitwalk

Make sure to include the -g option to get the CLI command in your system $PATH. But if you’d like to use the library directly, that’s possible too. Check out the JavaScript API.

How does it work?

The gitwalk command takes two kinds of arguments: one or more expressions and a processor. It evaluates the expressions into a list of repositories and runs the processor on each one. This is how it might look:

gitwalk "github:pazdera/@(tco|scriptster)" command ls -l ./lib

When invoked like this, it takes the tco and scriptster repos from my GitHub account, clones them into a local cache and runs the ls command on each one. Here’s the result of running that on my system:

[Figure: Listing files in two repositories at the same time.]

Gitwalk comes with a few expression resolvers and processors built-in, which are described below. However, it’s been designed to be extended with new ones that you can tailor to your needs.
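To make the flow concrete, here’s a minimal sketch in plain Node.js of the "run a processor in each repository" step. It is not gitwalk’s actual code: the names walk, listFiles and demoRepos are made up for illustration, and throwaway temp directories stand in for the cloned repositories.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical sketch, NOT gitwalk's engine: the real tool also resolves
// expressions, clones into a cache and checks out branches.
function walk(repoDirs, processor) {
  const originalCwd = process.cwd();
  const results = [];
  try {
    for (const dir of repoDirs) {
      process.chdir(dir);            // the processor sees the repo as its working directory
      results.push(processor(dir));
    }
  } finally {
    process.chdir(originalCwd);      // always restore the caller's directory
  }
  return results;
}

// Stand-in for `command ls`: list the entries of the current directory.
const listFiles = () => fs.readdirSync('.').sort();

// Two throwaway directories play the role of cloned repositories.
const demoRepos = ['tco', 'scriptster'].map((name) => {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), name + '-'));
  fs.writeFileSync(path.join(dir, 'README.md'), '# ' + name + '\n');
  return dir;
});

console.log(walk(demoRepos, listFiles));
// prints: [ [ 'README.md' ], [ 'README.md' ] ]
```

Changing the working directory before each call mirrors the idea that a processor can use relative paths such as ./lib without knowing where the repository lives on disk.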

If pushing to your repositories requires authentication (and it probably does), gitwalk can deal with both SSH and HTTP auth and can even access your private repositories on GitHub (if you give it your auth token). Learn how to configure all of that in the documentation.

Expressions

Expressions say which repositories will be processed. You can provide one or more of them and gitwalk will merge the results. Additionally, an expression can be marked negative to exclude previously matched entries. Check out the following examples:

# Matches all branches of the npm repo on GitHub.
gitwalk 'github:npm/npm:*' ...

# Matches all the git repositories in my home dir.
gitwalk '~/**/*' ...

# Use ^ to exclude previously matched repositories.
# Matches all my repos on GitHub _except_ scriptster.
gitwalk 'github:pazdera/*' '^github:pazdera/scriptster' ...

# URLs work too.
gitwalk 'https://github.com/pazdera/tco.git:*' ...

# You can predefine custom groups of repositories.
# Check out the _Groups_ resolver below.
gitwalk 'group:all-js' 'group:all-ruby' ...
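The merge-and-exclude behaviour can be pictured with a short Node.js sketch. This is an assumption about the semantics based on the description above, not gitwalk’s resolver code: mergeMatches is a hypothetical helper, and each expression is assumed to have been resolved to a list of repository names already.

```javascript
// Hypothetical sketch, NOT gitwalk's resolver code.
// A leading '^' marks an expression as negative.
function mergeMatches(resolved) {
  const selected = new Set();
  for (const { expression, repos } of resolved) {
    const negative = expression.startsWith('^');
    for (const repo of repos) {
      if (negative) {
        selected.delete(repo);   // drop a previously matched repository
      } else {
        selected.add(repo);      // duplicates collapse automatically
      }
    }
  }
  return [...selected];
}

const repos = mergeMatches([
  {
    expression: 'github:pazdera/*',
    repos: ['pazdera/tco', 'pazdera/scriptster', 'pazdera/gitwalk'],
  },
  {
    expression: '^github:pazdera/scriptster',
    repos: ['pazdera/scriptster'],
  },
]);

console.log(repos);
// prints: [ 'pazdera/tco', 'pazdera/gitwalk' ]
```

In this sketch order matters: a negative expression only removes what earlier expressions have matched, which is one reading of "previously matched" above.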

If you’d like to see what each of these expressions matches without doing anything, try them with the --dry-run option:

[Figure: Dry run, printing all matches.]

Processors

A processor is an action that runs for each matched repository and branch. It can be as simple or as complex as you need: searching, linting, testing, editing, committing. Gitwalk will check out the working tree and point the script’s $PWD to it.

There are a few predefined processors that you can use to perform some basic tasks on the repositories, but the true power lies in making your own (which is pretty simple to do). Here are a few examples of the default ones:

# Search for unfinished work in all JavaScript files
gitwalk ... grep '(TODO|FIXME)' '**/*.js'

# List all files in the repository
gitwalk ... command 'tree .'

# Another way to search the files
gitwalk ... command 'git grep -E "(TODO|FIXME)"'

# Replace the year in all Ruby files
gitwalk ... files '**/*.rb' 'sed -i s/2015/2016/g #{file}'

# Simple commit message profanity detector
gitwalk ... commits 'grep -E "(f.ck|sh.t|b.llocks)" <<<"#{message}"'

The grep processor lets you search the codebase with regular expressions, not unlike Unix grep. The command processor runs an arbitrary shell command, which can be a custom script, in each repository. The commits and files processors iterate over all the commits and files in the repository, respectively, and run a custom command for each one.

The #{hashCurlyBraced} templates will be expanded into values before the command is executed. Each processor exports a different set of variables; check out the docs to find out more.
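As an illustration of what such an expansion step might look like, here’s a small Node.js sketch. It is not taken from gitwalk: expandTemplate is a hypothetical helper, and leaving unknown placeholders untouched is a design choice made for this example only.

```javascript
// Hypothetical sketch of #{name} expansion, NOT gitwalk's actual code.
// Unknown placeholders are left as-is so that typos stay visible.
function expandTemplate(command, vars) {
  return command.replace(/#\{(\w+)\}/g, (placeholder, name) =>
    Object.prototype.hasOwnProperty.call(vars, name)
      ? String(vars[name])
      : placeholder
  );
}

console.log(expandTemplate('sed -i s/2015/2016/g #{file}', { file: 'lib/tco.rb' }));
// prints: sed -i s/2015/2016/g lib/tco.rb
```

Note that the sketch inserts values verbatim. Anything that passes the result to a shell would also have to quote values containing spaces or special characters.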

JavaScript API

If you prefer to do your scripting in Node, you can call gitwalk directly from JavaScript and get exactly the same functionality:

var gitwalk = require('gitwalk');

// Search all of pazdera's GitHub repos for TODOs in JavaScript files.
gitwalk('github:pazdera/*', gitwalk.proc.grep(/TODO/, '*.js'), function (err) {
    if (err) {
        console.log('gitwalk failed (' + err + ')');
    }
});

Summary

Making changes across several parts of a complex system can be tricky and time-consuming. Solid test coverage is essential to make sure everything still works when you’re done, and a set of good tools will help you get there quicker.

Gitwalk is available right now on npm and GitHub, licensed under the MIT licence. I hope you find it useful!
