Manipulating git repositories with Node.js

2015-10-27
~
6 min read

What do linting, building and testing have in common? They all work best when automated. With services like GitHub’s webhooks, it’s easy to subscribe to certain events on your repository and be notified by a HTTP request. These might be commits being pushed or pull request landing at your repo when you can trigger a build or run your tests.

Apart from a webhook, you’ll need a server that will listen at the other end to make it work. Node.js with Express are pretty good at handling requests and you can have them up an running in minutes. What’s left is processing the repository itself.

This post is about how to do that with Node.

Meet nodegit

A quick search through npm reveals no shortage of packages that provide access to the functionality of git in Javascript either by wrapping around the git binary or implementing the functionality directly.

The one that caught my eye is nodegit — native bindings to the libgit2 library made by GitHub. The package gets lots of development activity and seems to be well supported.

You can install it as any other Node.js package using npm:

npm install nodegit

The maintainers provide pre-built binaries of libgit2 for the most common architectures. However, if yours isn’t amongst them, the installer might need to build the library from source and you’ll be required to install a C toolchain to proceed.

Done installing? Let’s take a look at a few examples. There’s quite literally a million things you can do with nodegit, and while this post can’t possibly cover everything, we’ll go though a few to give you an idea of how the library works.

All the snippets below were made with nodegit 0.5.0. When in doubt, please refer to the official API docs.

Clone a repo

The first step is pretty obvious: you can hardly do anything with the repository without cloning it first. To do that, we’ll use the Clone class:

var nodegit = require('nodegit'),
    path = require('path');

var url = "https://github.com/pazdera/scriptster.git",
    local = "./scriptster",
    cloneOpts = {};

nodegit.Clone(url, local, cloneOpts).then(function (repo) {
    console.log("Cloned " + path.basename(url) + " to " + repo.workdir());
}).catch(function (err) {
    console.log(err);
});

Cloning will return an already initialised handle to the repository (an instance of the Repository class) that you can use to access its contents.

The library uses promises to manage the asynchronous calls. If you’re unsure what the weird chain of .then() and .catch() functions mean, checkout this quick introduction.

Authentication

In case your repository isn’t openly available, you may need to authenticate via http or ssh. Nodegit can do that via CloneOptions that you can pass to Clone as an argument. Like this:

var cloneOpts = {
  fetchOpts: {
    callbacks: {
      credentials: function(url, userName) {
        return nodegit.Cred.sshKeyNew(
          userName,
          '/Users/radek/.ssh/id_rsa.pub',
          '/Users/radek/.ssh/id_rsa',
          "<your-passphrase-here>");
      }
    }
  }
};

The credentials function is called when the remote requests authentication. The implementation will vary for http, ssh and other types of authentication. Check out the Cred class in the docs for more ways to authenticate.

Open it up

If you already have a local copy of the repo, you can open it with nodegit using the Reposiotory#open function:

var nodegit = require('nodegit');

nodegit.Repository.open('./scriptster').then(function(repo) {
  console.log("Using " + repo.path());
}).catch(function (err) {
  console.log(err);
});

Just as Clone, it returns an instance of Repository that you can use to manipulate it.

Read the commit history

One thing that you may need is going through the last few commits on a branch and extracting metadata from each one. This can be done through the Commit#history event emitter that iterates through the revisions, generates events for each commit along the way and returns an array of all the Commit objects at the end.

The following promise chain will retrieve the name of the current branch, walk through its history and print the hash and commit messages for the last 10 commits on it:

var nodegit = require('nodegit');
var Promise = require('promise');

nodegit.Repository.open('./scriptster').then(function(repo) {
  /* Get the current branch. */
  return repo.getCurrentBranch().then(function(ref) {
    console.log("On " + ref.shorthand() + " (" + ref.target() + ")");

    /* Get the commit that the branch points at. */
    return repo.getBranchCommit(ref.shorthand());
  }).then(function (commit) {
    /* Set up the event emitter and a promise to resolve when it finishes up. */
    var hist = commit.history(),
        p = new Promise(function(resolve, reject) {
            hist.on("end", resolve);
            hist.on("error", reject);
        });
    hist.start();
    return p;
  }).then(function (commits) {
    /* Iterate through the last 10 commits of the history. */
    for (var i = 0; i < 10; i++) {
      var sha = commits[i].sha().substr(0,7),
          msg = commits[i].message().split('\n')[0];
      console.log(sha + " " + msg);
    }
  });
}).catch(function (err) {
  console.log(err);
}).done(function () {
  console.log('Finished');
});

Here’s the output in the terminal:

nodegit: Listing commits
nodegit: Listing the last 10 commits on a branch.

Checkout a different branch

Besides manipulating metadata, nodegit has no problems working with trees. You can check out different branches, modify your files and even create new commits. For the sake of keeping the examples simple, this is how you checkout a branch:

var nodegit = require('nodegit');

nodegit.Repository.open('./scriptster').then(function(repo) {
  return repo.getCurrentBranch().then(function(ref) {
    console.log("On " + ref.shorthand() + " " + ref.target());

    console.log("Checking out master");
    var checkoutOpts = {
      checkoutStrategy: nodegit.Checkout.STRATEGY.FORCE
    };
    return repo.checkoutBranch("master", checkoutOpts);
  }).then(function () {
    return repo.getCurrentBranch().then(function(ref) {
      console.log("On " + ref.shorthand() + " " + ref.target());
    });
  });
}).catch(function (err) {
  console.log(err);
}).done(function () {
  console.log('Finished');
});

All the work here is done by the Repository#checkoutBranch function, the rest is just printing the name of the current branch before and after the checkout.

Note that you need to specify a preferred checkout strategy. I picked FORCE which will ditch any local modifications of the working tree in favour of the version from the repository.

The output of the above should look something like this:

nodegit: Checkout a different branch
nodegit: Checking out a different branch.

Search through the tree

When you clone a repository with nodegit, it will look exactly the same as if you did it with the git command and you can read and modify the working tree as you wish.

Here’s an example that will print all the files from the scriptster repo that I cloned earlier:

var rr = require("recursive-readdir");

rr("./scriptster", ["scriptster/.git/**"], function (err, files) {
  for (var i = 0; i < files.length; i++) {
    console.log(files[i]);
  }
});

For simplicity, it uses the recursive-readdir module from npm and ignores all the files within the .git/ directory. Here’s the output:

nodegit: Reading the working tree
nodegit: Iterating through the contents of the working tree.

Summary

This was nodegit, a useful library if you need to manipulate git repositories with Node.js. Under the hood, its calling GitHub’s libgit2 to do the heavy lifting. It works particularly well if you need to automate various parts of your workflow — like building, testing and deployment using git on your server.

It’s open-source, distributed under the MIT licence (libgit2 is licensed under GPLv2). Do get in touch with the maintainers if you’d like to contribute!

If you liked the article, please give it a thumbs up on Hacker News or Reddit. Thanks!