Fragment: AutoBackup to git

Noting, as ever, that sometimes students either don’t save, or accidentally delete, all their work (I take it as read that most folk don’t back-up: I certainly don’t (“one copy, no back-ups”)) I started pondering whether we could create local git repos in student working directories in provided Docker containers with a background process running to commit changed files every 5 minutes or so.

Note that I haven’t had a chance to test that any of this works yet!

The inotify-tools package wraps the Linux inotify command with a handy CLI wrapper. (inotify itself provides handy tools for monitoring the state of the file system (for example, Monitor file system activity with inotify.) The gitwatch package uses inotify-tools to as part of “a bash script to watch a file or folder and commit changes to a git repo” (you can also run it as a service).

So I wonder: can we use gitwatch to back-up (ish) files in a student’s working directory in a provided container into a local persistent git repo mounted into the container?

# With git installed, we need to create a nominal user
git config user.email junk@example.com
git init

# Example of adding a file...
git add test.md
# And committing it...
git commit -m first commit
# But students won't remember to do that very often...

# So we need to shcedule something
# inotify-tools gives us access to a directory monitor...
apt-get update -y && apt-get install -y inotify-tools

# The gitwatch package uses inotify-tools to trigger
# automatic git add/commits for changed files
# https://github.com/gitwatch/gitwatch#bpkg
# It's most conveniently installed using bpkg package manager
curl -Lo- "https://raw.githubusercontent.com/bpkg/bpkg/master/setup.sh" | bash
bpkg install  gitwatch/gitwatch

# Note that it requires that USER is set
# The following is probably not recommended...
#export USER=root

# Run as a service
# https://github.com/gitwatch/gitwatch/wiki/gitwatch-as-a-service-on-Debian-with-supervisord

# Example of viewing state of github repo at at particular time # https://stackoverflow.com/a/56187629/454773

It strikes me that if students are working on notebooks, we really want to commit the notebooks cleared of cell outputs. One way of doing this would be tweak gitwatch to spot changed files and if they are .ipynb notebook files, actually backup a copy of them (perhaps as .ipynb files via a hidden directory, perhaps via hidden Jupytext paired markdown files) rather than the notebook; pre-commit actions might also be useful here…

Having got files backed up, -ish, into a git repo, the next issue is how students can revisit a point back in time. If the repo is mirrored in GitHub, it looks like you can revisit the state of a repo if you don’t want to go too far back in time (“simply add the date in the following format – HEAD@{2019-04-29} – in the place of the URL where the commit would usually belong”).

On the command line, it seems we can get a commit immediately prior to a particular date: git rev-list master --max-count=1 --before=2014-10-01 (can we also fdo that relative to datetime?). Then it seems that git branch NEW_BRANCH_NAME COMMIT_HASH will create a branch based around the state of the files at the time of that commit, or git checkout -b NEW_BRANCH_NAME COMMIT_HASH will create the branch and check it out (just make sure you have all your current files checked into the repo so you can revert back to them…)

Ah… this looks handy:

# To keep your current changes

You can keep your work stashed away, without commiting it, with `git stash`. You would than use `git stash pop` to get it back. Or you can `git commit` it to a separate branch.

# Checkout by date using `rev-parse`

You can checkout a commit by a specific date using rev-parse like this:

`git checkout 'master@{1979-02-26 18:30:00}'`

More details on the available options can be found in the `git-rev-parse`.

As noted in the comments this method uses the reflog to find the commit in your history. By default these entries expire after 90 days. Although the syntax for using the reflog is less verbose you can only go back 90 days.

# Checkout out by date using `rev-list`

The other option, which doesn't use the reflog, is to use `rev-list` to get the commit at a particular point in time with:

```
git checkout `git rev-list -n 1 --first-parent --before="2009-07-27 13:37" master`
```

Note the `--first-parent` if you want only your history and not versions brought in by a merge. That's what you usually want.

PS seems like @choldgraf got somewhere close to this previously… choldgraf/gitautopush: watch a local git repository for any changes and automatically push them to GitHub.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

%d bloggers like this: