Visualising The Life of a (Code) Repository

One of the many things that I suppose most people never think about is what makes up the raw ingredients of a software application. The answer is code, of course, and code that’s distributed over hundreds, if not thousands, of files; and not just static files, but files that may get edited repeatedly. By different people. At different times.

Keeping track of all these files can be a nightmarish task: you need to know which are the current ones and which are the previous versions, because it’s generally considered good practice to keep copies of all the old versions of your files (and of the versions of the files around them that were current at the time!) in case you need to go back to them. That is why software projects tend to use version control systems, or code repositories, to manage the various files.

Recently, I’ve come across a couple of visualisation tools that show how various software projects have evolved by virtue of the check-ins to a repository over time…

Firstly, CodeSwarm:

And today, gource:

So I’m wondering…could we do similar things for:
– the life of a wiki? It has page creations, deletions and updates…
– the growth of a citation tree from an academic paper (so plot the original paper, the papers that cite it, the papers that cite the papers that cite the original, and so on)?
– the submissions to an open publication repository, such as eprints, with submissions as checkins, and maybe links between nodes when one paper cites another in the repository?
– the evolution of tweets around a hashtag (nodes are tweets containing the hashtag, forks are to other hashtags mentioned in the same tweet)?
– what else? (ideas in the comments, please;-)

Maybe visualisations for these exist (though I’m thinking more of animated tree based representations than things like Heavy Metal Umlaut+;-)?

Maybe there’s a common representation possible that would let us use the tools developed for, e.g., visualising code repository checkins on other sorts of tree (because the structure of the code in a repository, and the on-citations from a publication, are trees, right?)
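
One caveat on the "trees, right?" question: a folder hierarchy is a tree, but citation links are not strictly one, because a single paper can cite several earlier papers. As a minimal sketch of one possible workaround, the Python below flattens a citation graph into file-like paths by hanging each paper under the first citing chain that reaches it (breadth-first). The function name `citation_paths`, the `cited_by` mapping and the paper ids are all made up for illustration; nothing here comes from CodeSwarm or gource.

```python
from collections import deque


def citation_paths(root, cited_by):
    """Map each paper reachable from `root` to a file-like path.

    `cited_by` maps a paper id to the ids of the papers that cite it.
    A paper that cites several ancestors is placed under the first one
    reached, which forces the citation graph into a tree.
    """
    paths = {root: root}
    queue = deque([root])
    while queue:
        paper = queue.popleft()
        for citer in cited_by.get(paper, []):
            if citer not in paths:  # keep the first placement only
                paths[citer] = paths[paper] + "/" + citer
                queue.append(citer)
    return paths


# Hypothetical data: B and C cite A; D cites both B and C.
example = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
paths = citation_paths("A", example)
for paper, path in sorted(paths.items()):
    print(paper, "->", path)
```

With this hypothetical data, paper D is filed under B even though it also cites C, so some links are lost; whether that matters depends on what the animation is meant to show.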

PS See also: GLtrail server log visualisation.

PPS Visualising Google Analytics using gource via googalytics api and python [via @aneesha]

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

2 thoughts on “Visualising The Life of a (Code) Repository”

  1. I like the idea of visualising the life of a wiki. Seeing how people participate and a timeline/track of events could be useful for understanding how people work on collaborative materials.
    You could visualise student work on a group project and the effect of events such as tutor interventions.

    1. It seems as if someone started looking at ways of visualising mediawiki edits in codeswarm, at least for a single page:
      – video:
      – wikiswarm:

      Gource also seems to support a simple input representation [ ], which you might be able to generate from wiki log files? Let me know if you give it a try, and whether it works – or doesn’t?!;-)
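
As a rough sketch of what generating that input from a wiki log might look like: as far as I understand it, Gource's custom log format is one pipe-delimited line per event, `timestamp|username|type|file`, with a Unix timestamp, a type of A (added), M (modified) or D (deleted), and the lines in chronological order. The edit records, the action names and the `to_gource_lines` function below are hypothetical; a real wiki export would need its own parsing step first.

```python
from datetime import datetime, timezone

# Hypothetical wiki edit records: (ISO timestamp, editor, action, page title).
EDITS = [
    ("2009-11-02T10:15:00", "alice", "create", "Project/Plan"),
    ("2009-11-03T09:00:00", "bob", "edit", "Project/Plan"),
    ("2009-11-01T08:30:00", "alice", "create", "Project/Notes"),
    ("2009-11-04T12:00:00", "tutor", "delete", "Project/Notes"),
]

# Assumed mapping from wiki actions to Gource update types.
ACTION_CODES = {"create": "A", "edit": "M", "delete": "D"}


def to_gource_lines(edits):
    """Turn edit records into pipe-delimited lines, oldest first."""
    rows = []
    for stamp, editor, action, page in edits:
        # Treat the timestamps as UTC (an assumption about the log).
        when = datetime.fromisoformat(stamp).replace(tzinfo=timezone.utc)
        # A literal pipe in a name would break the format, so swap it out.
        safe_editor = editor.replace("|", "_")
        safe_page = page.replace("|", "_")
        rows.append((int(when.timestamp()), safe_editor,
                     ACTION_CODES[action], safe_page))
    rows.sort(key=lambda row: row[0])  # chronological order
    return ["|".join(str(field) for field in row) for row in rows]


lines = to_gource_lines(EDITS)
print("\n".join(lines))
```

Saving the printed lines to a text file should give something Gource can read as a custom log; the Gource documentation has the exact command-line options.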
